Compare commits

...

1051 Commits

Author SHA1 Message Date
Peter Maydell
f05b42d3fd Update version for v2.5.0-rc4 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-11 16:37:55 +00:00
Max Reitz
6e0abc251d blockdev: Mark {insert, remove}-medium experimental
While in the long term we want throttling to be its own block filter
BDS, in the short term we want it to be part of the BB instead of a BDS;
even in the long term we may want legacy throttling to be automatically
tied to the BB.

blockdev-insert-medium and blockdev-remove-medium do not retain
throttling information in the BB (deliberately so). Therefore, using
them means tying this information to a BDS, which would break the model
described above. (The same applies to other flags such as
detect_zeroes.) We probably want to move this information to the BB or
its own filter BDS before blockdev-{insert,remove}-medium can be
considered completely stable.

Therefore, mark these functions experimental for the time being.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1449847385-13986-2-git-send-email-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
[PMM: fixed format nit (underlining) in qmp-commands.hx]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-11 15:39:29 +00:00
Dr. David Alan Gilbert
3fd3c4b37c Fix xbzrle vs last_sent_block update
My fix (84e7b80a) replaced the last_sent_block update that I'd
removed earlier; however it was too aggressive in the xbzrle case.

save_xbzrle_page might return '0' to mean that the page didn't
need sending since it was the same as the last sent version;
in this case we can't update 'last_sent_block' since we didn't
actually send it.

Symptom: 'Illegal RAM offset 1018000' as we try and send a page
        to the wrong RAMBlock;  potentially that could be a data
        corruption if you were really unlucky.

Fixes: 84e7b80a05

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 1449765106-6528-1-git-send-email-dgilbert@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-11 12:51:27 +00:00
Peter Maydell
b969526adf Update language files for QEMU 2.5.0
Update translation files (change created via 'make -C po update').

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1449754467-3496-1-git-send-email-peter.maydell@linaro.org
2015-12-10 13:50:45 +00:00
Alex Zuepke
bd4e097a8e sparc: allow CASA with ASI 0xa from user space
LEON3 allows the CASA instruction to be used from user space
if the ASI is set to 0xa (user data).

Signed-off-by: Alex Zuepke <azu@sysgo.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-10 11:19:18 +00:00
Greg Kurz
a3154ccabc MAINTAINERS: add maintainer to virtio-9p
As suggested by Paolo, I add myself as maintainer for virtio-9p.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Message-id: 20151130154016.20108.79073.stgit@bahia.huguette.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-10 11:17:25 +00:00
Greg Kurz
6cecf09373 virtio-9p-device: add minimal unrealize handler
Since commit 4652f1640e "virtio-9p: add savevm
handlers", if the user hot-unplugs a quiescent 9p device and live
migrates, the source QEMU crashes before migration completetion...
This happens because virtio-9p devices have a realize handler which
calls virtio_init() and register_savevm().  Both calls store pointers
to the device internals, that get dereferenced during migration even
if the device got unplugged.

This patch simply adds an unrealize handler to perform minimal
cleanup and avoid the crash.  Hot unplug of non-quiescent 9p devices
is still not supported in QEMU, and not supported by linux guests
either.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20151208155457.27775.69441.stgit@bahia.huguette.org
[PMM: rewrapped long lines in commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-10 10:46:22 +00:00
Peter Maydell
c3626ca7df Update version for v2.5.0-rc3 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-07 17:47:40 +00:00
Markus Armbruster
ba306c7a55 sd: Mark brittle abuse of blk_attach_dev() FIXME
blk_attach_dev() fails here only when we're working for device
"sdhci-pci" (which already attached the backend), and then we don't
want to attach a second time.  If we ever create another failure mode,
we're setting up ourselves to using the same backend from multiple
frontends, which is likely to end in tears.  Can't clean this up this
close to the release, so mark it FIXME.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1449503710-3707-3-git-send-email-armbru@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-07 17:13:10 +00:00
Markus Armbruster
79f2170789 sdhci: Sanitize "sdhci-pci" properties for future qomification
We currently fuse controller and card into a single device model, but
we intend qomify things properly and separate the two.  The properties
that really belong to the card would then have to somehow pass-through
to the card's properties.  To avoid that complication, either mark
them experimental or drop them.

Properties "capareg", "maxcurr" and the usual PCI device properties
belong to the controller.  Property "drive" belongs to the card;
rename it to "x-drive".  Properties "logical_block_size",
"physical_block_size", "min_io_size", "opt_io_size",
"discard_granularity" belong to the card, but have no effect; drop
them.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1449503710-3707-2-git-send-email-armbru@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-07 17:13:10 +00:00
Fam Zheng
a616fb75c2 virtio-blk: Drop x-data-plane option
The official way of enabling dataplane is through the "iothread"
property that references an iothread object created by "-object
iothread".  Since the old "x-data-plane=on" way now even crashes, it's
probably easier to just drop it:

$ qemu-system-x86_64 -drive file=null-co://,id=d0,if=none \
    -device virtio-blk-pci,drive=d0,x-data-plane=on

ERROR:/home/fam/work/qemu/qom/object.c:1515:
object_get_canonical_path_component: assertion failed: (obj->parent != NULL)
Aborted

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1449485967-19240-1-git-send-email-famz@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-07 16:47:16 +00:00
Peter Maydell
84942979de Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 07 Dec 2015 14:06:07 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  lan9118: log and ignore access to invalid registers, rather than aborting
  lan9118: fix emulation of MAC address loaded bit in E2P_CMD register
  vmxnet3: silence warning
  pcnet: fix rx buffer overflow(CVE-2015-7512)
  net: pcnet: add check to validate receive data size(CVE-2015-7504)
  e1000: fix hang of win2k12 shutdown with flood ping

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-07 14:18:31 +00:00
Andrew Baumann
52b4bb7383 lan9118: log and ignore access to invalid registers, rather than aborting
With this change, access to invalid/unimplemented device registers are
logged as a "guest error" rather than aborting qemu with
hw_error. This enables drivers for similar devices (e.g. SMSC 9221),
by simply ignoring the unimplemented writes. It's also closer to what
real hardware does.

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:48 +08:00
Andrew Baumann
12fdd928c8 lan9118: fix emulation of MAC address loaded bit in E2P_CMD register
There appears to have been a longstanding typo in the implementation
of the "MAC address loaded" bit in the E2P_CMD (EEPROM command)
register. The code was using 0x10, but the controller spec says it
should be bit 8 (0x100).

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:48 +08:00
Michael S. Tsirkin
6a9c647095 vmxnet3: silence warning
vmxnet3 always produces a warning under qtest.

This is not a user error, don't warn.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:48 +08:00
Jason Wang
8b98a2f071 pcnet: fix rx buffer overflow(CVE-2015-7512)
Backends could provide a packet whose length is greater than buffer
size. Check for this and truncate the packet to avoid rx buffer
overflow in this case.

Cc: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:48 +08:00
Prasad J Pandit
837f21aacf net: pcnet: add check to validate receive data size(CVE-2015-7504)
In loopback mode, pcnet_receive routine appends CRC code to the
receive buffer. If the data size given is same as the buffer size,
the appended CRC code overwrites 4 bytes after s->buffer. Added a
check to avoid that.

Reported by: Qinghao Tang <luodalongde@gmail.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:48 +08:00
Denis V. Lunev
9596ef7c7b e1000: fix hang of win2k12 shutdown with flood ping
e1000 driver in Win2k12 is really well rotten. It 100% hangs on shutdown
of UP VM under flood ping. The guest checks card state and reinjects
itself interrupt in a loop. This is fatal for UP machine.

There is no good way to fix this misbehavior but to kludge it. The
emulation has interrupt throttling register aka ITR which limits
interrupt rate and allows the guest to proceed this phase.
There is no problem with this kludge for Linux guests - it adjust the
value of it itself.

On the other hand according to the initial research in
    commit e9845f0985
    Author: Vincenzo Maffione <v.maffione@gmail.com>
    Date:   Fri Aug 2 18:30:52 2013 +0200

    e1000: add interrupt mitigation support

    ...

    Interrupt mitigation boosts performance when the guest suffers from
    an high interrupt rate (i.e. receiving short UDP packets at high packet
    rate). For some numerical results see the following link
    http://info.iet.unipi.it/~luigi/papers/20130520-rizzo-vm.pdf

this should also boost performance a bit.

See https://bugzilla.redhat.com/show_bug.cgi?id=874406 for additional
details.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vincenzo Maffione <v.maffione@gmail.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-12-07 21:43:43 +08:00
Peter Maydell
a5582eac15 Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging
QOM infrastructure fixes and device conversions

* Documentation update
* qom-test and related fixes

# gpg: Signature made Fri 04 Dec 2015 17:54:55 GMT using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/qom-devices-for-peter:
  qom-test: Fix qmp() leaks
  tests: Use proper functions types instead of void (*fn)
  qom: Update documentation comment of struct Object
  tests: Fix check-report-qtest-% target

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-04 18:11:40 +00:00
Marc-André Lureau
0d2cd785ef qom-test: Fix qmp() leaks
Before this patch ASAN reported:
SUMMARY: AddressSanitizer: 677165875 byte(s) leaked in 1272437 allocation(s)

After this patch:
SUMMARY: AddressSanitizer: 465 byte(s) leaked in 32 allocation(s)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1448551895-871-1-git-send-email-marcandre.lureau@redhat.com>
[Straightforwardly rebased onto the previous patch]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-12-04 18:29:31 +01:00
Markus Armbruster
041088c719 tests: Use proper functions types instead of void (*fn)
We have several function parameters declared as void (*fn).  This is
just a stupid way to write void *, and the only purpose writing it
like that could serve is obscuring the sin of bypassing the type
system without need.

The original sin is commit 49ee359: its qtest_add_func() is a wrapper
for g_test_add_func().  Fix the parameter type to match
g_test_add_func()'s.  This uncovers type errors in ide-test.c; fix
them.

Commit 7949c0e faithfully repeated the sin for qtest_add_data_func().
Fix it the same way, along with a harmless type error uncovered in
vhost-user-test.c.

Commit 063c23d repeated it for qtest_add_abrt_handler().  The screwy
parameter gets assigned to GHook member func, so change its type to
match.  Requires wrapping kill_qemu() to keep the type checker happy.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[AF/armbru: Inline GTestFunc/GTestDataFunc typedef for old GLib]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-12-04 18:25:42 +01:00
Peter Maydell
61e3aa25b1 Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-12-04' into staging
trivial patches for 2015-12-04

# gpg: Signature made Fri 04 Dec 2015 06:40:23 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2015-12-04:
  bt: check struct sizes
  typedefs: Put them back into alphabetical order
  scsi: remove scsi_req_free prototype
  gt64xxx: fix decoding of ISD register
  configure: use appropriate code fragment for -fstack-protector checks
  crypto: avoid two coverity false positive error reports
  configure: Diagnose broken linkers directly
  bt: avoid unintended sign extension
  util/id: fully allocate names table

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-04 10:55:03 +00:00
Peter Maydell
f33d046d23 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.5-20151204' into staging
ppc patch queue for 2.5 2015-12-04

This contains some last minute QOM behaviour fixes from Markus
Armbruster.

# gpg: Signature made Fri 04 Dec 2015 06:43:54 GMT using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.5-20151204:
  spapr_drc: Change value of property "fdt" from null back to {}
  spapr_drc: Make device "spapr-dr-connector" unavailable with -device
  spapr_drc: Handle visitor errors properly

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-04 09:49:28 +00:00
Paolo Bonzini
98475746b3 bt: check struct sizes
See http://permalink.gmane.org/gmane.linux.bluez.kernel/36505.  For historical
reasons these do not use sizeof, and Coverity caught a mistake in
EVT_ENCRYPT_CHANGE_SIZE.

In addition:

- remove status from create_conn_cancel_cp; the "status" field is only
in rp structs.  Note that this means that the OCF_CREATE_CONN_CANCEL
could never have worked (it would have failed the LENGTH_CHECK), but
I am keeping it anyway.

- OCF_READ_LINK_QUALITY similarly could never have worked, but I am
fixing read_link_quality_cp anyway.

- fix inquiry_info which is shorter by one: the kernel has a struct that
is 14 byte long, but not counting the initial num_responses byte which
the kernel parses separately;

- remove extended_inquiry_info altogether, since it's not used and unlike
the other inquiry structs does not have the initial num_responses byte.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Markus Armbruster
2988cbeaf9 typedefs: Put them back into alphabetical order
"Please keep this list in alphabetical order" has been more honoured
in the breach than in the observance.  Clean up.

While there, drop a redundant struct declaration.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Hervé Poussineau
8ea9900330 scsi: remove scsi_req_free prototype
Function has been deleted in ad2d30f79d.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Paolo Bonzini
63fc7375d6 gt64xxx: fix decoding of ISD register
The GT64xxx's internal registers can be placed above the first 4 GiB
in the address space, but not above the first 64 GiB.  Correctly cast
the register to a 64-bit integer, and mask away bits above bit 35.

Datasheet at http://pdf.datasheetarchive.com/datasheetsmain/Datasheets-33/DSA-655889.pdf
(bug reported by Coverity).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Rodrigo Rebello
fccd35a046 configure: use appropriate code fragment for -fstack-protector checks
The check for stack-protector support consisted in compiling and linking
the test program below (output by function write_c_skeleton()) with the
compiler flag -fstack-protector-strong first and then with
-fstack-protector-all if the first one failed to work:

  int main(void) { return 0; }

This caused false positives when using certain toolchains in which the
compiler accepted -fstack-protector-strong but no support was provided
by the C library, since for this stack-protector variant the compiler
emits canary code only for functions that meet specific conditions
(local arrays, memory references to local variables, etc.) and the code
fragment under test included none of them (hence no stack protection
code generated, no link failure).

This fix changes the test program used for -fstack-protector checks to
include a function that meets conditions which cause the compiler to
generate canary code in all variants.

Signed-off-by: Rodrigo Rebello <rprebello@gmail.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Daniel P. Berrange
0e1d02452b crypto: avoid two coverity false positive error reports
In qcrypto_tls_creds_get_path() coverity complains that
we are checking '*creds' for NULL, despite having
dereferenced it previously. This is harmless bug due
to fact that the trace call was too early. Moving it
after the cleanup gets the desired semantics.

In qcrypto_tls_creds_check_cert_key_purpose() coverity
complains that we're passing a pointer to a previously
free'd buffer into gnutls_x509_crt_get_key_purpose_oid()
This is harmless because we're passing a size == 0, so
gnutls won't access the buffer, but rather just report
what size it needs to be. We can avoid it though by
explicitly setting the buffer to NULL after free'ing
it.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Peter Maydell
0ef74c7496 configure: Diagnose broken linkers directly
Currently if the user's compiler works for creating .o files but
their linker is broken such that compiling an executable from a
C file does not work, we will report a misleading error message
about the compiler not supporting __thread (since that happens
to be the first test we run which requires a working linker).
Explicitly check that compile_prog works as well as compile_object,
so that people whose toolchain setup is broken get a more helpful
error message.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Paolo Bonzini
e0df8f18f7 bt: avoid unintended sign extension
In the case of a 4-byte length, shifting a value by 24 may cause
an unintended sign extension when converting from int to size_t.
Use a uint32_t variable instead.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
John Snow
624533e5a5 util/id: fully allocate names table
Trivial: this array should be allocated to have ID_MAX entries always.
Otherwise if someone were to forget to expand this table, the assertion
in the id generator won't actually trigger; it will read junk data.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-12-04 09:39:55 +03:00
Markus Armbruster
ab8bf1d735 spapr_drc: Change value of property "fdt" from null back to {}
prop_get_fdt() misuses the visitor API: when fdt is null, it doesn't
visit anything.  object_property_get_qobject() happily
object_property_get_qobject().  Amazingly, the latter survives the
misuse.  Turns out we've papered over it long before prop_get_fdt()
existed, in commit 1d10b44.

However, commit 6c2f9a1 changed how we paper over it, and as a side
effect changed qom-get's value from {} to null.  Change it right back
by fixing the visitor misuse.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-12-04 16:50:59 +11:00
Markus Armbruster
c401ae8c9c spapr_drc: Make device "spapr-dr-connector" unavailable with -device
It should only be created via spapr_dr_connector_new().  Attempting to
create it with -device crashes.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-12-04 10:56:29 +11:00
Markus Armbruster
c75304a139 spapr_drc: Handle visitor errors properly
Since prop_get_fdt() is only used with QmpOutputVisitor, errors
shouldn't actually happen, so this is only a latent bug.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-12-04 10:56:29 +11:00
Cao jin
70ae0b6d0e qom: Update documentation comment of struct Object
It doesn't have "GSList *interfaces" anymore, drop the paragraph.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-12-03 20:10:22 +01:00
Andreas Färber
b5e62af8aa tests: Fix check-report-qtest-% target
Commit e253c28 ("tests: Fix how qom-test is run") introduced
$(qtest-generic-y) and used it for check-qtest-% target, but did not
update check-report-qtest-%. This causes check-report-qtest-aarch64.xml
target to fail with a gtester usage error for lack of test arguments.

Fix this by adding $(qtest-generic-y) in check-report-qtest-%.
Also add it in check-clean target, spotted by Markus.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-12-03 20:07:05 +01:00
Prasad J Pandit
4c65fed8bd ui: vnc: avoid floating point exception
While sending 'SetPixelFormat' messages to a VNC server,
the client could set the 'red-max', 'green-max' and 'blue-max'
values to be zero. This leads to a floating point exception in
write_png_palette while doing frame buffer updates.

Reported-by: Lian Yihan <lianyihan@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-03 13:34:50 +00:00
Peter Maydell
efdeb96c5a Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Thu 03 Dec 2015 04:59:48 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  iotests: Add regresion test case for write notifier assertion failure
  iotests: Add "add_drive_raw" method
  block: Don't wait serialising for non-COR read requests
  iothread: include id in thread name

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-03 11:08:43 +00:00
Peter Maydell
eab0ebc7fe Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151203' into staging
migration/next for 20151203

# gpg: Signature made Wed 02 Dec 2015 23:19:10 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151203:
  migration: do floating-point division
  migration: Clean up use of g_poll() in socket_writev_buffer()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-03 10:43:43 +00:00
Fam Zheng
9cc0f19de2 iotests: Add regresion test case for write notifier assertion failure
The idea is to let the top level bs have a big request alignment with
blkdebug, so that the aio_write request issued from monitor will be
serialised. This tests that QEMU doesn't crash upon the read request
from the backup job's write notifier, which is a very special case of
"reentrant" request.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:07 +08:00
Fam Zheng
78b666f46b iotests: Add "add_drive_raw" method
This offers full manual control over the "-drive" options.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:07 +08:00
Fam Zheng
61408b250e block: Don't wait serialising for non-COR read requests
The assertion problem was noticed in 06c3916b35, but it wasn't
completely fixed, because even though the req is not marked as
serialising, it still gets serialised by wait_serialising_requests
against other serialising requests, which could lead to the same
assertion failure.

Fix it by even more explicitly skipping the serialising for this
specific case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:07 +08:00
Paolo Bonzini
d21e8776f6 iothread: include id in thread name
This makes it easier to find the desired thread.  Use "IO" plus the id;
even with the 14 character limit on the thread name, enough of the id should
be readable (e.g. "IO iothreadNNN" with three characters for the number).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 1448372804-5034-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-12-03 11:08:01 +08:00
Peter Maydell
ec1b9aa89d Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio,vhost,mmap fixes for 2.5

vhost test patches to fix the travis build
virtio ccw patch to fix virtio 1
virtio pci patch to fix pci express
vhost user bridge patch to fix fd leaks
mmap-alloc patch to fix hugetlbfs on ppc64
remove dead code for vhost (trivial)

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Wed 02 Dec 2015 20:38:41 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  util/mmap-alloc: fix hugetlb support on ppc64
  virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method
  virtio: handle non-virtio-1-capable backend for ccw
  tests/vhost-user-bridge.c: fix fd leakage
  vhost: drop dead code
  vhost-user: verify that number of queues is non-zero
  vhost-user-test: fix crash with glib < 2.36
  vhost-user-test: use unix port for migration
  vhost-user-test: fix chardriver race

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-02 23:11:24 +00:00
Paolo Bonzini
a694ee343d migration: do floating-point division
Dividing integer expressions transferred_bytes and time_spent, and then converting
the integer quotient to type double. Any remainder, or fractional part of the
quotient, is ignored.  Fix this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-12-03 00:03:00 +01:00
Markus Armbruster
4e39f57c00 migration: Clean up use of g_poll() in socket_writev_buffer()
socket_writev_buffer() writes in a loop, using g_poll() to block.  If
g_poll() fails, it tries to write more before the file descriptor is
ready.  In theory, this could go into a tight loop.  In practice,
errors other than EINTR are really unlikely, and when they happen,
we're probably screwed anyway, so we can just as well loop.

Clean it up a bit: retry poll on EINTR, keep ignoring other errors.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-12-03 00:03:00 +01:00
Michael S. Tsirkin
7197fb4058 util/mmap-alloc: fix hugetlb support on ppc64
Since commit 8561c9244d "exec: allocate PROT_NONE pages on top of
RAM", it is no longer possible to back guest RAM with hugepages on ppc64
hosts:

mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x3fff57000000
mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy)

This is because on ppc64, Linux fixes a page size for a virtual address
at mmap time, so we can't switch a range of memory from anonymous
small pages to hugetlbs with MAP_FIXED.

See commit d0f13e3c20b6fb73ccb467bdca97fa7cf5a574cd
("[POWERPC] Introduce address space "slices"") in Linux
history for the details.

Detect this and create the PROT_NONE mapping using the same fd.

Naturally, this makes the guard page bigger with hugetlbfs.

Based on patch by Greg Kurz.

Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 22:38:23 +02:00
Shmulik Ladkani
0560b0e97d virtio-pci: Set the QEMU_PCI_CAP_EXPRESS capability early in its DeviceClass realize method
In 1811e64 'hw/virtio: Add PCIe capability to virtio devices', the
QEMU_PCI_CAP_EXPRESS capability was added to virtio's pci_dev, within
'virtio_pci_realize' - the pci device object realization method.

This occurs to late, as 'pci_qdev_realize' (DeviceClass.realize of
TYPE_PCI_DEVICE) has already been called, without knowing that the
device instance is indeed an "express" instance, thus allocating
insufficient pci config space.

As a result, device may crash upon attempt to write to the PCIE config
space.

Fix, by arming the QEMU_PCI_CAP_EXPRESS capability early in virtio-pci's
own DeviceClass realize method.

This also makes code cleaner, as 'virtio_pci_realize' may now access the
'pci_is_express' predicate when needed.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Marcel Apfelbaum <marcel@redhat.com>
2015-12-02 21:51:33 +02:00
Cornelia Huck
11380b3619 virtio: handle non-virtio-1-capable backend for ccw
If you run a qemu advertising VERSION_1 with an old kernel where
vhost did not yet support VERSION_1, you'll end up with a device
that is {modern pci|ccw revision 1} but does not advertise VERSION_1.
This is not a sensible configuration and is rejected by the Linux
guest drivers.

To fix this, add a ->post_plugged() callback invoked after features
have been queried that can handle the VERSION_1 bit being withdrawn
and change ccw to fall back to revision 0 if VERSION_1 is gone.

Note that pci is _not_ fixed; we'll need to rethink the approach
for the next release but at least for pci it's not a regression.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 19:34:11 +02:00
Victor Kaplansky
6d0b908a62 tests/vhost-user-bridge.c: fix fd leakage
This fixes file descriptor leakage in vhost-user-bridge
application. Whenever a new callfd or kickfd is set, the previous
one should be explicitly closed. File descriptors used to map
guest's memory are closed immediately after mmap call.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 19:27:26 +02:00
Peter Maydell
cf22132367 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Wed 02 Dec 2015 15:57:35 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  blkdebug: silence warning under qtest
  qcow2: Fix potential qemu-img check crash on 32 bit hosts

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-02 17:05:34 +00:00
Peter Maydell
2196b6f5dd Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Wed 02 Dec 2015 15:45:36 GMT using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  mirror: Quiesce source during "mirror_exit"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-02 16:24:26 +00:00
Michael S. Tsirkin
b0ae1536c5 vhost: drop dead code
commit 1e7398a1 ("vhost: enable vhost without without MSI-X"_
dropped the implementation of vhost_dev_query,
drop it from the header file as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-12-02 17:59:13 +02:00
Fam Zheng
176c36997f mirror: Quiesce source during "mirror_exit"
With dataplane, the ioeventfd events could be dispatched after
mirror_run releases the dirty bitmap, but before mirror_exit actually
does the device switch, because the iothread will still be running, and
it will cause silent data loss.

Fix this by adding a bdrv_drained_begin/end pair around the window, so
that no new external request will be handled.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2015-12-02 10:44:06 -05:00
Peter Maydell
30a9fd5d13 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* exec.c use after free
* Xen 32-on-64 breakage
* missing EINTR
* naughty warning under qtest

# gpg: Signature made Wed 02 Dec 2015 12:13:55 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  translate-all: ensure host page mask is always extended with 1's
  main-loop: suppress warnings under qtest
  qemu-char: retry g_poll on EINTR
  exec: Stop using memory after free

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-02 15:41:38 +00:00
Kevin Wolf
ab7fe3a29a Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2015-12-02' into queue-block
One block patch for qemu 2.5-rc3.

# gpg: Signature made Wed Dec  2 16:29:17 2015 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2015-12-02:
  blkdebug: silence warning under qtest

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-12-02 16:38:03 +01:00
Michael S. Tsirkin
20873526a3 blkdebug: silence warning under qtest
make check always outputs warnings, this
is not nice.  Disable blkdebug warnings under qtest.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1448883874-17933-1-git-send-email-mst@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-12-02 16:28:10 +01:00
Victor Kaplansky
6f6f9512ea vhost-user: verify that number of queues is non-zero
Fix QEMU crash when -netdev type=vhost-user,queues=n is passed
with zero number of queues.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2015-12-02 16:42:27 +02:00
Marc-André Lureau
45ce512670 vhost-user-test: fix crash with glib < 2.36
The prepare callback needs to be implemented with glib < 2.36,
quoting glib documentation:
"Since 2.36 this may be NULL, in which case the effect is as if the
function always returns FALSE with a timeout of -1."

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 16:42:26 +02:00
Marc-André Lureau
a899b1ea2a vhost-user-test: use unix port for migration
TCP port 1234 may be used by another process concurrently. Instead use a
temporary unix socket.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 16:42:26 +02:00
Marc-André Lureau
9732baf678 vhost-user-test: fix chardriver race
vhost-user-tests uses a helper thread to dispatch the vhost-user servers
sources. However the CharDriverState is not thread-safe. Therefore, when
it's given to the thread, it shouldn't be manipulated concurrently.

We dispatch cleaning the server in an idle source. By the end of the
test, we ensure not to leave anything behind by joining the thread and
finishing the sources dispatch.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 16:42:26 +02:00
Kevin Wolf
c2551b47c9 qcow2: Fix potential qemu-img check crash on 32 bit hosts
This crash was caught with qemu-iotests test case 138.

Commit b6d36de already fixed a few 32 bit truncation bugs that could
cause qemu-img check to allocate too little memory and consequently
it would segfault. On 32 bit hosts, there is one more place that needs
to be fixed because size_t was involved in the calculation and is a
32 bit type there.

Cc: qemu-stable@nongnu.org
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-02 13:22:29 +01:00
Paolo Bonzini
0c2d70c448 translate-all: ensure host page mask is always extended with 1's
Anthony reported that >4GB guests on Xen with 32bit QEMU broke after
commit 4ed023c ("Round up RAMBlock sizes to host page sizes", 2015-11-05).

In that patch sizes are masked against qemu_host_page_size/mask which
are uintptr_t, and thus 32bit on a 32bit QEMU, even though the ram space
might be bigger than 4GB on Xen.

Since ram_addr_t is not available on user-mode emulation targets, ensure
that we get a sign extension when masking away the low bits of the address.
Remove the ~10 year old scary comment that the type of these variables
is probably wrong, with another equally scary comment.  The new comment
however does not have "???" in it, which is arguably an improvement.

For completeness use the alignment macros in linux-user and bsd-user
instead of manually doing an &.  linux-user and bsd-user are not affected
by the Xen issue, however.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Fixes: 4ed023ce2a
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-02 13:12:30 +01:00
Michael S. Tsirkin
21a24302e8 main-loop: suppress warnings under qtest
commit 01c22f2cdd ("main-loop: Suppress
"I/O thread spun" warnings for qtest") doesn't actually disable the
warning for everyone since some tests don't run under the qtest
accelerator.

Check qtest_driver instead.

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <1448882964-22433-1-git-send-email-mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-02 12:01:43 +01:00
Paolo Bonzini
c1f2448998 qemu-char: retry g_poll on EINTR
This is a case where pty_chr_update_read_handler_locked's lack
of error checking can produce incorrect values.  We are not using
SIGUSR1 anymore, so this is quite theoretical, but easy to fix.

Reported-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-02 12:01:43 +01:00
Don Slutz
55b4e80b04 exec: Stop using memory after free
memory_region_unref(mr) can free memory.

For example I got:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f43280d4700 (LWP 4462)]
0x00007f43323283c0 in phys_section_destroy (mr=0x7f43259468b0)
    at /home/don/xen/tools/qemu-xen-dir/exec.c:1023
1023        if (mr->subpage) {
(gdb) bt
    at /home/don/xen/tools/qemu-xen-dir/exec.c:1023
    at /home/don/xen/tools/qemu-xen-dir/exec.c:1034
    at /home/don/xen/tools/qemu-xen-dir/exec.c:2205
(gdb) p mr
$1 = (MemoryRegion *) 0x7f43259468b0

And this change prevents this.

Signed-off-by: Don Slutz <Don.Slutz@Gmail.com>
Message-Id: <1448921464-21845-1-git-send-email-Don.Slutz@Gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-02 12:01:43 +01:00
Peter Maydell
9d7b969ea6 Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20151201' into staging
Last minute fix

# gpg: Signature made Tue 01 Dec 2015 22:37:25 GMT using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20151201:
  tcg: Increase the highwater reservation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-02 10:16:53 +00:00
Richard Henderson
b17a6d3390 tcg: Increase the highwater reservation
If there are a lot of guest memory ops in the TB, the amount of
code generated by tcg_out_tb_finalize could be well more than 1k.
In the short term, increase the reservation larger than any TB
seen in practice.

Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-12-01 14:36:32 -08:00
Peter Maydell
8d3a5d9b0f ui/cocoa.m: Prevent activation clicks from going to guest
When QEMU is brought to the foreground, the click event that activates QEMU
should not go to the guest. Accidents happen when they do go to the guest
without giving the user a chance to handle them. In particular, if the
guest input device is not an absolute-position one then the location of
the guest cursor (and thus the click) will likely not be the location of
the host cursor when it is clicked, and could be completely obscured
below another window. Don't send mouse clicks to QEMU unless the
window either has focus or has grabbed mouse events.

Reported-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: John Arbuckle <programmingkidx@gmail.com>
Message-id: 1448551168-13196-1-git-send-email-peter.maydell@linaro.org
2015-12-01 21:22:41 +00:00
Peter Maydell
e3d58827fe Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20151201' into staging
Last round of s390x fixes for 2.5:
- The bios should be built for the first z machine, so that newer
  instructions don't creep in.
- Silence annoying message when running make check.
- Fix a problem with the pci iommu exposed by recent changes.

# gpg: Signature made Tue 01 Dec 2015 08:59:42 GMT using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20151201:
  s390x/pci: fix up IOMMU size
  s390x: no deprecation warning while testing
  pc-bios/s390-ccw: rebuild image
  pc-bios/s390-ccw: build for z900

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-12-01 16:30:27 +00:00
Yi Min Zhao
f0a399dbae s390x/pci: fix up IOMMU size
Present code uses @size==UINT64_MAX to initialize IOMMU. It infers that it
can map any 64-bit IOVA whatsoever. But in fact, the largest DMA range for
each PCI Device on s390x is from ZPCI_SDMA_ADDR to ZPCI_EDMA_ADDR. The largest
value is returned from hardware, which is to indicate the largest range
hardware can support. But the real IOMMU size for specific PCI Device is
obtained once qemu intercepts mpcifc instruction that guest is requesting a
DMA range for that PCI Device. Therefore, before intercepting mpcifc instruction,
qemu cannot be aware of the size of IOMMU region that guest will use.

Moreover, iommu replay during device initialization for the whole region in
4k steps takes a very long time.

In conclusion, this patch intializes IOMMU region for each PCI Device when
intercept mpcifc instruction which is to register DMA range for the PCI Device.
And then, destroy IOMMU region when guest wants to deregister IOAT.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-12-01 09:57:28 +01:00
Cornelia Huck
567c88c354 s390x: no deprecation warning while testing
'make check' tries to start all available machines; the deprecation
message for the s390-virtio machine is both useless and annoying
there. Silence it while testing.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-01 09:57:27 +01:00
Cornelia Huck
07af4c53a5 pc-bios/s390-ccw: rebuild image
Contains:
- pc-bios/s390-ccw: build for z900

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-12-01 09:57:27 +01:00
Christian Borntraeger
7619562a64 pc-bios/s390-ccw: build for z900
Newer distributions have an architecture level set to z9, z196
or similar - also as default option for the compiler.

We should build the bios for z900 to allow it to run with
all 64bit CPUs. This will become more important as soon as
QEMU/KVM does support CPU models.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-12-01 09:57:27 +01:00
Peter Maydell
d90eb45902 Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
Two fixes for virtfs/9p from Paolo.

# gpg: Signature made Mon 30 Nov 2015 14:10:47 GMT using DSA key ID 0101DBC2
# gpg: Good signature from "Greg Kurz <gkurz@fr.ibm.com>"
# gpg:                 aka "Greg Kurz <groug@free.fr>"
# gpg:                 aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg:                 aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg:                 aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg:                 aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* remotes/gkurz/tags/for-upstream:
  virtio-9p: use QEMU thread pool
  fsdev-proxy-helper: avoid TOC/TOU race

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-30 21:59:22 +00:00
Peter Maydell
a2485925f7 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.5-20151130' into staging
ppc patch queue for qemu-2.5 20151130

target-ppc and related bugfix patches for qemu-2.5

I don't have the facilities to test the Macintosh and BookE related
patches.  I've sanity checked them (inspection + make check), but I'm
otherwise relying on the submitters.

# gpg: Signature made Mon 30 Nov 2015 08:42:01 GMT using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.5-20151130:
  target-ppc/fpu_helper: fix FPSCR_FX bit shift operation
  target-ppc: Move the FPSCR bit update macros to cpu.h
  hw/ppc/ppc405_boards: Fix infinite recursion by converting taihu_cpld from old_mmio
  hw/ppc/spapr: Remove duplicated "pseries" alias
  mac_dbdma: always initialize channel field in DBDMA_channel

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-30 17:09:35 +00:00
Peter Maydell
680617ed43 Merge remote-tracking branch 'remotes/weil/tags/pull-wxx-20151130' into staging
wxx patch queue

# gpg: Signature made Mon 30 Nov 2015 05:48:33 GMT using RSA key ID 677450AD
# gpg: Good signature from "Stefan Weil <sw@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@bib.uni-mannheim.de>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 4923 6FEA 75C9 5D69 8EC2  B78A E08C 21D5 6774 50AD

* remotes/weil/tags/pull-wxx-20151130:
  w32: Use gcc option -mthreads
  oslib-win32: Change return type of function getpagesize
  trace/simple: Fix warning and wrong trace file name for MinGW

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-30 15:35:20 +00:00
Paolo Bonzini
ebac1202c9 virtio-9p: use QEMU thread pool
The QEMU thread pool already has a mechanism to invoke callbacks in the main
thread.  It does not need an EventNotifier and it is more efficient too.
Use it instead of GAsyncQueue + GThreadPool + glue.

As a side effect, it silences Coverity's complaint about an unchecked
return value for event_notifier_init.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
(removed no more needed #include <glib.h> from virtio-9p-coth.h)
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2015-11-30 12:36:12 +01:00
Paolo Bonzini
49f817caaf fsdev-proxy-helper: avoid TOC/TOU race
There is a minor time of check/time of use race between statfs and chroot.
It can be fixed easily by stat-ing the root after it has been changed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2015-11-30 12:31:53 +01:00
Madhavan Srinivasan
7624789234 target-ppc/fpu_helper: fix FPSCR_FX bit shift operation
Currently in TCG mode, updating floating exception
summary bit (FPSCR_FX) in fpscr also updates
the upper 32bits of fpscr with all 1s.
Modify the bit shift operation statement to use
1ULL instead.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-30 19:39:01 +11:00
Madhavan Srinivasan
dbdc13a1ac target-ppc: Move the FPSCR bit update macros to cpu.h
Move the FPSCR bit update macros defined in dfp_helper
to cpu.h. This way, fpu_helper functions can also use them

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-30 19:39:01 +11:00
Peter Maydell
e2a176dfda hw/ppc/ppc405_boards: Fix infinite recursion by converting taihu_cpld from old_mmio
The taihu_cpld_writel() function had an obvious typo that meant that
if it was ever called it would go into an infinite recursion. Newer
versions of clang will detect and warn about this:
  hw/ppc/ppc405_boards.c:481:1: warning: all paths through this function will call itself [-Winfinite-recursion]

Fix this by converting taihu_cpld from the legacy old_mmio accessors
to new-style ones, with an impl {} declaration to cause the core
memory code to do the splitting of 16 bit and 32 bit accesses into
multiple 8-bit accesses.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-30 19:39:00 +11:00
Thomas Huth
9b7a70e63e hw/ppc/spapr: Remove duplicated "pseries" alias
The "pseries" alias is currently set twice, one time for the
pseries-2.4 machine and one time for the "pseries-2.5" machine.
To avoid confusion with the alias, let's remove the one from
the older machine class. And while we're at it, also remove
the "is_default = 0" there since the is_default variable
should be set to zero by default already.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-30 19:39:00 +11:00
Hervé Poussineau
7f0d763ce6 mac_dbdma: always initialize channel field in DBDMA_channel
dbdma_from_ch() uses channel field to return the right DBDMA object.
Previous code was working if guest OS was only using registered DMA channels.
However, it lead to QEMU crashes if guest OS was using unregistered DMA channels.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-30 19:38:44 +11:00
Stefan Weil
78e9d4ad11 w32: Use gcc option -mthreads
QEMU uses threads / coroutines, therefore support for thread local storage
and thread safe libraries (-D_MT) must be enabled by using -mthreads.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-30 06:47:02 +01:00
Stefan Weil
a28c2f2df7 oslib-win32: Change return type of function getpagesize
getpagesize on Linux returns an int. Fix QEMU's implementation for
Windows to return an int (instead of size_t), too.

This fixes a compiler warning which was introduced recently
(commit 093e3c42).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-30 06:47:02 +01:00
Stefan Weil
857a0e387a trace/simple: Fix warning and wrong trace file name for MinGW
On Windows, getpid() always returns an int value, but pid_t (which is
expected by the format string) is either a 32 bit or a 64 bit value.

Without a type cast (or a modified format string), the compiler prints
a warning when building for 64 bit Windows and the resulting trace_file_name
will include a wrong pid:

trace/simple.c:332:9: warning:
 format ‘%lld’ expects argument of type ‘long long int’,
 but argument 2 has type ‘int’ [-Wformat=]

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2015-11-30 06:47:02 +01:00
Peter Maydell
714487515d Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Fri 27 Nov 2015 02:42:02 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  tap-win32: disable broken async write path
  tap-win32: skip unexpected nodes during registry enumeration
  eepro100: Prevent two endless loops

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-27 10:44:42 +00:00
Andrew Baumann
b73c184914 tap-win32: disable broken async write path
The code under the TUN_ASYNCHRONOUS_WRITES path makes two incorrect
assumptions about the behaviour of the WriteFile API for overlapped
file handles. First, WriteFile does not update the
lpNumberOfBytesWritten parameter when the write completes
asynchronously (the number of bytes written is known only when the
operation completes). Second, the buffer shouldn't be touched (or
freed) until the operation completes. This led to at least one bug
where tap_win32_write returned zero bytes written, which in turn
caused further writes ("receives") to be disabled for that device.

This change disables the asynchronous write path, while keeping most
of the code around in case someone sees value in resurrecting it. It
also adds some conditional debug output, similar to the read path.

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Acked-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-27 10:39:55 +08:00
Andrew Baumann
ee0428e3ac tap-win32: skip unexpected nodes during registry enumeration
In order to find a named tap device, get_device_guid() enumerates children of
HKLM\SYSTEM\CCS\Control\Network\{4D36E972-E325-11CE-BFC1-08002BE10318}
(aka NETWORK_CONNECTIONS_KEY). For each child, it then looks for a
"Connection" subkey, but if this key doesn't exist, it aborts the
entire search. This was observed to fail on at least one Windows 10
machine, where there is an additional child of NETWORK_CONNECTIONS_KEY
(named "Descriptions"). Since registry enumeration doesn't guarantee
any particular sort order, we should continue to search for matching
children rather than aborting the search.

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-27 10:39:55 +08:00
Stefan Weil
00837731d2 eepro100: Prevent two endless loops
http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg04592.html
shows an example how an endless loop in function action_command can
be achieved.

During my code review, I noticed a 2nd case which can result in an
endless loop.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-27 10:39:55 +08:00
Peter Maydell
b04fc42835 Update version for v2.5.0-rc2 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 17:50:12 +00:00
Peter Maydell
72f75c76d8 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
vhost, pc: fixes for 2.5

Minor vhost fixes.  HW version tweak for PC.
Documentation and test updates.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 26 Nov 2015 16:40:25 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  vhost-user-test: fix migration overlap test
  Fix memory leak on error
  Revert "vhost: send SET_VRING_ENABLE at start/stop"
  tests/vhost-user-bridge: read command line arguments
  tests/vhost-user-bridge: propose GUEST_ANNOUNCE feature
  vhost-user: clarify start and enable
  vhost-user: set link down when the char device is closed
  pc: Don't set hw_version on pc-*-2.5
  osdep: Change default value of qemu_hw_version() to "2.5+"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 16:50:59 +00:00
Michael S. Tsirkin
d08e42a112 vhost-user-test: fix migration overlap test
During migration, source does GET_BASE, destination does SET_BASE.
Use that as opposed to fds being configured to detect
vhost user running on both source and destination.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-26 18:39:34 +02:00
Peter Maydell
a5df35070a Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2015-11-26' into staging
QMP and QObject patches

# gpg: Signature made Thu 26 Nov 2015 09:07:18 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-monitor-2015-11-26:
  qjson: Limit number of tokens in addition to total size
  qjson: surprise, allocating 6 QObjects per token is expensive
  qjson: store tokens in a GQueue
  qjson: Convert to parser to recursive descent
  qjson: replace QString in JSONLexer with GString
  qjson: Inline token_is_escape() and simplify
  qjson: Inline token_is_keyword() and simplify
  qjson: Give each of the six structural chars its own token type
  qjson: Spell out some silent assumptions
  check-qjson: Add test for JSON nesting depth limit
  qjson: Don't crash when input exceeds nesting limit
  qjson: Apply nesting limit more sanely
  monitor: Plug memory leak on QMP error

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 16:27:26 +00:00
Peter Maydell
317e4db6e9 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Small patches, without the one that introduces -fwrapv.

# gpg: Signature made Thu 26 Nov 2015 15:48:53 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  target-i386: kvm: Print warning when clearing mcg_cap bits
  target-i386: kvm: Use env->mcg_cap when setting up MCE
  target-i386: kvm: Abort if MCE bank count is not supported by host
  virtio-scsi: don't crash without a valid device
  target-sparc: fix 32-bit truncation in fpackfix
  exec: remove warning about mempath and hugetlbfs
  Revert "exec: silence hugetlbfs warning under qtest"
  call bdrv_drain_all() even if the vm is stopped
  MAINTAINERS: Update TCG CPU cores section

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 15:56:53 +00:00
Eduardo Habkost
5120901a37 target-i386: kvm: Print warning when clearing mcg_cap bits
Instead of silently clearing mcg_cap bits when the host doesn't
support them, print a warning when doing that.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[Avoid \n at end of error_report. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448471956-66873-10-git-send-email-pbonzini@redhat.com>
2015-11-26 16:48:16 +01:00
Eduardo Habkost
2590f15b13 target-i386: kvm: Use env->mcg_cap when setting up MCE
When setting up MCE, instead of using the MCE_*_DEF macros
directly, just filter the existing env->mcg_cap value.

As env->mcg_cap is already initialized as
MCE_CAP_DEF|MCE_BANKS_DEF at target-i386/cpu.c:mce_init(), this
doesn't change any behavior. But it will allow us to change
mce_init() in the future, to implement different defaults
depending on CPU model, machine-type or command-line parameters.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448471956-66873-9-git-send-email-pbonzini@redhat.com>
2015-11-26 16:48:11 +01:00
Eduardo Habkost
49b69cbfcd target-i386: kvm: Abort if MCE bank count is not supported by host
Instead of silently changing the number of banks in mcg_cap based
on kvm_get_mce_cap_supported(), abort initialization if the host
doesn't support MCE_BANKS_DEF banks.

Note that MCE_BANKS_DEF was always 10 since it was introduced in
QEMU, and Linux always returned 32 at KVM_CAP_MCE since
KVM_CAP_MCE was introduced, so no behavior is being changed and
the error can't be triggered by any Linux version. The point of
the new check is to ensure we won't silently change the bank
count if we change MCE_BANKS_DEF or make the bank count
configurable in the future.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
[Avoid Yoda condition and \n at end of error_report. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448471956-66873-8-git-send-email-pbonzini@redhat.com>
2015-11-26 16:48:07 +01:00
Eugene (jno) Dvurechenski
3e32e8a96e virtio-scsi: don't crash without a valid device
Make sure that we actually have a device when checking the aio
context. Otherwise guests could trigger QEMU crashes.

Signed-off-by: "Eugene (jno) Dvurechenski" <jno@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1448549135-6582-2-git-send-email-jno@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Paolo Bonzini
12a3567c40 target-sparc: fix 32-bit truncation in fpackfix
This is reported by Coverity.  The algorithm description at
ftp://ftp.icm.edu.pl/packages/ggi/doc/hw/sparc/Sparc.pdf suggests
that the 32-bit parts of rs2, after the left shift, is treated
as a 64-bit integer.  Bits 32 and above are used to do the
saturating truncation.

Message-Id: <1446473134-4330-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Daniel P. Berrange
bfc2a1a1f4 exec: remove warning about mempath and hugetlbfs
The gethugepagesize() method in exec.c printed a warning if
the file path for "-mem-path" or "-object memory-backend-file"
was not on a hugetlbfs filesystem. This warning is bogus, because
QEMU functions perfectly well with the path on a regular tmpfs
filesystem. Use of hugetlbfs vs tmpfs is a choice for the management
application or end user to make as best fits their needs. As such it
is inappropriate for QEMU to have an opinion on whether the user's
choice is right or wrong in this case.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1448448749-1332-3-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Daniel P. Berrange
2c189a4e12 Revert "exec: silence hugetlbfs warning under qtest"
This reverts commit 1c7ba94a18.

That commit changed QEMU initialization order from

 - object-initial, chardev, qtest, object-late

to

 - chardev, qtest, object-initial, object-late

This breaks chardev setups which need to rely on objects
having been created. For example, when chardevs use TLS
encryption in the future, they need to have tls credential
objects created first.

This revert, restores the ordering introduced in

  commit f08f9271bf
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Wed May 13 17:14:04 2015 +0100

    vl: Create (most) objects before creating chardev backends

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1448448749-1332-2-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Wen Congyang
b2780d3253 call bdrv_drain_all() even if the vm is stopped
There are still I/O operations when the vm is stopped. For example,
stop the vm, and do block migration. In this case, we don't drain all
I/O operation, and may meet the following problem:

qemu-system-x86_64: migration/block.c:731: block_save_complete: Assertion `block_mig_state.submitted == 0' failed.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Message-Id: <564EE92E.4070701@cn.fujitsu.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-26 16:47:44 +01:00
Stefano Dong (董兴水)
903a41d341 Fix memory leak on error
hw/ppc/spapr.c: Fix memory leak on error, it was introduced in bc09e0611
hw/acpi/memory_hotplug.c: Fix memory leak on error, it was introduced in 34f2af3d

Signed-off-by: Stefano Dong (董兴水) <opensource.dxs@aliyun.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-26 14:27:52 +02:00
Peter Maydell
fe4cf57da7 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vnc-20151126-1' into staging
vnc: fix segfault

# gpg: Signature made Thu 26 Nov 2015 07:37:43 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vnc-20151126-1:
  vnc: fix segfault

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 10:58:10 +00:00
Peter Maydell
b8b0ee0ea3 Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-11-25-v2-tag' into staging
qemu-ga patch queue for 2.5

* include additional w32 MSI install components needed for
  guest-exec
* fix 'make install' when compiling with --disable-tools
* fix potential data corruption/loss when accessing files
  bi-directionally via guest-file-{read,write}
* explicitly document how integer args for guest-file-seek map to
  SEEK_SET/SEEK_CUR/etc to avoid platform-specific differences

v2:
* fixed missing SoB

# gpg: Signature made Wed 25 Nov 2015 23:58:45 GMT using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2015-11-25-v2-tag:
  qga: added another non-interactive gspawn() helper file.
  qga: Better mapping of SEEK_* in guest-file-seek
  tests: add file-write-read test
  qga: flush explicitly when needed
  qga: gspawn() console helper to Windows guest agent msi build
  makefile: fix qemu-ga make install for --disable-tools

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 10:24:18 +00:00
Michael S. Tsirkin
449e357810 Revert "vhost: send SET_VRING_ENABLE at start/stop"
This reverts commit 3a12f32229.

In case of live migration several queues can be enabled and not only the
first one. So informing backend that only the first queue is enabled is
wrong.

Reported-by: Thibaut Collet <thibaut.collet@6wind.com>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-26 12:02:11 +02:00
Peter Maydell
7ef7ddf376 Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
# gpg: Signature made Wed 25 Nov 2015 20:25:21 GMT using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"

* remotes/jnsnow/tags/ide-pull-request:
  ide-test: fix timeouts
  atapi: Fix code indentation
  atapi: Account for failed and invalid operations in cd_read_sector()
  ide-test: cdrom_pio_impl fixup

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-26 09:44:25 +00:00
Markus Armbruster
df649835fe qjson: Limit number of tokens in addition to total size
Commit 29c75dd "json-streamer: limit the maximum recursion depth and
maximum token count" attempts to guard against excessive heap usage by
limiting total token size (it says "token count", but that's a lie).

Total token size is a rather imprecise predictor of heap usage: many
small tokens use more space than few large tokens with the same input
size, because there's a constant per-token overhead: 37 bytes on my
system.

Tighten this up: limit the token count to 2Mi.  Chosen to roughly
match the 64MiB total token size limit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-13-git-send-email-armbru@redhat.com>
2015-11-26 10:07:07 +01:00
Paolo Bonzini
9bada89711 qjson: surprise, allocating 6 QObjects per token is expensive
Replace the contents of the tokens GQueue with a simple struct.  This cuts
the amount of memory allocated by tests/check-qjson from ~500MB to ~20MB,
and the execution time from 600ms to 80ms on my laptop.  Still a lot (some
could be saved by using an intrusive list, such as QSIMPLEQ, instead of
the GQueue), but the savings are already massive and the right thing to
do would probably be to get rid of json-streamer completely.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448300659-23559-5-git-send-email-pbonzini@redhat.com>
[Straightforwardly rebased on my patches]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 10:07:07 +01:00
Paolo Bonzini
95385fe9ac qjson: store tokens in a GQueue
Even though we still have the "streamer" concept, the tokens can now
be deleted as they are read.  While doing so convert from QList to
GQueue, since the next step will make tokens not a QObject and we
will have to do the conversion anyway.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448300659-23559-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 10:07:07 +01:00
Markus Armbruster
d538b25543 qjson: Convert to parser to recursive descent
We backtrack in parse_value(), even though JSON is LL(1) and thus can
be parsed by straightforward recursive descent.  Do exactly that.

Based on an almost-correct patch from Paolo Bonzini.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448486613-17634-10-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 10:06:57 +01:00
Paolo Bonzini
d2ca7c0b0d qjson: replace QString in JSONLexer with GString
JSONLexer only needs a simple resizable buffer.  json-streamer.c
can allocate memory for each token instead of relying on reference
counting of QStrings.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1448300659-23559-2-git-send-email-pbonzini@redhat.com>
[Straightforwardly rebased on my patches, checkpatch made happy]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 09:31:22 +01:00
Markus Armbruster
6b9606f68e qjson: Inline token_is_escape() and simplify
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448486613-17634-8-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 09:27:23 +01:00
Markus Armbruster
50e2a467f5 qjson: Inline token_is_keyword() and simplify
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448486613-17634-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 09:22:57 +01:00
Markus Armbruster
c54616608a qjson: Give each of the six structural chars its own token type
Simplifies things, because we always check for a specific one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448486613-17634-6-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 09:22:54 +01:00
Markus Armbruster
b8d3b1da3c qjson: Spell out some silent assumptions
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448486613-17634-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-11-26 09:18:45 +01:00
Markus Armbruster
f0ae0304c7 check-qjson: Add test for JSON nesting depth limit
This would have prevented the regression mentioned in the previous
commit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-4-git-send-email-armbru@redhat.com>
2015-11-26 09:18:38 +01:00
Markus Armbruster
0753113a26 qjson: Don't crash when input exceeds nesting limit
We limit nesting depth and input size to defend against input
triggering excessive heap or stack memory use (commit 29c75dd
json-streamer: limit the maximum recursion depth and maximum token
count).  However, when the nesting limit is exceeded,
parser_context_peek_token()'s assertion fails.

Broken in commit 65c0f1e "json-parser: don't replicate tokens at each
level of recursion".

To reproduce stuff 1025 open braces or brackets into QMP.

Fix by taking the error exit instead of the normal one.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-3-git-send-email-armbru@redhat.com>
2015-11-26 09:18:04 +01:00
Markus Armbruster
4f2d31fbc0 qjson: Apply nesting limit more sanely
The nesting limit from commit 29c75dd "json-streamer: limit the
maximum recursion depth and maximum token count" applies separately to
braces and brackets.  This makes no sense.  Apply it to their sum,
because that's actually a measure of recursion depth.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1448486613-17634-2-git-send-email-armbru@redhat.com>
2015-11-26 09:17:57 +01:00
Markus Armbruster
3a81a10179 monitor: Plug memory leak on QMP error
Leak introduced in commit 8a4f501..710aec9, v2.4.0.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1446117309-15322-1-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-11-26 09:15:37 +01:00
Gerd Hoffmann
7fe4a41c26 vnc: fix segfault
Commit "c7628bf vnc: only alloc server surface with clients connected"
missed one rarely used codepath (cirrus with guest drivers using 2d
accel) where we have to check for the server surface being present,
to avoid qemu crashing with a NULL pointer dereference.  Add the check.

Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Tested-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-26 08:32:11 +01:00
Yuri Pudgorodskiy
44c6e00c3f qga: added another non-interactive gspawn() helper file.
With previous commit we added gspawn-win64-helper-console.exe,
required for gspawn() mingw implementation.
Unfortunatly when running as a service without interactive
desktop, gspawn() also requires another helper app.

Added gspawn-win64-helper.exe and gspawn-win32-helper.exe
for corresponding architectures.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
* remove trailing whitespace
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-25 17:56:45 -06:00
Eric Blake
0a982b1bf3 qga: Better mapping of SEEK_* in guest-file-seek
Exposing OS-specific SEEK_ constants in our qapi was a mistake
(if the host has SEEK_CUR as 1, but the guest has it as 2, then
the semantics are unclear what should happen); if we had a time
machine, we would instead expose only a symbolic enum.  It's too
late to change the fact that we have an integer in qapi, but we
can at least document what mapping we want to enforce for all
qga clients (and luckily, it happens to be the mapping that both
Linux and Windows use); then fix the code to match that mapping.
It also helps us filter out unsupported SEEK_DATA and SEEK_HOLE.

In the future, we may wish to move our QGA_SEEK_* constants into
qga/qapi-schema.json, along with updating the schema to take an
alternate type (either the integer, or the string value of the
enum name) - but that's too much risk during hard freeze.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-25 17:56:45 -06:00
Marc-André Lureau
4eaab85cb1 tests: add file-write-read test
This test exhibits a POSIX behaviour regarding switching between write
and read. It's undefined result if the application doesn't ensure a
flush between the two operations (with glibc, the flush can be implicit
when the buffer size is relatively small). The previous commit fixes
this test.

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1210246

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-25 17:56:45 -06:00
Marc-André Lureau
895b00f62a qga: flush explicitly when needed
According to the specification:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fopen.html

"the application shall ensure that output is not directly followed by
input without an intervening call to fflush() or to a file positioning
function (fseek(), fsetpos(), or rewind()), and input is not directly
followed by output without an intervening call to a file positioning
function, unless the input operation encounters end-of-file."

Without this change, an fwrite() followed by an fread() may lose the
previously written content, as shown in the following test.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1210246

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
* don't confuse {write,read}() with f{write,read}() in
  commit msg (Laszlo)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-25 17:56:31 -06:00
John Snow
9c73517ca5 ide-test: fix timeouts
Use explicit timeouts instead of trying to approximate it by counting
the cumulative duration of nsleep calls.

In practice, the timeout if inb() dwarfed the nsleep delays, and as a
result the real timeout value became a lot larger than 5 seconds.

So: change the semantics from "Not sooner than 5 seconds" to "no more
than 5 seconds" to ensure we don't hang the tester for very long.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1448393771-15483-2-git-send-email-jsnow@redhat.com
2015-11-25 11:37:34 -05:00
Yuri Pudgorodskiy
f2b608ab80 qga: gspawn() console helper to Windows guest agent msi build
This helper, gspawn-win64-helper-console.exe for 64-bit and
gspawn-win32-helper-console.exe for 32-bit environment,
is needed for gspawn() mingw implementation, used by guest-exec command.

Without these files guest-exec command on Windows will not
work with "file not found" diagnostic message.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-25 10:21:55 -06:00
Michael Roth
68aa262ad0 makefile: fix qemu-ga make install for --disable-tools
ab59e3e introduced a fix for `make install` on w32 that involved
filtering out qemu-ga from $TOOLS install recipe so that we could
append $(EXESUF) to it before attempting to install the binary
via install-prog function.

install-prog takes a list of binaries to install to a particular
directory. If the list is empty it breaks. We guard against this
by ensuring $TOOLS is not empty prior to calling.

However, ab59e3e introduces extra filtering after this check which
can still result on us attempting to call install-prog with an
empty list of binaries. In particular, this occurs if we
build with the --disable-tools configure option, which results
in qemu-ga being the only member of $TOOLS.

Fix this by doing a simple s/qemu-ga/qemu-ga$(EXESUF)/ pass through
$TOOLS instead of filtering out qemu-ga to handle it seperately.

Reported-by: Steve Ellcey <sellcey@imgtec.com>
Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-25 10:21:54 -06:00
Peter Maydell
c7933a80bc Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151125' into staging
migration/next for 20151125

# gpg: Signature made Wed 25 Nov 2015 14:28:47 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151125:
  block-migration: limit the memory usage
  Assume madvise for (no)hugepage works

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-25 16:20:58 +00:00
Peter Maydell
1a4dab849d Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Wed 25 Nov 2015 13:33:14 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  qemu-iotests: Add -nographic when starting QEMU in 119 and 120
  block/qapi: Plug memory leak on query-block error path
  raw-posix.c: Make GetBSDPath() handle caching options
  nand: fix flash erase when oob is in memory
  test-aio: Fix event notifier cleanup
  tests/Makefile: Add more dependencies for test-timed-average

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-25 14:47:06 +00:00
Wen Congyang
f77dcdbc76 block-migration: limit the memory usage
If we set migration speed in a very large value, block-migration will try to read
all data to the memory. Because
    (block_mig_state.submitted + block_mig_state.read_done) * BLOCK_SIZE
will be overflow, and it will be always less than rate limit.

There is no need to read too many data into memory when the rate limit is very large.
So limit the memory usage can fix the overflow problem.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-25 15:27:28 +01:00
Dr. David Alan Gilbert
1d7414396f Assume madvise for (no)hugepage works
madvise() returns EINVAL in the case of many failures, but also
returns it in cases where the host kernel doesn't have THP enabled.
Postcopy only really cares that THP is off before it detects faults,
and turns it back on afterwards; so we're going to have
to assume that if the madvise fails then the host just doesn't do
THP and we can carry on with the postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-25 15:27:28 +01:00
Kevin Wolf
8c34d891b1 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2015-11-25' into queue-block
One block patch for qemu 2.5-rc2.

# gpg: Signature made Wed Nov 25 14:30:45 2015 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2015-11-25:
  qemu-iotests: Add -nographic when starting QEMU in 119 and 120

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-25 14:33:01 +01:00
Fam Zheng
4d7f853ff0 qemu-iotests: Add -nographic when starting QEMU in 119 and 120
Otherwise, a window flashes on my desktop (built with SDL). Add this as
other cases have it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448245930-15031-1-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-25 14:29:39 +01:00
Markus Armbruster
903c341d57 block/qapi: Plug memory leak on query-block error path
Spotted by Coverity.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-25 14:27:43 +01:00
Programmingkid
98caa5bc00 raw-posix.c: Make GetBSDPath() handle caching options
Add support for caching options that can be specified from the command
line.

The CD-ROM raw char device bypasses the host page cache and therefore
has alignment requirements.  Alignment probing is necessary so only use
the raw char device if BDRV_O_NOCACHE is set.

This patch fixes -cdrom /dev/cdrom on Mac OS X hosts, where bdrv_read()
used to fail due to misaligned requests during image format probing.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-25 14:27:43 +01:00
Ricard Wanderlof
8e37ca6d0b nand: fix flash erase when oob is in memory
For the "main area on file, oob in memory" case, fix the shifts so that
we erase the correct number of pages.

Signed-off-by: Ricard Wanderlöf <ricardw@axis.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-25 14:27:43 +01:00
Kevin Wolf
7595ed7439 test-aio: Fix event notifier cleanup
One test case closed an event notifier (event_notifier_cleanup())
without first disabling it (set_event_notifier(..., NULL)). This
resulted in a leftover handle 0 that was added to each subsequent
WaitForMultipleObjects() call, causing the function to fail (invalid
handle).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-25 14:27:43 +01:00
Kevin Wolf
5e41fbffa1 tests/Makefile: Add more dependencies for test-timed-average
'make check' failed to compile the test case for mingw because of
undefined references. Pull in a few more dependencies so that it builds.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-25 14:27:43 +01:00
Peter Maydell
e85dda8070 Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20151125' into staging
Xen 2015/11/25

# gpg: Signature made Wed 25 Nov 2015 11:19:26 GMT using RSA key ID 70E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"

* remotes/sstabellini/tags/xen-20151125:
  xen_disk: Remove ioreq.postsync
  xen: fix usage of xc_domain_create in domain builder

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-25 12:09:34 +00:00
Paolo Bonzini
2b1641d0a2 MAINTAINERS: Update TCG CPU cores section
These are the people that I think have been touching it lately
or reviewing patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 12:50:20 +01:00
Victor Kaplansky
7cf32491ea tests/vhost-user-bridge: read command line arguments
Now some vhost-user-bridge parameters can be passed from the
command line:

Usage: prog [-u ud_socket_path] [-l lhost:lport] [-r rhost:rport]
        -u path to unix doman socket. default: /tmp/vubr.sock
        -l local host and port. default: 127.0.0.1:4444
        -r remote host and port. default: 127.0.0.1:5555

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-25 13:42:38 +02:00
Victor Kaplansky
85ea9da5b8 tests/vhost-user-bridge: propose GUEST_ANNOUNCE feature
The backend has to know whether VIRTIO_NET_F_GUEST_ANNOUNCE was
negotiated, so, as a hack we propose the feature by
vhost-user-bridge during the feature negotiation.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-25 13:42:38 +02:00
Michael S. Tsirkin
c61f09ed85 vhost-user: clarify start and enable
It seems that we currently have some duplication between
started and enabled states.

The actual reason is that enable is not documented correctly:
what it does is connecting ring to the backend.

This is important for MQ, because a Linux guest expects TX
packets to be completed even if it disables some queues
temporarily.

Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-25 13:42:38 +02:00
Wen Congyang
d39c87d707 vhost-user: set link down when the char device is closed
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2015-11-25 13:42:38 +02:00
Eduardo Habkost
463b52f285 pc: Don't set hw_version on pc-*-2.5
Now that qemu_hw_version() returns a fixed "2.5+" string instead
of QEMU_VERSION, we don't need to set hw_version on pc-*-2.5
explicitly.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-25 13:42:37 +02:00
Eduardo Habkost
fac862ffa6 osdep: Change default value of qemu_hw_version() to "2.5+"
There are two issues with qemu_hw_version() today:

1) If a machine has hw_version set, the value returned by it is
   not very useful, because it is not the actual QEMU version.
2) If a machine does't set hw_version, the return value of
   qemu_hw_version() is broken, because it will change when
   upgrading QEMU.

For those reasons, using qemu_hw_version() is strongly
discouraged, and should be used only in code that used
QEMU_VERSION in the past and needs to keep compatibility.

To fix (2), instead of making every machine broken by default
unless they set hw_version, make qemu_hw_version() simply return
"2.5+" if qemu_set_hw_version() is not called.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-25 13:42:37 +02:00
Peter Maydell
1aae36df4b Merge remote-tracking branch 'remotes/armbru/tags/pull-ivshmem-2015-11-25' into staging
ivshmem patches for 2.5

# gpg: Signature made Wed 25 Nov 2015 09:25:38 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-ivshmem-2015-11-25:
  ivshmem: Rename property memdev to x-memdev for 2.5
  ivshmem: Mark questionable socket type test FIXME
  tests/ivshmem-test: Supply missing initializer in get_device()
  qemu-doc: Fix ivshmem usage example with shm=...
  qemu-doc: Fix ivshmem example markup

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-25 11:38:03 +00:00
Alberto Garcia
22037db38c xen_disk: Remove ioreq.postsync
This code has been dead for three years (since commit 7e7b7cba1).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-11-25 11:04:55 +00:00
Markus Armbruster
1d649244b3 ivshmem: Rename property memdev to x-memdev for 2.5
The device's guest interface and its QEMU user interface are
flawed^Whotly debated.  We'll resolve that in the next development
cycle, probably by deprecating the device in favour of a cleaned up,
but not quite compatible revision.

To avoid adding more baggage to the soon-to-be-deprecated interface,
mark property "memdev" as experimental, by renaming it to "x-memdev".
It's the only recent user interface change.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448384789-14830-6-git-send-email-armbru@redhat.com>
[Update of qemu-doc.texi squashed in]
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-11-25 10:24:27 +01:00
Markus Armbruster
2825717c02 ivshmem: Mark questionable socket type test FIXME
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-11-25 10:24:15 +01:00
Markus Armbruster
1613094766 tests/ivshmem-test: Supply missing initializer in get_device()
If the device isn't found, the assertion uses dev without
initialization.  Fix that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448384789-14830-4-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-11-25 10:24:04 +01:00
Markus Armbruster
a9282c25a5 qemu-doc: Fix ivshmem usage example with shm=...
The example suggests you can omit "shm".  This isn't true; you must
specify exactly one of "shm", "chardev", "memdev".  Fix it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448384789-14830-3-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-11-25 10:23:52 +01:00
Markus Armbruster
50d34c4e35 qemu-doc: Fix ivshmem example markup
Use @var{foo} like we do everywhere else, not <foo>.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1448384789-14830-2-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-11-25 10:23:33 +01:00
Alberto Garcia
73a27d9ac3 atapi: Fix code indentation
This was accidentally changed by commit 5f81724d

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 93fb43522e3b8dddb6c709d568919347d9a5ba3f.1448367341.git.berto@igalia.com
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-24 14:56:49 -05:00
Alberto Garcia
36be0929f5 atapi: Account for failed and invalid operations in cd_read_sector()
Commit 5f81724d made PIO read requests async but didn't add the
relevant block_acct_failed() and block_acct_invalid() calls.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 9b87e09d61019c128139b6c999ed0c07f0674170.1448367341.git.berto@igalia.com
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-24 14:56:48 -05:00
John Snow
a421f3c385 ide-test: cdrom_pio_impl fixup
Final tidying: move the interrupt wait into the loop,
document that the status read clears the IRQ, and move
the final interrupt check outside of the loop.

This should be functionally equivalent to how it works
currently, but a little less ambiguous and slightly more
explicit about the state transitions.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1448060035-31973-3-git-send-email-jsnow@redhat.com
2015-11-24 14:51:43 -05:00
Peter Maydell
4b6eda626f Merge remote-tracking branch 'remotes/lalrae/tags/mips-20151124' into staging
MIPS patches 2015-11-24

Changes:
* Fixes for enabling/disabling 64-bit addressing

# gpg: Signature made Tue 24 Nov 2015 14:54:35 GMT using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20151124:
  target-mips: flush QEMU TLB when disabling 64-bit addressing
  target-mips: Fix exceptions while UX=0

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-24 17:05:06 +00:00
Peter Maydell
d9636b6c2b Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20151124' into staging
target-arm queue:
 * fix minimum RAM check warning on xlnx-ep108
 * remove unused define from aarch64-linux-user.mak config
 * don't mask out bits [47:40] in ARMv8 LPAE descriptors
 * correct unallocated instruction checks for ldst_excl

# gpg: Signature made Tue 24 Nov 2015 14:17:10 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20151124:
  target-arm/translate-a64.c: Correct unallocated checks for ldst_excl
  target-arm: Don't mask out bits [47:40] in LPAE descriptors for v8
  default-configs/aarch64-linux-user.mak: Remove unused define
  xlnx-ep108: Fix minimum RAM check

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-24 14:22:38 +00:00
Peter Maydell
e14f0eb12f target-arm/translate-a64.c: Correct unallocated checks for ldst_excl
The checks for the unallocated encodings in the ldst_excl group
(exclusives and load-acquire/store-release) were not correct. This
error meant that in turn we ended up with code attempting to handle
the non-existent case of "non-exclusive load-acquire/store-release
pair". Delete that broken and now unreachable code.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Sergey Fedorov <serge.fdrv@gmail.com>
2015-11-24 14:12:15 +00:00
Peter Maydell
6109769a8b target-arm: Don't mask out bits [47:40] in LPAE descriptors for v8
In an LPAE format descriptor in ARMv8 the address field extends
up to bit 47, not just bit 39. Correct the masking so we don't
give incorrect results if the output address size is greater
than 40 bits, as it can be for AArch64.

(Note that we don't yet support the new-in-v8 Address Size fault which
should be generated if any translation table entry or TTBR contains
an address with non-zero bits above the most significant bit of the
maximum output address size.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1448029971-9875-1-git-send-email-peter.maydell@linaro.org
2015-11-24 14:12:15 +00:00
Peter Maydell
f72c0a79f7 default-configs/aarch64-linux-user.mak: Remove unused define
The uses of the CONFIG_GDBSTUB_XML define were removed in commit
b77abd95a9, but the define in aarch64-linux-user.mak somehow
escaped the cull (the patchset probably crossed in the mail with
the patches adding aarch64 support). Remove the stray define.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 1447690178-4560-1-git-send-email-peter.maydell@linaro.org
2015-11-24 14:12:15 +00:00
Alistair Francis
5b4a047fbe xlnx-ep108: Fix minimum RAM check
The minimum RAM check logic for the Xiilnx EP108 was off by one,
which caused a false positive. Correct the logic to only print
warnings when the RAM is below 0x8000000.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: fba8112ca7b01efd72553332b8045ecf107b7662.1448021100.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-24 14:12:15 +00:00
Leon Alrae
f93c3a8d0c target-mips: flush QEMU TLB when disabling 64-bit addressing
CP0.Status.KX/SX/UX bits are responsible for enabling access to 64-bit
Kernel/Supervisor/User Segments. If bit is cleared an access to
corresponding segment should generate Address Error Exception.

However, the guest may still be able to access some pages belonging to
the disabled 64-bit segment because we forget to flush QEMU TLB.

This patch fixes it.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-11-24 11:01:03 +00:00
James Hogan
7871abb94c target-mips: Fix exceptions while UX=0
Commit 01f7288579 ("target-mips: Status.UX/SX/KX enable 32-bit address
wrapping") added a new hflag MIPS_HFLAG_AWRAP, which indicates that
64-bit addressing is disallowed in the current mode, so hflag users
don't need to worry about the complexities of working that out, for
example checking both MIPS_HFLAG_KSU and MIPS_HFLAG_UX.

However when exceptions are taken outside of exception level,
mips_cpu_do_interrupt() manipulates the env->hflags directly rather than
using compute_hflags() to update them, and this code wasn't updated
accordingly. As a result, when UX is cleared, MIPS_HFLAG_AWRAP is set,
but it doesn't get cleared on entry back into kernel mode due to an
exception. Kernel mode then cannot access the 64-bit segments resulting
in a nested exception loop. The same applies to errors and debug
exceptions.

Fix by updating mips_cpu_do_interrupt() to clear the MIPS_HFLAG_WRAP
flag when necessary, according to compute_hflags().

Fixes: 01f7288579 ("target-mips: Status.UX/SX/KX enable 32-bit...")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-11-24 11:01:03 +00:00
Peter Maydell
229c0372cf Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Tue 24 Nov 2015 08:04:07 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  virtio-blk: Move resetting of req->mr_next to virtio_blk_handle_rw_error
  parallels: dirty BAT properly for continuous allocations

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-24 10:27:19 +00:00
Fam Zheng
466138dc68 virtio-blk: Move resetting of req->mr_next to virtio_blk_handle_rw_error
"werror=report" would free the req in virtio_blk_handle_rw_error, we
mustn't write to it in that case.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448239280-15025-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-24 09:27:49 +08:00
Vladimir Sementsov-Ogievskiy
c9f6856ded parallels: dirty BAT properly for continuous allocations
This patch marks part of the BAT dirty properly. There is a possibility that
multy-block allocation could have one block allocated on one BAT page and
next block on the next page. The code without the patch could not save
updated position to the file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1447779778-26062-1-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-24 09:25:36 +08:00
Peter Maydell
5522a841ca Merge remote-tracking branch 'remotes/ehabkost/tags/numa-pull-request' into staging
NUMA fix for -rc2

# gpg: Signature made Mon 23 Nov 2015 12:45:34 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/numa-pull-request:
  hostmem: Ignore ENOSYS while setting MPOL_DEFAULT

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-23 16:07:49 +00:00
Peter Maydell
68c61282fe Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20151123' into staging
Last minute fix.

# gpg: Signature made Mon 23 Nov 2015 12:17:26 GMT using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20151123:
  tcg: Fix highwater check

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-23 13:54:41 +00:00
Pavel Fedin
a3567ba1e6 hostmem: Ignore ENOSYS while setting MPOL_DEFAULT
Currently hostmem backend fails if CONFIG_NUMA is enabled in QEMU
(the default) but NUMA is not supported by the kernel. This makes
it impossible to use ivshmem in such configurations.

This patch fixes the problem by ignoring ENOSYS error if policy is set to
MPOL_DEFAULT. This way the code behaves in the same way as if CONFIG_NUMA
was not defined. qemu will still fail if the user specifies some other
policy, so that the user knows it.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-23 10:43:38 -02:00
John Clarke
644da9b39e tcg: Fix highwater check
A simple typo in the variable to use when comparing vs the highwater mark.
Reports are that qemu can in fact segfault occasionally due to this mistake.

Signed-off-by: John Clarke <johnc@kirriwa.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-11-23 13:16:05 +01:00
Peter Maydell
541abd10a0 Update version for v2.5.0-rc1 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-20 17:43:46 +00:00
Peter Lieven
f348daf3d5 tests: fix cdrom_pio_impl in ide-test
The check for the cleared BSY flag has to be performed
before each data transfer and not just before the
first one.

Commit 5f81724d revealed this glitch as the BSY flag
was not set in ATAPI PIO transfers before.

While at it fix the descriptions and add a comment before
the nested for loop that transfers the data.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1448029742-19771-1-git-send-email-pl@kamp.de
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-20 17:37:06 +00:00
Peter Maydell
28c3e6ee72 Merge remote-tracking branch 'remotes/afaerber/tags/qom-devices-for-peter' into staging
QOM infrastructure fixes and device conversions

* Fix for properties on objects > 4 GiB
* Performance improvements for QOM property handling
* Assertion cleanups
* MAINTAINERS additions

# gpg: Signature made Thu 19 Nov 2015 14:32:16 GMT using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/qom-devices-for-peter:
  MAINTAINERS: Add check-qom-{interface,proplist} to QOM
  qom: Clean up assertions to display values on failure
  qom: Replace object property list with GHashTable
  qom: Add a test case for complex property finalization
  net: Convert net filter code to use object property iterators
  ppc: Convert spapr code to use object property iterators
  vl: Convert machine help code to use object property iterators
  qmp: Convert QMP code to use object property iterators
  qom: Introduce ObjectPropertyIterator struct for iteration
  qdev: Change Property::offset field to ptrdiff_t type

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 17:54:46 +00:00
Peter Maydell
348c32709f Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
vhost, pc: fixes for 2.5

Fixes all over the place.

This also re-enables a test we disabled in 2.5 cycle
now that there's a way not to get a warning from it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 19 Nov 2015 13:27:43 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  exec: silence hugetlbfs warning under qtest
  tests: re-enable vhost-user-test
  acpi: fix buffer overrun on migration
  vhost-user: fix log size
  vhost-user: ignore qemu-only features
  specs/vhost-user: fix spec to match reality
  tests/vhost-user-bridge: implement logging of dirty pages
  i440fx: print an error message if user tries to enable iommu
  q35: Check propery to determine if iommu is set
  vhost-user: start/stop all rings
  vhost-user: print original request on error
  vhost-user-test: support VHOST_USER_SET_VRING_ENABLE
  vhost-user: update spec description
  vhost: don't send RESET_OWNER at stop
  vhost: let SET_VRING_ENABLE message depends on protocol feature

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 16:26:08 +00:00
Peter Maydell
c601a244a4 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20151119' into staging
target-arm queue:
 * add missing condexec updates when emulating architectural breakpoints
   and coprocessor access checks in Thumb translation (could in theory
   cause problems when these happened inside a Thumb IT block and an
   exception was taken)
 * arm_gic: correctly restore nested IRQ priority

# gpg: Signature made Thu 19 Nov 2015 13:29:37 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20151119:
  target-arm: Update condexec before arch BP check in AA32 translation
  target-arm: Update condexec before CP access check in AA32 translation
  hw/arm_gic: Correctly restore nested irq priority

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 15:56:51 +00:00
Peter Maydell
80fda8f609 Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151119' into staging
migration/next for 20151119

# gpg: Signature made Thu 19 Nov 2015 11:17:07 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151119:
  migration: normalize locking in migration/savevm.c
  migration: implement bdrv_all_find_vmstate_bs helper
  migration: reorder processing in hmp_savevm
  snapshot: create bdrv_all_create_snapshot helper
  migration: drop find_vmstate_bs check in hmp_delvm
  snapshot: create bdrv_all_find_snapshot helper
  migration: factor our snapshottability check in load_vmstate
  snapshot: create bdrv_all_goto_snapshot helper
  snapshot: create bdrv_all_delete_snapshot helper
  snapshot: return error code from bdrv_snapshot_delete_by_id_or_name
  snapshot: create helper to test that block drivers supports snapshots
  Unneeded NULL check
  migration: Dead assignment of current_time
  Set last_sent_block

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 15:05:06 +00:00
Andreas Färber
9f4aa7cef2 MAINTAINERS: Add check-qom-{interface,proplist} to QOM
Add the QOM unit tests to the QOM maintenance area so that maintainers
get CC'ed on changes and to document QOM test coverage.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-19 15:15:40 +01:00
Andreas Färber
8438a13543 qom: Clean up assertions to display values on failure
Instead of using g_assert() for integer comparisons, use
g_assert_cmpint() so that we can see the respective values.

While at it, fix one stray indentation.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-19 15:15:33 +01:00
Pavel Fedin
b604a854e8 qom: Replace object property list with GHashTable
ARM GICv3 systems with large number of CPUs create lots of IRQ pins. Since
every pin is represented as a property, number of these properties becomes
very large. Every property add first makes sure there's no duplicates.
Traversing the list becomes very slow, therefore QEMU initialization takes
significant time (several seconds for e. g. 16 CPUs).

This patch replaces list with GHashTable, making lookup very fast. The only
drawback is that object_child_foreach() and object_child_foreach_recursive()
cannot add or remove properties during traversal, since GHashTableIter does
not have modify-safe version. However, the code seems not to modify objects
via these functions.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
[AF: Fixed object_property_del_{all,child}() issues;
     g_hash_table_contains() -> g_hash_table_lookup(), suggested by Daniel]
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-19 15:00:15 +01:00
Marc-André Lureau
1c7ba94a18 exec: silence hugetlbfs warning under qtest
vhost-user-test prints a warning. A test should not need to run on
hugetlbfs, let's silence the warning under qtest. The
condition can't check on qtest_enabled() since vhost-user-test actually
doesn't use qtest accel. However, qtest_driver() can be used, if
qtest_init() is called early enough. For that reason, move chardev and
qtest initialization early.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-19 15:26:05 +02:00
Marc-André Lureau
421f4448ce tests: re-enable vhost-user-test
Commit 7fe34ca9c2 actually disabled vhost-user-test altogether,
since CONFIG_VHOST_NET is a per-target config variable.

tests/vhost-user-test is already x86/x64 softmmu specific test, in order
to enable it correctly, kvm & vhost-net are also conditions. To check
that, set CONFIG_VHOST_NET_TEST_$target when kvm is also enabled.

Since "check-qtest-x86_64-y = $(check-qtest-i386-y)", avoid duplication
when both x86 & x64 are enabled.

Other targets than x86 aren't enabled yet, and is intentionally left as
a future improvement, since I can't easily test those.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-19 15:26:05 +02:00
Michael S. Tsirkin
d9a3b33d2c acpi: fix buffer overrun on migration
ich calls acpi_gpe_init with length ICH9_PMIO_GPE0_LEN so
ICH9_PMIO_GPE0_LEN/2 bytes are allocated, but then the full
ICH9_PMIO_GPE0_LEN bytes are migrated.

As a quick work-around, allocate twice the memory.
We'll probably want to tweak code to avoid
migrating the extra ICH9_PMIO_GPE0_LEN/2 bytes,
but that is a bit trickier to do without breaking
migration compatibility.

Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-19 15:26:00 +02:00
Sergey Fedorov
ce8a1b5449 target-arm: Update condexec before arch BP check in AA32 translation
Architectural breakpoint check could raise an exceptions, thus condexec
bits should be updated before calling gen_helper_check_breakpoints().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1447767527-21268-3-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 12:51:08 +00:00
Sergey Fedorov
43bfa4a100 target-arm: Update condexec before CP access check in AA32 translation
Coprocessor access instructions are allowed inside IT block.
gen_helper_access_check_cp_reg() can raise an exceptions thus condexec
bits should be updated before.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1447767527-21268-2-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 12:11:42 +00:00
François Baldassari
a859595791 hw/arm_gic: Correctly restore nested irq priority
Upon activating an interrupt, set the corresponding priority bit in the
APR/NSAPR registers without touching the currently set bits. In the event
of nested interrupts, the GIC will then have the information it needs to
restore the priority of the pre-empted interrupt once the higher priority
interrupt finishes execution.

Signed-off-by: François Baldassari <francois@pebble.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-19 12:09:52 +00:00
Denis V. Lunev
79b3c12ac5 migration: normalize locking in migration/savevm.c
basically all bdrv_* operations must be called under aio_context_acquire
except ones with bdrv_all prefix.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
7cb1448149 migration: implement bdrv_all_find_vmstate_bs helper
The patch also ensures proper locking for the operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
0b46160521 migration: reorder processing in hmp_savevm
State deletion can be performed on running VM which reduces VM downtime
This approach looks a bit more natural.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
a9085f9b55 snapshot: create bdrv_all_create_snapshot helper
to create snapshot for all loaded block drivers.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
c6258b04f1 migration: drop find_vmstate_bs check in hmp_delvm
There is no much sense to do the check and write warning.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
723ccda1a0 snapshot: create bdrv_all_find_snapshot helper
to check that snapshot is available for all loaded block drivers.
The check bs != bs1 in hmp_info_snapshots is an optimization. The check
for availability of this snapshot will return always true as the list
of snapshots was collected from that image.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
849f96e2f7 migration: factor our snapshottability check in load_vmstate
We should check that all inserted and not read-only images support
snapshotting. This could be made using already invented helper
bdrv_all_can_snapshot().

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
4c1cdbaad0 snapshot: create bdrv_all_goto_snapshot helper
to switch to snapshot on all loaded block drivers.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
9b00ea376d snapshot: create bdrv_all_delete_snapshot helper
to delete snapshots from all loaded block drivers.

The patch also ensures proper locking.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
25af925fff snapshot: return error code from bdrv_snapshot_delete_by_id_or_name
this will make code better in the next patch

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Denis V. Lunev
e9ff957ac2 snapshot: create helper to test that block drivers supports snapshots
The patch enforces proper locking for this operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:50:00 +01:00
Dr. David Alan Gilbert
5df5416e63 Unneeded NULL check
The check is unneccesary, we read the value at the start of the
thread, use it, and never change it.  The value is checked to be
non-NULL before thread creation.

Spotted by coverity, CID 1339211

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:49:53 +01:00
Dr. David Alan Gilbert
95a7788b2f migration: Dead assignment of current_time
I set current_time before the postcopy test but never use it;
(I think this was from the original version where it was time based).
Spotted by coverity, CID 1339208

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:49:53 +01:00
Dr. David Alan Gilbert
84e7b80a05 Set last_sent_block
In a82d593b61 I accidentally removed the setting of
last_sent_block,  put it back.

Symptoms:
  Multithreaded compression only uses one thread.
  Migration is a bit less efficient since it won't use 'cont' flags.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: a82d593b61
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-19 11:49:53 +01:00
Daniel P. Berrange
8c4d156c18 qom: Add a test case for complex property finalization
Devices have some quite complex object child/link relationships
which place some requirements on the object_property_del_all()
function to consider that properties can be modified while
being iterated over.

This extends the QOM property test case to replicate the
device like structure and expose any potential bugs in the
object_property_del_all() function.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:13:49 +01:00
Daniel P. Berrange
456fb0bfe0 net: Convert net filter code to use object property iterators
Stop directly accessing the Object::properties field data
structure and instead use the formal object property iterator
APIs. This insulates the code from future data structure
changes in the Object struct.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:13:49 +01:00
Daniel P. Berrange
9a842f7d3c ppc: Convert spapr code to use object property iterators
Stop directly accessing the Object::properties field data
structure and instead use the formal object property iterator
APIs. This insulates the code from future data structure
changes in the Object struct.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:13:49 +01:00
Daniel P. Berrange
2465bc564d vl: Convert machine help code to use object property iterators
Stop directly accessing the Object::properties field data
structure and instead use the formal object property iterator
APIs. This insulates the code from future data structure
changes in the Object struct.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:13:48 +01:00
Daniel P. Berrange
1b30c094dc qmp: Convert QMP code to use object property iterators
Stop directly accessing the Object::properties field data
structure and instead use the formal object property iterator
APIs. This insulates the code from future data structure
changes in the Object struct.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:13:48 +01:00
Daniel P. Berrange
a00c948241 qom: Introduce ObjectPropertyIterator struct for iteration
Some users of QOM need to be able to iterate over properties
defined against an object instance. Currently they are just
directly using the QTAIL macros against the object properties
data structure.

This is bad because it exposes them to changes in the data
structure used to store properties, as well as changes in
functionality such as ability to register properties against
the class.

This provides an ObjectPropertyIterator struct which will
insulate the callers from the particular data structure
used to store properties. It can be used thus

  ObjectProperty *prop;
  ObjectPropertyIterator *iter;

  iter = object_property_iter_init(obj);
  while ((prop = object_property_iter_next(iter))) {
      ... do something with prop ...
  }
  object_property_iter_free(iter);

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Pavel Fedin <p.fedin@samsung.com>
[AF: Fixed examples, style cleanups]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:13:38 +01:00
Ildar Isaev
3b6ca4022d qdev: Change Property::offset field to ptrdiff_t type
Property::offset field is calculated as a diff between two pointers:

  arrayprop->prop.offset = eltptr - (void *)dev;

If offset is declared as int, this subtraction can cause type overflow,
thus leading to failure of the subsequent assertion:

  assert(qdev_get_prop_ptr(dev, &arrayprop->prop) == eltptr);

So ptrdiff_t should be used instead.

Signed-off-by: Ildar Isaev <ild@inbox.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-11-18 21:11:55 +01:00
Peter Maydell
8f28030903 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Wed 18 Nov 2015 15:28:32 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  block: Call external_snapshot_clean after blockdev-snapshot
  blockdev: Add missing bdrv_unref() in drive-backup
  iotests: fix race in 030
  nand: fix address overflow

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-18 17:07:24 +00:00
Michael S. Tsirkin
48854f57ce vhost-user: fix log size
commit 2b8819c6ee
("vhost-user: modify SET_LOG_BASE to pass mmap size and offset")
passes log size in units of 4 byte chunks instead of the
expected size in bytes.

Fix this up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-18 18:49:27 +02:00
Michael S. Tsirkin
72018d1e19 vhost-user: ignore qemu-only features
Some features (such as ctrl vq) are supported
by qemu without need to communicate with the
backend.

Drop them from the feature mask so we set them
unconditionally.

Reported-by: Victor Kaplansky <vkaplans@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-18 18:49:12 +02:00
Peter Maydell
7199c89d8c Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-fixes-20151118-1' into staging
Pull qcrypto fixes 2015/11/18 v1

# gpg: Signature made Wed 18 Nov 2015 15:44:07 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/qcrypto-fixes-20151118-1:
  crypto: avoid passing NULL to access() syscall
  crypto: fix leaks in TLS x509 helper functions
  crypto: fix mistaken setting of Error in success code path
  crypto: fix leak of gnutls_dh_params_t data on credential unload

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-18 16:27:15 +00:00
Daniel P. Berrange
08cb175a24 crypto: avoid passing NULL to access() syscall
The qcrypto_tls_creds_x509_sanity_check() checks whether
certs exist by calling access(). It is valid for this
method to be invoked with certfile==NULL though, since
for client credentials the cert is optional. This caused
it to call access(NULL), which happens to be harmless on
current Linux, but should none the less be avoided.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-18 15:42:26 +00:00
Kevin Wolf
ca4fa82fe6 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2015-11-18' into queue-block
One block patch for qemu 2.5-rc1.

# gpg: Signature made Wed Nov 18 16:26:59 2015 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2015-11-18:
  block: Call external_snapshot_clean after blockdev-snapshot

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-18 16:27:44 +01:00
Alberto Garcia
4ad6f3db7d block: Call external_snapshot_clean after blockdev-snapshot
Otherwise the AioContext will never be released.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1447419624-21918-1-git-send-email-berto@igalia.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-18 16:26:15 +01:00
Max Reitz
0702d3d88c blockdev: Add missing bdrv_unref() in drive-backup
All error paths after a successful bdrv_open() of target_bs should
contain a bdrv_unref(target_bs). This one did not yet, so add it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-18 16:05:56 +01:00
Daniel P. Berrange
7b35030eed crypto: fix leaks in TLS x509 helper functions
The test_tls_get_ipaddr() method forgot to free the returned data
from getaddrinfo().

The test_tls_write_cert_chain() method forgot to free the allocated
buffer holding the certificate data after writing it out to a file.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-18 14:56:58 +00:00
Daniel P. Berrange
6ef8cd7a41 crypto: fix mistaken setting of Error in success code path
The qcrypto_tls_session_check_certificate() method was setting
an Error even when the ACL check suceeded. This didn't affect
the callers detection of errors because they relied on the
function return status, but this did cause a memory leak since
the caller would not free an Error they did not expect to be
set.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-18 14:56:58 +00:00
Daniel P. Berrange
61b9251a3a crypto: fix leak of gnutls_dh_params_t data on credential unload
The QCryptoTLSCredsX509 object was not free'ing the allocated
gnutls_dh_params_t data when unloading the credentials

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-11-18 14:56:58 +00:00
John Snow
01809194a0 iotests: fix race in 030
the stop_test case tests that we can resume a block-stream
command after it has stopped/paused due to error. We cannot
always reliably query it before it finishes after resume, though,
so make this a conditional.

The important thing is that we are still testing that it has stopped,
and that it finishes successfully after we send a resume command.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-18 15:54:15 +01:00
Rabin Vincent
a184e74f24 nand: fix address overflow
The shifts of the address mask and value shift beyond 32 bits when there
are 5 address cycles.

Cc: qemu-stable@nongnu.org
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-18 15:54:15 +01:00
Peter Maydell
ab9b872ab3 Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-11-13-v2-tag' into staging
qemu-ga patch queue for 2.5

* fixes for guest-exec gspawn() usage:
  - inherit default lookup path by default instead of
    explicitly defining it as being empty.
  - don't inherit default PATH when PATH/ENV are explicit

v2:

* added fix for w32 'make install' target
* added version check for new g_spawn() flag

# gpg: Signature made Tue 17 Nov 2015 22:33:03 GMT using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2015-11-13-v2-tag:
  makefile: fix w32 install target for qemu-ga
  qga: allow to lookup in PATH from the passed envp for guest-exec
  qga: fix for default env processing for guest-exec

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-18 12:47:29 +00:00
Peter Maydell
6b79f253a3 Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
# gpg: Signature made Tue 17 Nov 2015 20:06:58 GMT using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"

* remotes/jnsnow/tags/ide-pull-request:
  ide: enable buffered requests for PIO read requests
  ide: enable buffered requests for ATAPI devices
  ide: orphan all buffered requests on DMA cancel
  ide: add support for IDEBufferedRequest
  block: add blk_abort_aio_request
  ide/atapi: make PIO read requests async

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-18 12:16:14 +00:00
Michael Roth
ab59e3ecb2 makefile: fix w32 install target for qemu-ga
fafcaf1 added a 'qemu-ga' install target on w32, which can be used
in place of the existing qemu-ga.exe target to also handle dealing
with other components such as DLLs for VSS/fsfreeze and generating
an MSI package if appropriate configure options are present.

As part of that, qemu-ga$(EXESUF) was removed from $TOOLS in favor
of this new qemu-ga target.

The install rule however relies on a direct mapping of the $TOOLS
entry to the actual resulting binary. In the case of w32, qemu-ga
is not identical to qemu-ga$(EXESUF), and the install recipe fails
to find the 'qemu-ga' binary.

Fix this by essentially remapping 'qemu-ga' back to 'qemu-ga.exe'
in the install recipe.

This raises the question of whether or not qemu-ga should continue
to live in TOOLS as opposed to its own special target, but as a
late fix for a regression in 2.5 this commit should be safer, since
we rely on qemu-ga's presence in $TOOLS in several places throughout
Makefile.

Reported-by: Stefan Weil <sw@weilnetz.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-17 16:32:27 -06:00
Yuri Pudgorodskiy
0be4083951 qga: allow to lookup in PATH from the passed envp for guest-exec
This was original behaviour before GLIB gspawn() rework and we rely on
this behaviour.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
* add version check (2.33.2) for G_SPAWN_SEARCH_PATH_FROM_ENVP
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-17 16:24:18 -06:00
Yuri Pudgorodskiy
02a4d82e8c qga: fix for default env processing for guest-exec
envp == NULL must be passed inside gspawn() if it was not passed with
the command line. Original code inherits environment from the QGA,
which is wrong.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-17 16:24:18 -06:00
Peter Maydell
55db5eeeb7 Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 fixes, 2015-11-17

Two X86 fixes, hopefully in time for -rc1.

# gpg: Signature made Tue 17 Nov 2015 19:06:53 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  target-i386: Disable rdtscp on Opteron_G* CPU models
  target-i386: Fix mulx for identical target regs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-17 22:00:46 +00:00
Peter Lieven
d66a8fa83b ide: enable buffered requests for PIO read requests
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447345846-15624-7-git-send-email-pl@kamp.de
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-17 15:06:39 -05:00
Peter Lieven
02506b20b6 ide: enable buffered requests for ATAPI devices
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447345846-15624-6-git-send-email-pl@kamp.de
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-17 15:06:33 -05:00
Peter Lieven
7cda62087c ide: orphan all buffered requests on DMA cancel
If the guests canceles a DMA request we can prematurely
invoke all callbacks of buffered requests and flag all them
as orphaned. Ideally this avoids the need for draining all
requests. For CDROM devices this works in 100% of all cases.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447345846-15624-5-git-send-email-pl@kamp.de
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-17 15:06:29 -05:00
Peter Lieven
1d8c11d631 ide: add support for IDEBufferedRequest
this patch adds a new aio readv compatible function which copies
all data through a bounce buffer. These buffered requests can be
flagged as orphaned which means that their original callback has
already been invoked and the request has just not been completed
by the backend storage. The bounce buffer guarantees that guest
memory corruption is avoided when such a orphaned request is
completed by the backend at a later stage.

This trick only works for read requests as a write request completed
at a later stage might corrupt data as there is no way to control
if and what data has already been written to the storage.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447345846-15624-4-git-send-email-pl@kamp.de
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-17 15:06:25 -05:00
Peter Lieven
ca78ecfa72 block: add blk_abort_aio_request
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447345846-15624-3-git-send-email-pl@kamp.de
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-17 15:06:21 -05:00
Peter Lieven
5f81724d80 ide/atapi: make PIO read requests async
PIO read requests on the ATAPI interface used to be sync blk requests.
This has two significant drawbacks. First the main loop hangs util an
I/O request is completed and secondly if the I/O request does not
complete (e.g. due to an unresponsive storage) Qemu hangs completely.

Note: Due to possible race conditions requests during an ongoing
elementary transfer are still sync.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1447345846-15624-2-git-send-email-pl@kamp.de
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-17 15:06:15 -05:00
Eduardo Habkost
33b5e8c03a target-i386: Disable rdtscp on Opteron_G* CPU models
KVM can't virtualize rdtscp on AMD CPUs yet, so there's no point
in enabling it by default on AMD CPU models, as all we are
getting are confused users because of the "host doesn't support
requested feature" warnings.

Disable rdtscp on Opteron_G* models, but keep compatibility on
pc-*-2.4 and older (just in case there are people are doing funny
stuff using AMD CPU models on Intel hosts).

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-17 17:05:59 -02:00
Richard Henderson
9ecac5dad1 target-i386: Fix mulx for identical target regs
The Intel specification clearly indicates that the low part
of the result is written first and the high part of the result
is written second; thus if ModRM:reg and VEX.vvvv are identical,
the final result should be the high part of the result.

At present, TCG may either produce incorrect results or crash
with --enable-checking.

Reported-by: Toni Nedialkov <farmdve@gmail.com>
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-17 17:05:59 -02:00
Michael S. Tsirkin
7ebcfe5692 specs/vhost-user: fix spec to match reality
We wanted to start/stop rings on VRING_ENABLE, but that is not what QEMU
does. Rather than tweaking code some more, with risk to stability, let's
just document it as it is.

We'll be  able to fix this in the future with a new protocol feature bit.

Reported-by: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-17 15:41:13 +02:00
Victor Kaplansky
5c93c47338 tests/vhost-user-bridge: implement logging of dirty pages
During migration devices continue writing to the guest's memory.
The writes has to be reported to QEMU. This change implements
minimal support in vhost-user-bridge required for successful
migration of a guest with virtio-net device.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-17 15:41:13 +02:00
Bandan Das
8d211f622b i440fx: print an error message if user tries to enable iommu
There's no indication of any sort that i440fx doesn't support
"iommu=on"

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
2015-11-17 15:41:13 +02:00
Bandan Das
1f8431f42d q35: Check propery to determine if iommu is set
The helper function machine_iommu() isn't necesary. We can
directly check for the property.

Signed-off-by: Bandan Das <bsd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
2015-11-17 15:41:13 +02:00
Peter Maydell
c27e9014d5 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vnc-20151116-1' into staging
vnc: buffer code improvements, bugfixes.

# gpg: Signature made Mon 16 Nov 2015 17:20:02 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vnc-20151116-1:
  vnc: fix mismerge
  buffer: allow a buffer to shrink gracefully
  buffer: factor out buffer_adj_size
  buffer: factor out buffer_req_size
  vnc: recycle empty vs->output buffer
  vnc: fix local state init
  vnc: only alloc server surface with clients connected
  vnc: use vnc_{width,height} in vnc_set_area_dirty
  vnc: factor out vnc_update_server_surface
  vnc: add vnc_width+vnc_height helpers
  vnc: zap dead code
  vnc-jobs: move buffer reset, use new buffer move
  vnc: kill jobs queue buffer
  vnc: attach names to buffers
  buffer: add tracing
  buffer: add buffer_shrink
  buffer: add buffer_move
  buffer: add buffer_move_empty
  buffer: add buffer_init
  buffer: make the Buffer capacity increase in powers of two

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-17 12:34:07 +00:00
Peter Maydell
9be060f527 Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Tue 17 Nov 2015 11:13:05 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  virtio-blk: Fix double completion for werror=stop
  block: make 'stats-interval' an array of ints instead of a string
  aio-epoll: Fix use-after-free of node
  disas/arm: avoid clang shifting negative signed warning
  tpm: avoid clang shifting negative signed warning
  tests: Ignore recent test binaries
  docs: update bitmaps.md

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-17 11:33:38 +00:00
Fam Zheng
10f5a72f70 virtio-blk: Fix double completion for werror=stop
When a request R is absorbed by request M, it is appended to the
"mr_next" queue led by M, and is completed together with the completion
of M, in virtio_blk_rw_complete.

During DMA restart in virtio_blk_dma_restart_bh, requests in s->rq are
parsed and submitted again, possibly with a stale req->mr_next. It could
be a problem if the request merging in virtio_blk_handle_request hasn't
refreshed every mr_next pointer, in which case, virtio_blk_rw_complete
could walk through unexpected requests following the stale pointers.

Fix this by unsetting the pointer in virtio_blk_rw_complete. It is safe
because this req is either completed and freed right away, or it will be
restarted and parsed from scratch out of the vq later.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:57 +08:00
Alberto Garcia
40119effc5 block: make 'stats-interval' an array of ints instead of a string
This is the natural JSON representation and prevents us from having to
decode the list manually.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 0e3da8fa206f4ab534ae3ce6086e75fe84f1557e.1447665472.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:57 +08:00
Fam Zheng
0ed39f3df2 aio-epoll: Fix use-after-free of node
aio_epoll_update needs the fields in node, so delay the free.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1447655534-13974-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:57 +08:00
Stefan Hajnoczi
02460c3b42 disas/arm: avoid clang shifting negative signed warning
clang 3.7.0 on x86_64 warns about the following:

  disas/arm.c:1782:17: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
    imm |= (-1 << 7);
            ~~ ^

Note that this patch preserves the tab indent in this source file
because the surrounding code still uses tabs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:56 +08:00
Stefan Hajnoczi
886ce6f8b6 tpm: avoid clang shifting negative signed warning
clang 3.7.0 on x86_64 warns about the following:

  hw/tpm/tpm_tis.c:1000:36: warning: shifting a negative signed value is undefined [-Wshift-negative-value]
            tis->loc[c].iface_id = TPM_TIS_IFACE_ID_SUPPORTED_FLAGS1_3;
                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  hw/tpm/tpm_tis.c:144:10: note: expanded from macro 'TPM_TIS_IFACE_ID_SUPPORTED_FLAGS1_3'
     (~0 << 4)/* all of it is don't care */)
      ~~ ^

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:56 +08:00
Eric Blake
a12e52a151 tests: Ignore recent test binaries
Commits 6c6f312d and bd797fc1 added new tests (test-blockjob-txn
and test-timed-average, respectively), but did not mark them for
exclusion in .gitignore.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1447386423-13160-1-git-send-email-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:56 +08:00
John Snow
c4c2daa1ae docs: update bitmaps.md
Include new error handling scenarios for 2.5.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1447196417-26081-1-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-17 18:35:56 +08:00
Peter Maydell
361cb26827 Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2015-11-17' into staging
QAPI patches

# gpg: Signature made Tue 17 Nov 2015 08:28:24 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2015-11-17:
  input: Document why x-input-send-event is still experimental
  qapi: Document introspection stability considerations

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-17 10:20:25 +00:00
Eric Blake
513e7cdbae input: Document why x-input-send-event is still experimental
The x-input-send-event command was introduced in 2.2 with mention
that it is experimental, but now that several releases have elapsed
without any changes, it would be nice to document why that was done
and should still remain experimental in 2.5.

Meanwhile, our documentation states that we prefer 'lower-case',
rather than 'CamelCase', for qapi enum values.  The InputButton and
InputAxis enums violate this convention.  However, because they are
currently used primarily for generating code that is used internally;
and their only exposure through QMP is via the experimental
'x-input-send-event' command, we are free to change their spelling.
Of course, it would be nicer to delay such a change until the same
time we promote the command to non-experimental.  Adding
documentation will help us remember to do that rename.

We have plans to tighten the qapi generator to flag instances of
inconsistent use of naming conventions; if that lands first, it
will just need to whitelist these exceptions until the time we
settle on the final interface.

Fix a typo in the docs for InputAxis while at it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447354243-31825-1-git-send-email-eblake@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-17 08:42:07 +01:00
Eric Blake
39a65e2c24 qapi: Document introspection stability considerations
We are not ready (and might never be ready) to declare
introspection stable between releases. Clients written to
control multiple versions of qemu, and desiring to know
whether a particular member is supported for a given
command, must be prepared to locate that member in spite
of qapi changes that may affect the member's location or
type within the overall object, even though such changes
did not break QMP wire back-compatibility.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447264202-19554-1-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-17 08:42:07 +01:00
Michael S. Tsirkin
dc3db6adde vhost-user: start/stop all rings
We are currently only sending VRING_ENABLE message for the first ring,
that's wrong: we must start/stop them all.

Reported-by: Victor Kaplansky <victork@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 18:48:31 +02:00
Michael S. Tsirkin
5421f318ec vhost-user: print original request on error
When we get an unexpected response, print out
the original request.
Helps debug protocol errors tremendously.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 14:35:16 +02:00
Michael S. Tsirkin
87656d5018 vhost-user-test: support VHOST_USER_SET_VRING_ENABLE
vhost-user-test is broken now: it assumes
QEMU sends RESET_OWNER, and we stopped doing that.
Wait for ENABLE_RING with 0 instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 14:35:05 +02:00
Peter Maydell
c257779e2a Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20151116' into staging
seccomp branch queue

# gpg: Signature made Mon 16 Nov 2015 08:50:28 GMT using RSA key ID 12F8BD2F
# gpg: Good signature from "Eduardo Otubo (Software Engineer @ ProfitBricks) <eduardo.otubo@profitbricks.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1C96 46B6 E1D1 C38A F2EC  3FDE FD0C FF5B 12F8 BD2F

* remotes/otubo/tags/pull-seccomp-20151116:
  seccomp: loosen library version dependency
  configure: arm/aarch64: allow enable-seccomp
  seccomp: add cacheflush to whitelist

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-16 12:09:47 +00:00
Michael S. Tsirkin
a586e65bbd vhost-user: update spec description
Clarify logging setup to make sure all clients comply in a way that is
future-proof.  Document how rings are started/stopped.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
2015-11-16 13:21:30 +02:00
Peter Maydell
bc7c6c1fec Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
# gpg: Signature made Fri 13 Nov 2015 20:16:21 GMT using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"

* remotes/jnsnow/tags/ide-pull-request:
  qtest/ahci: use raw format when qemu-img is absent
  libqos: add qemu-img presence check
  qtest/ahci: always specify image format
  ahci/qtest: don't use tcp sockets for migration tests
  atapi: Prioritize unknown cmd error over BCL error
  atapi: add byte_count_limit helper

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-16 10:14:33 +00:00
Yuanhan Liu
12b8cbac3c vhost: don't send RESET_OWNER at stop
First of all, RESET_OWNER message is sent incorrectly, as it's sent
before GET_VRING_BASE. And the reset message would let the later call
get nothing correct.

And, sending SET_VRING_ENABLE at stop, which has already been done,
makes more sense than RESET_OWNER.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 12:02:54 +02:00
Yuanhan Liu
923e2d98ed vhost: let SET_VRING_ENABLE message depends on protocol feature
But not depend on PROTOCOL_F_MQ feature bit. So that we could use
SET_VRING_ENABLE to sign the backend on stop, even if MQ is disabled.

That's reasonable, since we will have one queue pair at least.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-16 12:02:54 +02:00
dann frazier
ba060c53d5 seccomp: loosen library version dependency
Drop the libseccomp required version back to 2.1.0, restoring the ability
to build w/ --enable-seccomp on Ubuntu 14.04.

Commit 4cc47f8b3c tightened the dependency
on libseccomp from version 2.1.0 to 2.1.1. This broke building on Ubuntu
14.04, the current Ubuntu LTS release. The commit message didn't mention
any specific functional need for 2.1.1, just that it was the most recent
stable version at the time. I reviewed the changes between 2.1.0 and 2.1.1,
but it looks like that update just contained minor fixes and cleanups - no
obvious (to me) new interfaces or critical bug fixes.

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2015-11-16 09:49:47 +01:00
Andrew Jones
693e59105d configure: arm/aarch64: allow enable-seccomp
This is a revert of ae6e8ef11e, but with a bit of refactoring,
and also specifically adding arm/aarch64, rather than all
architectures. Currently, libseccomp code appears to also support
mips, ppc, and s390. We could therefore allow qemu to enable
seccomp for those platforms as well, with additional configure
patches, given they're tested and proven to work.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2015-11-16 09:49:14 +01:00
Andrew Jones
47d2067af3 seccomp: add cacheflush to whitelist
cacheflush is an arm-specific syscall that qemu built for arm
uses. Add it to the whitelist, but only if we're linking with
a recent enough libseccomp.

Signed-off-by: Andrew Jones <drjones@redhat.com>
2015-11-16 09:48:53 +01:00
John Snow
917158dc3b qtest/ahci: use raw format when qemu-img is absent
If we don't have the qemu-img tool, use the raw format
for tests and skip the high-sector LBA48 tests.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1447439479-16775-4-git-send-email-jsnow@redhat.com
2015-11-13 14:31:43 -05:00
John Snow
cb11e7b2f3 libqos: add qemu-img presence check
To allow tests to optionally exercise additional tests
that require the qemu-img tool that may not be present
in all builds.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1447439479-16775-3-git-send-email-jsnow@redhat.com
2015-11-13 14:31:42 -05:00
John Snow
b236b61056 qtest/ahci: always specify image format
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1447439479-16775-2-git-send-email-jsnow@redhat.com
2015-11-13 14:31:42 -05:00
John Snow
6d9e7295c5 ahci/qtest: don't use tcp sockets for migration tests
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1447108074-20609-1-git-send-email-jsnow@redhat.com
2015-11-13 14:31:42 -05:00
John Snow
f36aa12d2f atapi: Prioritize unknown cmd error over BCL error
If we don't know about the command at all, we need to prioritize
that failure above the zero byte-count-limit failure.

This fixes a failure in the sparc64 NetBSD 7.0 installer bootup.

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: John Snow <jsnow@redhat.com>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 1447095959-10046-3-git-send-email-jsnow@redhat.com
2015-11-13 14:31:42 -05:00
John Snow
af0e00db0e atapi: add byte_count_limit helper
Signed-off-by: John Snow <jsnow@redhat.com>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 1447095959-10046-2-git-send-email-jsnow@redhat.com
2015-11-13 14:31:42 -05:00
Roger Pau Monne
cdadde39a8 xen: fix usage of xc_domain_create in domain builder
Due to the addition of HVMlite and the requirement to always provide a
valid xc_domain_configuration_t, xc_domain_create now always takes an arch
domain config, which can be NULL in order to mimic previous behaviour.

Add a small stub called xen_domain_create that encapsulates the correct
call to xc_domain_create depending on the libxc version detected.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-11-13 17:38:06 +00:00
Peter Maydell
8337c6cbc3 Update version for v2.5.0-rc0 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-13 17:10:36 +00:00
Guenter Roeck
74fcbd22d2 hw/misc: Add support for ADC controller in Xilinx Zynq 7000
Add support for the Xilinx XADC core used in Zynq 7000.

References:
- Zynq-7000 All Programmable SoC Technical Reference Manual
- 7 Series FPGAs and Zynq-7000 All Programmable SoC XADC
  Dual 12-Bit 1 MSPS Analog-to-Digital Converter

Tested with Linux using QEMU machine xilinx-zynq-a9 with devicetree
files zynq-zc702.dtb and zynq-zc706.dtb, and kernel configuration
multi_v7_defconfig.

Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[ PC changes:
  * Changed macro names to match TRM where possible
  * Made programmers model macro scheme consistent
  * Dropped XADC_ZYNQ_ prefix on local macros
  * Fix ALM field width
  * Update threshold-comparison interrupts in _update_ints()
  * factored out DFIFO pushes into helper. Renamed to "push/pop"
  * Changed xadc_reg to 10 bits and added OOB check.
  * Reduced scope of MCTL reset to just stop channel coms.
  * Added dummy read data to write commands
  * Changed _ to - seperators in string names and filenames
  * Dropped ------------ in header comment
  * Catchall'ed _update_ints() in _write handler.
  * Minor whitespace changes.
  * Use ZYNQ_XADC_FIFO_DEPTH instead of ARRAY_SIZE()
]
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 21:30:42 +00:00
Peter Maydell
f3bcfc5663 Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151112' into staging
migration/next for 20151112

# gpg: Signature made Thu 12 Nov 2015 16:56:44 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151112:
  migration_init: Fix lock initialisation/make it explicit
  migrate-start-postcopy: Improve text
  Postcopy: Fix TP!=HP zero case
  Finish non-postcopiable iterative devices before package
  migration: Make 32bit linux compile with RDMA
  migration: print ram_addr_t as RAM_ADDR_FMT not %zx

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 18:08:19 +00:00
Peter Maydell
b2df6a79df Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches (rebased Stefan's pull request)

# gpg: Signature made Thu 12 Nov 2015 15:34:16 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (43 commits)
  block: Update copyright of the accounting code
  scsi-disk: Account for failed operations
  macio: Account for failed operations
  ide: Account for failed and invalid operations
  atapi: Account for failed and invalid operations
  xen_disk: Account for failed and invalid operations
  virtio-blk: Account for failed and invalid operations
  nvme: Account for failed and invalid operations
  iotests: Add test for the block device statistics
  block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode
  qemu-io: Account for failed, invalid and flush operations
  block: New option to define the intervals for collecting I/O statistics
  block: Add average I/O queue depth to BlockDeviceTimedStats
  block: Compute minimum, maximum and average I/O latencies
  block: Allow configuring whether to account failed and invalid ops
  block: Add statistics for failed and invalid I/O operations
  block: Add idle_time_ns to BlockDeviceStats
  util: Infrastructure for computing recent averages
  block: define 'clock_type' for the accounting code
  ide: Account for write operations correctly
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 17:22:06 +00:00
Dr. David Alan Gilbert
389775d1f6 migration_init: Fix lock initialisation/make it explicit
Peter reported a lock error on MacOS after my a82d593b
patch.

migrate_get_current does one-time initialisation of
a bunch of variables.
migrate_init does reinitialisation even on a 2nd
migrate after a cancel.

The problem here was that I'd initialised the mutex
in migrate_get_current, and the memset in migrate_init
corrupted it.

Remove the memset and replace it by explicit initialisation
of fields that need initialising; this also turns out to be simpler
than the old code that had to preserve some fields.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: a82d593b
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-12 17:55:27 +01:00
Dr. David Alan Gilbert
a54d340b9d migrate-start-postcopy: Improve text
Improve the text in both the qapi-schema and hmp help to point out
you need to set the postcopy-ram capability prior to issuing
migrate-start-postcopy.

Also fix the text of the migrate_start_postcopy error that
deals with capabilities.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-12 17:54:39 +01:00
John Snow
cfcc7c1448 configure: check for $cxx before use
I broke this when adding checks for clang++.

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1447345789-840-1-git-send-email-jsnow@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 16:53:44 +00:00
Dr. David Alan Gilbert
a3b6ff6d0a Postcopy: Fix TP!=HP zero case
Where the target page size is different from the host page
we special case it, but I messed up on the zero case check.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-12 17:52:29 +01:00
Dr. David Alan Gilbert
1c0d249ddf Finish non-postcopiable iterative devices before package
Where we have iterable, but non-postcopiable devices (e.g. htab
or block migration), complete them before forming the 'package'
but with the CPUs stopped.  This stops them filling up the package.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-12 17:52:29 +01:00
Juan Quintela
80e60c6e1c migration: Make 32bit linux compile with RDMA
Rest of the file already use that trick. 64bit offsets make no sense in
32bit archs, but that is ram_addr_t for you.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2015-11-12 17:52:29 +01:00
Juan Quintela
9458ad6b44 migration: print ram_addr_t as RAM_ADDR_FMT not %zx
Not all the wold is 64bits (yet).

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2015-11-12 17:52:29 +01:00
Sergey Fedorov
ed6c64489e target-arm: Update PC before calling gen_helper_check_breakpoints()
PC should be updated in the CPU state before calling check_breakpoints()
helper. Otherwise, the helper would not see the correct PC in the CPU
state if it is not at the start of a TB.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1447176222-16401-1-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 16:16:02 +00:00
Peter Maydell
8f0da01d18 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, vhost: fixes for 2.5

This fixes a performance regression with virtio 1,
and makes device stop/start more robust for vhost-user.
virtio devices on pcie bus now have pcie and pm
capability, as required by the PCI Express spec.
migration now works better with virtio 9p.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 12 Nov 2015 14:40:42 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  virtio-9p: add savem handlers
  hw/virtio: Add PCIe capability to virtio devices
  vhost: send SET_VRING_ENABLE at start/stop
  vhost: rename RESET_DEVICE backto RESET_OWNER
  vhost-user: modify SET_LOG_BASE to pass mmap size and offset
  virtio-pci: unbreak queue_enable read
  virtio-pci: introduce pio notification capability for modern device
  virtio-pci: use zero length mmio eventfd for 1.0 notification cap when possible
  KVM: add support for any length io eventfd
  memory: don't try to adjust endianness for zero length eventfd
  virtio-pci: fix 1.0 virtqueue migration

Conflicts:
	include/hw/compat.h
[Fixed a trivial merge conflict in compat.h]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 15:25:40 +00:00
Alberto Garcia
aece5edc96 block: Update copyright of the accounting code
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 80a2278e3ec2dafd5daab20a7cb2d6a9b83371e4.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:47 +01:00
Alberto Garcia
d7628080f3 scsi-disk: Account for failed operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 0ead7b0e59c22926e033ca12725e3a31985ec46b.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:47 +01:00
Alberto Garcia
b88b3c8b83 macio: Account for failed operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: ee6f4fde6a7c1071ca96d4ddd53e4934ff812fcd.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:47 +01:00
Alberto Garcia
ecca3b397d ide: Account for failed and invalid operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: bf4d6c9c563877e699b0bf42e7eaf8b096c4a35e.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:47 +01:00
Alberto Garcia
ece2d05ed4 atapi: Account for failed and invalid operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 59dee4e2921b0c79d41c49b67dfb93d32db9f7f9.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:47 +01:00
Alberto Garcia
57ee366ce9 xen_disk: Account for failed and invalid operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: e0cbb96cb0e1f86c37c7ce332efdf02b57b9d365.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
01762e0322 virtio-blk: Account for failed and invalid operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 4f623ce52c9d673d35a043fc2959526b41b685c6.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
1753f3dc17 nvme: Account for failed and invalid operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 678dc67da229759d404b44f7cc2bf5ed8bf8ad14.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
4214face09 iotests: Add test for the block device statistics
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 0fb8501bbf3666b3d5d3f67fa899729c88f21baf.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
918a17a464 block: Use QEMU_CLOCK_VIRTUAL for the accounting code in qtest mode
This patch switches to QEMU_CLOCK_VIRTUAL for the accounting code in
qtest mode, and makes the latency of the operation constant. This way we
can perform tests on the accounting code with reproducible results.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 35ed0501450fa572684e9b5e92c361ab6cce565b.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
556c2b6071 qemu-io: Account for failed, invalid and flush operations
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 78a7662a8636e55991737ece50003a2dc5a5f3e0.1446044838.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
2be5506fc8 block: New option to define the intervals for collecting I/O statistics
The BlockAcctStats structure contains a list of BlockAcctTimedStats.
Each one of these collects statistics about the minimum, maximum and
average latencies of all I/O operations in a certain interval of time.

This patch adds a new "stats-intervals" option that allows defining
these intervals.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 41cbcd334a61c6157f0f495cdfd21eff6c156f2a.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
96e4dedaff block: Add average I/O queue depth to BlockDeviceTimedStats
This patch adds two new fields to BlockDeviceTimedStats that track the
average number of pending read and write requests for a block device.

The values are calculated for the period of time defined for that
interval.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: fd31fef53e2714f2f30d59ed58ca2f67ec9ab926.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:46 +01:00
Alberto Garcia
979e9b03fc block: Compute minimum, maximum and average I/O latencies
This patch keeps track of the minimum, maximum and average latencies
of I/O operations during a certain interval of time.

The values are exposed in the BlockDeviceTimedStats structure.

An option to define the intervals to collect these statistics will be
added in a separate patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c7382dc89622c64f918d09f32815827772628f8e.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
362e9299b3 block: Allow configuring whether to account failed and invalid ops
This patch adds two options, "stats-account-invalid" and
"stats-account-failed", that can be used to decide whether invalid and
failed I/O operations must be used when collecting statistics for
latency and last access time.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: ebc7e5966511a342cad428a392c5f5ad56b15213.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
7ee12dafe9 block: Add statistics for failed and invalid I/O operations
This patch adds the block_acct_failed() and block_acct_invalid()
functions to allow keeping track of failed and invalid I/O operations.

The number of failed and invalid operations is exposed in
BlockDeviceStats.

We don't keep track of the time spent on invalid operations because
they are cancelled immediately when they are started.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: a7256ccb883a86356b1c6c46b5a29ed5448546a5.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
cb38fffbc9 block: Add idle_time_ns to BlockDeviceStats
This patch adds the new field 'idle_time_ns' to the BlockDeviceStats
structure, indicating the time that has passed since the previous I/O
operation.

It also adds the block_acct_idle_time_ns() call, to ensure that all
references to the clock type used for accounting are in the same
place. This will later allow us to use a different clock for iotests.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 7d8cfcf931453e1a2443e6626e8c1edc347c7c8a.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
bd797fc15b util: Infrastructure for computing recent averages
This module computes the average of a set of values within a time
window, keeping also track of the minimum and maximum values.

In order to produce more accurate results it works internally by
creating two time windows of the same period, offsetted by half of
that period. Values are accounted on both windows and the data is
always returned from the oldest one.

[Add missing util/replay.o to test-timed-average dependencies to fix the
build.
--Stefan]

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 201b09c21bbc9c329779d2b2365ee2b9c80dceeb.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
5519593c07 block: define 'clock_type' for the accounting code
Its value is still QEMU_CLOCK_REALTIME, but having it in a variable will
allow us to change its value easily in the future when running in qtest
mode.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 547485eb841cf9e3b2770c96539ae9ae5996e214.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
c618f331d3 ide: Account for write operations correctly
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 2e71323c0875c2b66a8ae22229545e0c013af8d4.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Alberto Garcia
693044ebd2 xen_disk: Account for flush operations
Currently both BLKIF_OP_WRITE and BLKIF_OP_FLUSH_DISKCACHE are being
accounted as write operations.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 7a2a14e3ac62027aa6267a6c02abc70717be9c0a.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:45 +01:00
Stefan Hajnoczi
6c6f312dd7 tests: add BlockJobTxn unit test
The BlockJobTxn unit test verifies that both single jobs and pairs of
jobs behave as a transaction group.  Either all jobs complete
successfully or the group is cancelled.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-15-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
John Snow
fc6c796ff2 iotests: 124 - transactional failure test
Use a transaction to request an incremental backup across two drives.
Coerce one of the jobs to fail, and then re-run the transaction.

Verify that no bitmap data was lost due to the partial transaction
failure.

To support the 'err-cancel' QMP argument name it's necessary for
transaction_action() to convert underscores in Python argument names
to hyphens for QMP argument names.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446765200-3054-14-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
John Snow
94d16a640a block: add transactional properties
Add both transactional properties to the QMP transactional interface,
and add the BlockJobTxn that we create as a result of the err-cancel
property to the BlkActionState structure.

[split up from a patch originally by Stefan and Fam. --js]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-13-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
John Snow
78f51fde88 block: Add BlockJobTxn support to backup_run
Allow a BlockJobTxn to be passed into backup_run, which
will allow the job to join a transactional group if present.

Propagate this new parameter outward into new QMP helper
functions in blockdev.c to allow transaction commands to
pass forward their BlockJobTxn object in a forthcoming patch.

[split up from a patch originally by Stefan and Fam. --js]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-12-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
John Snow
c347b2c62a block/backup: Rely on commit/abort for cleanup
Switch over to the new .commit/.abort handlers for
cleaning up incremental bitmaps.

[split up from a patch originally by Stefan and Fam. --js]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-11-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
c55a832fdd block: Add block job transactions
Sometimes block jobs must execute as a transaction group.  Finishing
jobs wait until all other jobs are ready to complete successfully.
Failure or cancellation of one job cancels the other jobs in the group.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-10-git-send-email-jsnow@redhat.com
[Rewrite the implementation which is now contained in block_job_completed.
--Fam]
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
94db6d2d30 blockjob: Simplify block_job_finish_sync
With job->completed and job->ret to replace BlockFinishData.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-9-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
a689dbf2df blockjob: Add "completed" and "ret" in BlockJob
They are set when block_job_completed is called.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-8-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
57901ecb8e blockjob: Add .commit and .abort block job actions
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-7-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:44 +01:00
Fam Zheng
18930ba3d1 blockjob: Introduce reference count and fix reference to job->bs
Add reference count to block job, meanwhile move the ownership of the
reference to job->bs from the caller (which is released in two
completion callbacks) to the block job itself. It is necessary for
block_job_complete_sync to work, because block job shouldn't live longer
than its bs, as asserted in bdrv_delete.

Now block_job_complete_sync can be simplified.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-6-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
b976ea3cf5 backup: Extract dirty bitmap handling as a separate function
This will be reused by the coming new transactional completion code.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-5-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
John Snow
50f43f0ff9 block: rename BlkTransactionState and BdrvActionOps
These structures are misnomers, somewhat.

(1) BlockTransactionState is not state for a transaction,
    but is rather state for a single transaction action.
    Rename it "BlkActionState" to be more accurate.

(2) The BdrvActionOps describes operations for the BlkActionState,
    above. This name might imply a 'BdrvAction' or a 'BdrvActionState',
    which there isn't.
    Rename this to 'BlkActionOps' to match 'BlkActionState'.

Lastly, update the surrounding in-line documentation and comments
to reflect the current nature of how Transactions operate.

This patch changes only comments and names, and should not affect
behavior in any way.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1446765200-3054-4-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
John Snow
749ad5e887 iotests: add transactional incremental backup test
Test simple usage cases for using transactions to create
and synchronize incremental backups.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446765200-3054-3-git-send-email-jsnow@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
df9a681dc9 qed: Implement .bdrv_drain
The "need_check_timer" is used to clear the "NEED_CHECK" flag in the
image header after a grace period once metadata update has finished. In
compliance to the bdrv_drain semantics we should make sure it remains
deleted once .bdrv_drain is called.

We cannot reuse qed_need_check_timer_cb because here it doesn't satisfy
the assertion.  Do the "plug" and "flush" calls manually.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-10-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
67da1dc5ce block: Introduce BlockDriver.bdrv_drain callback
Drivers can have internal request sources that generate IO, like the
need_check_timer in QED. Since we want quiesced periods that contain
nested event loops in block layer, we need to have a way to disable such
event sources.

Block drivers must implement the "bdrv_drain" callback if it has any
internal sources that can generate I/O activity, like a timer or a
worker thread (even in a library) that can schedule QEMUBH in an
asynchronous callback.

Update the comments of bdrv_drain and bdrv_drained_begin accordingly.

Like bdrv_requests_pending(), we should consider all the children of bs.
Before, the while loop just works, as bdrv_requests_pending() already
tracks its children; now we mustn't miss the callback, so recurse down
explicitly.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1447064214-29930-9-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
83c98d7b92 block: Drop BlockDriver.bdrv_ioctl
Now the callback is not used any more, drop the field along with all
implementations in block drivers, which are iscsi and raw.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-8-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
5c5ae76acb block: Emulate bdrv_ioctl with bdrv_aio_ioctl and track both
Currently all drivers that support .bdrv_aio_ioctl also implement
.bdrv_ioctl redundantly.  To track ioctl requests in block layer it is
easier if we unify the two paths, because we'll need to run it in a
coroutine, as required by tracked_request_begin. While we're at it, use
.bdrv_aio_ioctl plus aio_poll() to emulate bdrv_ioctl().

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1447064214-29930-7-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:43 +01:00
Fam Zheng
8b45f6878d block: Add ioctl parameter fields to BlockRequest
The two fields that will be used by ioctl handling code later are added
as union, because it's used exclusively by ioctl code which dosn't need
the four fields in the other struct of the union.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-6-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng
4bb17ab51a iscsi: Emulate commands in iscsi_aio_ioctl as iscsi_ioctl
iscsi_ioctl emulates SG_GET_VERSION_NUM and SG_GET_SCSI_ID. Now that
bdrv_ioctl() will be emulated with .bdrv_aio_ioctl, replicate the logic
into iscsi_aio_ioctl to make them consistent.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-5-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng
b1066c8755 block: Track discard requests
Both bdrv_discard and bdrv_aio_discard will call into bdrv_co_discard,
so add tracked_request_begin/end calls around the loop.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng
cdb5e3155e block: Track flush requests
Both bdrv_flush and bdrv_aio_flush eventually call bdrv_co_flush, add
tracked_request_begin and tracked_request_end pair in that function so
that all flush requests are now tracked.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:42 +01:00
Fam Zheng
ebde595ce6 block: Add more types for tracked request
We'll track more request types besides read and write, change the
boolean field to an enum.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1447064214-29930-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-12 16:22:08 +01:00
Greg Kurz
4652f1640e virtio-9p: add savem handlers
We don't support migration of mounted 9p shares. This is handled by a
migration blocker.

One would expect, however, to be able to migrate if the share is unmounted.
Unfortunately virtio-9p-device does not register savevm handlers at all !
Migration succeeds and leaves the guest with a dangling device...

This patch simply registers migration handlers for virtio-9p-device. Whether
migration is possible or not still depends on the migration blocker.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 16:40:14 +02:00
Marcel Apfelbaum
1811e64c35 hw/virtio: Add PCIe capability to virtio devices
The virtio devices are converted to PCI-Express
if they are plugged into a PCI-Express bus and
the 'modern' protocol is enabled.

Devices plugged directly into the Root Complex as
Integrated Endpoints remain PCI.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 16:23:16 +02:00
Peter Maydell
17e50a72a3 Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 12 Nov 2015 08:01:55 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  net: netmap: use error_setg() helpers in place of error_report()
  net: netmap: Fix compilation issue
  e1000: Introducing backward compatibility command line parameter
  e1000: Implementing various counters
  e1000: Fixing the packet address filtering procedure
  e1000: Fixing the received/transmitted octets' counters
  e1000: Fixing the received/transmitted packets' counters
  e1000: Trivial implementation of various MAC registers
  e1000: Introduced an array to control the access to the MAC registers
  e1000: Add support for migrating the entire MAC registers' array
  e1000: Cosmetic and alignment fixes
  slirp: Fix type casts and format strings in debug code

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 14:15:32 +00:00
Yuanhan Liu
3a12f32229 vhost: send SET_VRING_ENABLE at start/stop
Send SET_VRING_ENABLE at start/stop, to give the backend
an explicit sign of our state.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Yuanhan Liu
60915dc469 vhost: rename RESET_DEVICE backto RESET_OWNER
This patch basically reverts commit d1f8b30e.

It turned out that it breaks stuff, so revert it:
    http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg00949.html

CC: "Michael S. Tsirkin" <mst@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Victor Kaplansky
2b8819c6ee vhost-user: modify SET_LOG_BASE to pass mmap size and offset
Unlike the kernel, vhost-user application accesses log table by
mmaping it to its user space. This change adds two new fields to
VhostUserMsg payload: mmap_size, and mmap_offset and make QEMU to
pass the to vhost-user application in VHOST_USER_SET_LOG_BASE
request.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Jason Wang
393f04d3ab virtio-pci: unbreak queue_enable read
Guest always get zero when reading queue_enable. This violates
spec. Fixing this by setting the queue_enable to true during any guest
writing and setting it to zero during reset.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:33 +02:00
Jason Wang
9824d2a39d virtio-pci: introduce pio notification capability for modern device
We used to use mmio for notification. This could be slow on some arch
(e.g on x86 without EPT). So this patch introduces pio bar and a pio
notification cap for modern device. This ability is enabled through
property "modern-pio-notify" for virtio pci devices and was disabled
by default. Management can enable when it thinks it was needed.

Benchmarks shows almost no obvious difference compared to legacy
device on machines without ept. Thanks Wenli Quan <wquan@redhat.com>
for the benchmarking.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:32 +02:00
Jason Wang
bc85ccfdf5 virtio-pci: use zero length mmio eventfd for 1.0 notification cap when possible
We use data match eventfd for 1.0 notification currently. This could
be slow since software decoding is needed for mmio exit. To speed this
up, we can switch to use zero length mmio eventfd for 1.0 notification
since we can examine the queue index directly from the writing
address. KVM kernel module can utilize this by registering it to fast
mmio bus which could be as fast as pio on ept capable machine when
fast mmio is supported by host kernel.

Lots of improvements were seen on a ept capable machine:

Guest RX:(TCP)
size/session/+throughput%/+cpu%/-+per cpu%/
64/1/+1.6807%/[-16.2421%]/[+21.3984%]/
64/2/+0.6091%/[-11.0187%]/[+13.0678%]/
64/4/+0.0553%/[-5.9768%]/[+6.4155%]/
64/8/+0.1206%/[-4.0057%]/[+4.2984%]/
256/1/-0.0031%/[-10.1166%]/[+11.2517%]/
256/2/-0.5058%/[-6.1656%]/+6.0317%]/
...

Guest TX:(TCP)
size/session/+throughput%/+cpu%/-+per cpu%/
64/1/[+18.9183%]/-0.2823%/[+19.2550%]/
64/2/[+13.5714%]/[+2.2675%]/[+11.0533%]/
64/4/[+13.1070%]/[+2.1817%]/[+10.6920%]/
64/8/[+13.0426%]/[+2.0887%]/[+10.7299%]/
256/1/[+36.2761%]/+6.3434%/[+28.1471%]/
...
1024/1/[+44.8873%]/+2.0811%/[+41.9335%]/
...
1024/4/+0.0228%/[-2.2044%]/[+2.2774%]/
...
16384/2/+0.0127%/[-5.0346%]/[+5.3148%]/
...
65535/1/[+0.0062%]/[-4.1183%]/[+4.3017%]/
65535/2/+0.0004%/[-4.2311%]/[+4.4185%]/
65535/4/+0.0107%/[-4.6106%]/[+4.8446%]/
65535/8/-0.0090%/[-5.5178%]/[+5.8306%]/

Latency:(TCP_RR)
size/session/+transaction rate%/+cpu%/-+per cpu%/
64/1/[+6.5248%]/[-9.2882%]/[+17.4322%]/
64/25/[+11.0854%]/[+0.8000%]/[+10.2038%]/
64/50/[+12.1076%]/[+2.4627%]/[+9.4131%]/
256/1/[+5.3677%]/[+10.5669%]/-4.7024%/
256/25/[+5.6402%]/-0.8962%/[+6.5955%]/
256/50/[+5.9685%]/[+1.7766%]/[+4.1188%]/
4096/1/+0.2508%/[-10.4941%]/[+12.0047%]/
4096/25/[+1.8533%]/-0.0273%/+1.8812%/
4096/50/[+1.2156%]/-1.4134%/+2.6667%/

Notes: data with '[]' is the one whose significance is greater than 95%.

Thanks Wenli Quan <wquan@redhat.com> for the benchmarking.

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:32 +02:00
Jason Wang
351082238d KVM: add support for any length io eventfd
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-12 15:49:32 +02:00
Jason Wang
b8aecea23a memory: don't try to adjust endianness for zero length eventfd
There's no need to adjust endianness for zero length eventfd since the
data wrote was actually ignored by kernel. So skip the adjust in this
case to fix a possible crash when trying to use wildcard mmio eventfd
in ppc.

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-11-12 15:49:32 +02:00
Jason Wang
a6df8adf3e virtio-pci: fix 1.0 virtqueue migration
We don't migrate the followings fields for virtio-pci:

uint32_t dfselect;
uint32_t gfselect;
uint32_t guest_features[2];
struct {
    uint16_t num;
    bool enabled;
    uint32_t desc[2];
    uint32_t avail[2];
    uint32_t used[2];
} vqs[VIRTIO_QUEUE_MAX];

This will confuse driver if migrating during initialization. Solves
this issue by:

- introduce transport specific callbacks to load and store extra
  virtqueue states.
- add a new subsection for virtio to migrate transport specific modern
  device state.
- implement pci specific callbacks.
- add a new property for virtio-pci for whether or not to migrate
  extra state.
- compat the migration for 2.4 and elder machine types

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-12 15:49:32 +02:00
Peter Maydell
df1ac44e9f Merge remote-tracking branch 'remotes/dgibson/tags/ppc-next-20151112' into staging
ppc patch queue -2015-11-12

Highlights:
   - A number of fixes for MacOS 9 compatibility based on the old MOL
     (Mac-On-Linux) code and a GSoC project.
   - Cleaner and more general way of handling register access from the
     monitor

# gpg: Signature made Thu 12 Nov 2015 04:33:26 GMT using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-next-20151112:
  monitor/target-ppc: Define target_get_monitor_def
  cuda.c: add delay to setting of SR_INT bit
  cuda.c: fix T2 timer and enable its interrupt
  cuda.c: rename get_counter() state variable from s to ti for consistency
  cuda.c: refactor get_tb() so that the time can be passed in
  cuda.c: add defines for CUDA registers
  cuda.c: fix CUDA SR interrupt clearing
  cuda.c: implement dummy IIC access commands
  cuda.c: implement simple CUDA_GET_6805_ADDR command
  cuda.c: fix CUDA_PACKET response packet format
  cuda.c: fix CUDA ADB error packet format
  PPC: mac99: Always add USB controller
  PPC: Fix lswx bounds checks
  PPC: Allow Rc bit to be set on mtspr

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 13:41:44 +00:00
Peter Maydell
fd717e7890 Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-11-11-tag' into staging
qemu-ga patch queue

* fix for unintended overwriting of data on w32 using
  guest-file-open with append mode

# gpg: Signature made Wed 11 Nov 2015 22:14:52 GMT using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2015-11-11-tag:
  qga: fix append file open modes for win32

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 13:11:06 +00:00
Marc-André Lureau
2048a2a491 tests: classify some ivshmem tests as slow
Some tests may take long to run, move them under g_test_slow()
condition.

The 5s timeout for the "server" test will have to be adjusted to the worst
known time (for the records, it takes ~0.2s on my host). The "pair"
test takes ~1.7, a quickest version could be implemented.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1447326618-11686-1-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 11:57:58 +00:00
Peter Maydell
c459343b85 Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2015-11-11' into staging
error: More error_setg() usage

# gpg: Signature made Wed 11 Nov 2015 17:57:15 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-error-2015-11-11:
  error: More error_setg() usage

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-12 10:09:14 +00:00
Vincenzo Maffione
39bec4f38b net: netmap: use error_setg() helpers in place of error_report()
This update was required to align error reporting of netmap backend
initialization to the modifications introduced by commit a30ecde.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:31:52 +08:00
Vincenzo Maffione
54c59b4de5 net: netmap: Fix compilation issue
Reorganization of struct NetClientOptions (commit e4ba22b) caused a
compilation failure of the netmap backend. This patch fixes the issue
by properly accessing the union field.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:30:47 +08:00
Leonid Bloch
ba63ec8594 e1000: Introducing backward compatibility command line parameter
This follows the previous patches, where support for migrating the
entire MAC registers' array, and some new MAC registers were introduced.

This patch introduces the e1000-specific boolean parameter
"extra_mac_registers", which is on by default. Setting it to off will
enable migration to older versions of QEMU, but will disable the read
and write access to the new registers, that were introduced since adding
the ability to migrate the entire MAC array.

Example for usage to enable backward compatibility and to disable the
new MAC registers:

    qemu-system-x86_64 -device e1000,extra_mac_registers=off,... ...

As mentioned above, the default value is "on".

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:54 +08:00
Leonid Bloch
3b27430177 e1000: Implementing various counters
This implements the following Statistic registers (various counters)
according to Intel's specs:

TSCTC  GOTCL  GOTCH  GORCL  GORCH  MPRC   BPRC   RUC    ROC
BPTC   MPTC   PTC... PRC...

PLEASE NOTE: these registers will not be active, nor will migrate, until
a compatibility flag will be set (in the next patch in this series).

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:54 +08:00
Leonid Bloch
4aeea330f0 e1000: Fixing the packet address filtering procedure
Previously, if promiscuous unicast was enabled, a packet was received
straight away, even if it was a multicast or a broadcast packet. This
patch fixes that behavior, while making the filtering procedure a bit
more human-readable.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:54 +08:00
Leonid Bloch
45e9376471 e1000: Fixing the received/transmitted octets' counters
Previously, these 64-bit registers did not stick at their maximal
values when (and if) they reached them, as they should do, according to
the specs.

This patch introduces a function that takes care of such registers,
avoiding code duplication, making the relevant parts more compatible
with the QEMU coding style, while ensuring that in the unlikely case
of reaching the maximal value, the counter will stick there, as it
supposed to.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:53 +08:00
Leonid Bloch
1f67f92c4f e1000: Fixing the received/transmitted packets' counters
According to Intel's specs, these counters (as the other Statistic
registers) stick at 0xffffffff when this maximal value is reached.
Previously, they would reset after the max. value.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:53 +08:00
Leonid Bloch
72ea771c97 e1000: Trivial implementation of various MAC registers
These registers appear in Intel's specs, but were not implemented.
These registers are now implemented trivially, i.e. they are initiated
with zero values, and if they are RW, they can be written or read by the
driver, or read only if they are R (essentially retaining their zero
values). For these registers no other procedures are performed.

For the trivially implemented Diagnostic registers, a debug warning is
produced on read/write attempts.

PLEASE NOTE: these registers will not be active, nor will migrate, until
a compatibility flag will be set (in a later patch in this series).

The registers implemented here are:

Transmit:
RW: AIT

Management:
RW: WUC     WUS     IPAV    IP6AT*  IP4AT*  FFLT*   WUPM*   FFMT*   FFVT*

Diagnostic:
RW: RDFH    RDFT    RDFHS   RDFTS   RDFPC   PBM*    TDFH    TDFT    TDFHS
    TDFTS   TDFPC

Statistic:
RW: FCRUC
R:  RNBC    TSCTFC  MGTPRC  MGTPDC  MGTPTC  RFC     RJC     SCC     ECOL
    LATECOL MCC     COLC    DC      TNCRS   SEC     CEXTERR RLEC    XONRXC
    XONTXC  XOFFRXC XOFFTXC

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:53 +08:00
Leonid Bloch
bc0f0674f0 e1000: Introduced an array to control the access to the MAC registers
The array of uint8_t's which is introduced here, contains access metadata
about the MAC registers: if a register is accessible, but partly implemented,
or if a register requires a certain compatibility flag in order to be
accessed. Currently, 6 hypothetical flags are supported (3 exist for e1000
so far) but in the future, if more than 6 flags will be needed, the datatype
of this array can simply be swapped for a larger one.

This patch is intended to solve the following current problems:

1) In a scenario of migration between different versions of QEMU, which
differ by the MAC registers implemented in them, some registers need not to
be active if a compatibility flag is set, in order to preserve the machine's
state perfectly for the older version. Checking this for each register
individually, would create a lot of clutter in the code.

2) Some registers are (or may be) only partly implemented (e.g.
placeholders that allow reading and writing, but lack other functions).
In such cases it is better to print a debug warning on read/write attempts.
As above, dealing with this functionality on a per-register level, would
require longer and more messy code.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:52 +08:00
Leonid Bloch
9e11773417 e1000: Add support for migrating the entire MAC registers' array
This patch makes the migration of the entire array of MAC registers
possible during live migration. The entire array is just 128 KB long, so
practically no penalty should be felt when transmitting it, additionally
to the previously transmitted individual registers. The advantage here is
eliminating the need to introduce new vmstate subsections in the future,
when additional MAC registers will be implemented.

Backward compatibility is preserved by introducing a e1000-specific
boolean parameter (in a later patch), which will be on by default.
Setting it to off would enable migration to older versions of QEMU.

Additionally, this parameter will be used to control the access to the
extra MAC registers in the future.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 15:26:52 +08:00
Leonid Bloch
20f3e86362 e1000: Cosmetic and alignment fixes
This fixes some alignment and cosmetic issues. The changes are made
in order that the following patches in this series will look like
integral parts of the code surrounding them, while conforming to the
coding style. Although some changes in unrelated areas are also made.

Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 13:48:53 +08:00
Stefan Weil
ecc804cac3 slirp: Fix type casts and format strings in debug code
Casting pointers to long won't work on 64 bit Windows.
It is not needed with the right format strings.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-11-12 13:48:36 +08:00
Alexey Kardashevskiy
0a9516c2d6 monitor/target-ppc: Define target_get_monitor_def
At the moment get_monitor_def() returns only registers from statically
defined monitor_defs array. However there is a lot of BOOK3S SPRs
which are not in the list and cannot be printed from the monitor.

This adds a new target platform hook - target_get_monitor_def().
The hook is called if a register was not found in the static
array returned by the target_monitor_defs() hook.

The hook is only defined for POWERPC, it returns registered
SPRs and fails on unregistered ones providing the user with information
on what is actually supported on the running CPU. The register value is
saved as uint64_t as it is the biggest supported register size;
target_ulong cannot be used because of the stub - it is in a "common"
code and cannot include "cpu.h", etc; this is also why the hook prototype
is redefined in the stub instead of being included from some header.

This replaces static descriptors for GPRs, FPRs, SRs with a helper which
looks for a value in a corresponding array in the CPUPPCState.
The immediate effect is that all 32 SRs can be printed now (instead of 16);
later this can be reused for VSX or TM registers.

This replaces callbacks for MSR and XER with static descriptors in
monitor_defs as they are stored in CPUPPCState.

While we are here, this adds "cr" as a synonym of "ccr".

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 14:53:36 +11:00
Mark Cave-Ayland
cffc331a31 cuda.c: add delay to setting of SR_INT bit
MacOS 9 is racy when it comes to accessing the shift register. Fix this by
introducing a small delay between data accesses and raising the SR_INT
interrupt bit.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:55 +11:00
Mark Cave-Ayland
a53cfdcca2 cuda.c: fix T2 timer and enable its interrupt
Fix the counter loading logic and enable the T2 interrupt when the timer
expires. Otherwise MacOS 9 hangs on boot.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:55 +11:00
Mark Cave-Ayland
0174adb611 cuda.c: rename get_counter() state variable from s to ti for consistency
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:55 +11:00
Mark Cave-Ayland
eda14abbb8 cuda.c: refactor get_tb() so that the time can be passed in
This is in preparation for sharing the code between timers.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:55 +11:00
Mark Cave-Ayland
b5ac04103b cuda.c: add defines for CUDA registers
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:55 +11:00
Mark Cave-Ayland
d271ae36dc cuda.c: fix CUDA SR interrupt clearing
Make sure that we also clear the data and clock interrupts at the same time.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:55 +11:00
Mark Cave-Ayland
ce8d3b647b cuda.c: implement dummy IIC access commands
These are used by MacOS 9 on boot. Here we return an error except for 4-byte
commands which write to the IIC bus in a similar manner to MOL.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Mark Cave-Ayland
f1f46f74a9 cuda.c: implement simple CUDA_GET_6805_ADDR command
This simply returns an empty response with no error status as implemented by
MOL to allow MacOS 9 boot to proceed further.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Mark Cave-Ayland
4202e63c04 cuda.c: fix CUDA_PACKET response packet format
According to comments in MOL, the response to a CUDA_PACKET should be one of
the following:

Reply: (CUDA_PACKET, status, cmd)
Error: (ERROR_PACKET, status, CUDA_PACKET, cmd)

Update cuda_receive_packet() accordingly to reflect this in order to make
MacOS 9 happy.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Mark Cave-Ayland
6729aa4013 cuda.c: fix CUDA ADB error packet format
According to MOL, ADB error packets should be of the form (type, status, cmd)
rather than just (type, status). This fixes ADB device detection under MacOS 9.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Alexander Graf
72f1f97d49 PPC: mac99: Always add USB controller
The mac99 machines always have a USB controller. Usually not having one around
doesn't hurt quite as much, but Mac OS 9 really really wants one or it crashes
on bootup.

So always add OHCI to make it happy.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Alexander Graf
488661ee9d PPC: Fix lswx bounds checks
The lswx instruction checks whether the desired string actually fits
into all defined registers. Unfortunately it does the calculation wrong,
resulting in illegal instruction traps for loads that really should fit.

Fix it up, making Mac OS happier.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Alexander Graf
4248b336d3 PPC: Allow Rc bit to be set on mtspr
According to the ISA setting the Rc bit on mtspr is undefined behavior.
Real 750 hardware simply ignores the bit and doesn't touch cr0 though.

Unfortunately, Mac OS 9 relies on this fact and executes a few mtspr
instructions (to set XER for example) with Rc set.

So let's handle the bit the same way hardware does and ignore it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-12 13:15:54 +11:00
Peter Maydell
7497b8dddc Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Wed 11 Nov 2015 17:59:33 GMT using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  gluster: allocate GlusterAIOCBs on the stack

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-11 23:20:07 +00:00
Peter Maydell
31e49ac192 Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20151111' into staging
Hopefully last big batch of s390x patches, including:
- bugfixes for LE host and for pci translation
- MAINTAINERS update
- hugetlbfs enablement (kernel patches pending)
- boot from El Torito iso images on virtio-blk
  (boot from scsi pending)
- cleanup in the ipl device code

There's also a helper function for resetting busless devices in the
qdev core in there.

# gpg: Signature made Wed 11 Nov 2015 17:49:58 GMT using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20151111:
  s390: deprecate the non-ccw machine in 2.5
  s390x/ipl: switch error reporting to error_setg
  s390x/ipl: clean up qom definitions and turn into TYPE_DEVICE
  qdev: provide qdev_reset_all_fn()
  pc-bios/s390-ccw: rebuild image
  pc-bios/s390-ccw: El Torito 16-bit boot image size field workaround
  pc-bios/s390-ccw: El Torito s390x boot entry check
  pc-bios/s390-ccw: ISO-9660 El Torito boot implementation
  pc-bios/s390-ccw: Always adjust virtio sector count
  s390x/kvm: don't enable CMMA when hugetlbfs will be used
  s390x: switch to memory_region_allocate_system_memory
  MAINTAINERS: update virtio-ccw/s390 git tree
  MAINTAINERS: update s390 file patterns
  s390x/pci : fix up s390 pci iommu translation function
  s390x/css: sense data endianness

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-11 18:23:08 +00:00
Eric Blake
455b0fde8c error: More error_setg() usage
A few uses of error_set(ERROR_CLASS_GENERIC_ERROR) were missed in
c6bd8c706, or have snuck in since.  Nuke them.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1447224690-9743-19-git-send-email-eblake@redhat.com>
Acked-by: Andreas Färber <afaerber@suse.de>
[Indentation tidied up, commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-11 18:56:26 +01:00
Peter Maydell
2869653f23 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Wed 11 Nov 2015 16:03:19 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (41 commits)
  iotests: Check for quorum support in test 139
  qcow2: Fix qcow2_get_cluster_offset() for zero clusters
  iotests: Add tests for the x-blockdev-del command
  block: Add 'x-blockdev-del' QMP command
  block: Add blk_get_refcnt()
  mirror: block all operations on the target image during the job
  qemu-iotests: fix -valgrind option for check
  qemu-iotests: fix cleanup of background processes
  qemu-io: Correct error messages
  qemu-io: Check for trailing chars
  qemu-io: fix cvtnum lval types
  block: test 'blockdev-snapshot' using a file BDS as the overlay
  block: Remove inner quotation marks in iotest 085
  block: Disallow snapshots if the overlay doesn't support backing files
  throttle: Use bs->throttle_state instead of bs->io_limits_enabled
  throttle: Check for pending requests in throttle_group_unregister_bs()
  qemu-img: add check for zero-length job len
  qcow2: avoid misaligned 64bit bswap
  qemu-iotests: Test the reopening of overlay_bs in 'block-commit'
  commit: reopen overlay_bs before base
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-11 16:43:19 +00:00
Christian Borntraeger
3c4c694c7c s390: deprecate the non-ccw machine in 2.5
The non-ccw machine for s390 (s390-virtio) is not very well maintained
and caused several issues in the past:
- aliases like virtio-blk did not work for s390
- virtio refactoring failed due to long standing bugs (e.g.see
commit cb927b8a "s390-virtio: Accommodate guests using virtqueues too early")
- some features like memory hotplug will cause trouble due to virtio storage
  being above guest memory
- the boot loader bios no longer seems to work. the source code of that
  loader is also no longer maintained

2.4 changed the default to the ccw machine, let's deprecate the old
machine for 2.5.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1446811645-25565-1-git-send-email-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
David Hildenbrand
8f04e88e2c s390x/ipl: switch error reporting to error_setg
Now that we can report errors in the realize function, let's replace
the fprintf's and hw_error's with error_setg.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
David Hildenbrand
04fccf106e s390x/ipl: clean up qom definitions and turn into TYPE_DEVICE
Let's move the qom definitions of the ipl device into ipl.h, replace
"s390-ipl" by a proper type define, turn it into a TYPE_DEVICE
and remove the unneeded class definition.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
David Hildenbrand
ff8de0757f qdev: provide qdev_reset_all_fn()
For TYPE_DEVICE, the dc->reset() function is not called on system resets
yet. Until that is changed, we have to manually register a reset handler.
Let's provide qdev_reset_all_fn(), that can directly be used - just like
the reset handler that is already available for qbus.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Cornelia Huck
81a93ee64d pc-bios/s390-ccw: rebuild image
Contains:
  pc-bios/s390-ccw: Always adjust virtio sector count
  pc-bios/s390-ccw: ISO-9660 El Torito boot implementation
  pc-bios/s390-ccw: El Torito s390x boot entry check
  pc-bios/s390-ccw: El Torito 16-bit boot image size field workaround

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Maxim Samoylov
869648e87e pc-bios/s390-ccw: El Torito 16-bit boot image size field workaround
Because of El Torito spec flaw boot image size needs to be verified.

Boot catalog entry size field has 16-bit width, and specifies size
in 512-byte units.

Thus, boot image size cannot exceed 32M.

We actually search for the file to get the file size.

This is done by scanning the ISO directory tree for the ISO block number
and reading the file size from the directory entry.

Signed-off-by: Maxim Samoylov <max7255@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Maxim Samoylov
ba21f0cca8 pc-bios/s390-ccw: El Torito s390x boot entry check
Boot entry is considered compatible if boot image is Linux kernel
with matching S390 Linux magic string.

Empty boot images with sector_count == 0 are considered broken.

Signed-off-by: Maxim Samoylov <max7255@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Maxim Samoylov
866cac91e0 pc-bios/s390-ccw: ISO-9660 El Torito boot implementation
This patch enables boot from media formatted according to
ISO-9660 and El Torito bootable CD specification.

We try to boot from device as ISO-9660 media when SCSI IPL failed.

The first boot catalog entry with bootable flag is used.

ISO-9660 media with default 2048-bytes sector size only is supported.

Signed-off-by: Maxim Samoylov <max7255@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Maxim Samoylov
38150be860 pc-bios/s390-ccw: Always adjust virtio sector count
Let's always adjust the sector number to be read using the current
virtio block size value.

This prepares for the implementation of IPL from ISO-9660 media.

Signed-off-by: Maxim Samoylov <max7255@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Dominik Dingel
4c292a0097 s390x/kvm: don't enable CMMA when hugetlbfs will be used
On hugetlbfs CMMA will not be useful as every ESSA instruction will trap.
So don't offer CMMA to guests with a hugepages backing.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Dominik Dingel
ae23a33591 s390x: switch to memory_region_allocate_system_memory
By replacing memory_region_init_ram with memory_region_allocate_system_memory
we gain goodies like mem-path backends. This will allow us to use hugetlbfs
once the kernel supports it.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Cornelia Huck
3e9ed24b84 MAINTAINERS: update virtio-ccw/s390 git tree
Let's reference the git branch I actually use, and add Christian's
git tree.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:39 +01:00
Cornelia Huck
c5bfb202bb MAINTAINERS: update s390 file patterns
We were missing some files, and some files should get an additional
entry to add the people actually looking after the code.

Reported-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:38 +01:00
Yi Min Zhao
dce1b08924 s390x/pci : fix up s390 pci iommu translation function
On s390x, each pci device has its own iommu, which is only properly
setup in qemu once the mpcifc instruction used to register the
translation table has been intercepted. Therefore, for a pci device that
is not configured or has not been initialized, proper translation is
neither required nor possible. Moreover, we may not have a host bridge
device ready yet.

This was exposed by a recent vfio change that triggers iommu translation
during the initialization of the vfio pci device. Let's do an early exit
in that case.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-11-11 17:21:38 +01:00
Cornelia Huck
b498484ed4 s390x/css: sense data endianness
We keep the device's sense data in a byte array (following the
architecture), but the ecws are an array of 32 bit values. If we
just blindly copy the values, the sense data will change from
de-facto BE data to de-facto cpu-endian data, which means we end
up doing an incorrect conversion on LE hosts.

Let's just explicitly convert to cpu-endianness while assembling
the irb.

Reported-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-11 17:21:38 +01:00
52074d0f66 qga: fix append file open modes for win32
For append file open modes, use FILE_APPEND_DATA for the desired access
for writing at the end of the file.

Version 2:
For "a+", "ab+", and "a+b" modes use FILE_APPEND_DATA|GENERIC_READ.
ORing in GENERIC_READ starts a read at the begining of the file.  All
writes will append to the end fo the file.

Added white space to maintain the alignment of the guest_file_open_modes[].

Signed-off-by: Kirk Allan <kallan@suse.com>
Cc: qemu-stable@nongnu.org
* use FILE_GENERIC_APPEND macro, which provides same semantics as
  FILE_APPEND_DATA, but retains other flags from GENERIC_WRITE
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-11 10:21:02 -06:00
Kevin Wolf
4d07c720f4 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2015-11-11' into queue-block
Block patches from 2015-10-26 until 2015-11-11.

# gpg: Signature made Wed Nov 11 17:00:50 2015 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2015-11-11:
  iotests: Check for quorum support in test 139
  qcow2: Fix qcow2_get_cluster_offset() for zero clusters
  iotests: Add tests for the x-blockdev-del command
  block: Add 'x-blockdev-del' QMP command
  block: Add blk_get_refcnt()
  mirror: block all operations on the target image during the job
  qemu-iotests: fix -valgrind option for check
  qemu-iotests: fix cleanup of background processes

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 17:02:02 +01:00
Alberto Garcia
92e6898774 iotests: Check for quorum support in test 139
The quorum driver is always built in, but it is disabled during
run-time if there's no SHA256 support available (see commit e94867e).

This patch skips the quorum test in iotest 139 in that case.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1447172891-20410-1-git-send-email-berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:59:44 +01:00
Kevin Wolf
a99dfb45f2 qcow2: Fix qcow2_get_cluster_offset() for zero clusters
When searching for contiguous zero clusters, we only need to check the
cluster type. Before this patch, an increasing offset (L2E_OFFSET_MASK)
was expected, so that the function never returned more than a single
zero cluster in practice. This patch fixes it to actually return as many
contiguous zero clusters as it can.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1446657384-5907-1-git-send-email-kwolf@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:29 +01:00
Alberto Garcia
342075fd0e iotests: Add tests for the x-blockdev-del command
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 57c3b0d4d0c73ddadd19e5bded9492c359cc4568.1446475331.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:29 +01:00
Alberto Garcia
81b936ae70 block: Add 'x-blockdev-del' QMP command
This command is still experimental, hence the name.

This is the companion to 'blockdev-add'. It allows deleting a
BlockBackend with its associated BlockDriverState tree, or a
BlockDriverState that is not attached to any backend.

In either case, the command fails if the reference count is greater
than 1 or the BlockDriverState has any parents.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 6cfc148c77aca1da942b094d811bfa3fcf7ac7bb.1446475331.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:28 +01:00
Alberto Garcia
f636ae85f3 block: Add blk_get_refcnt()
This function returns the reference count of a given BlockBackend.
For convenience, it returns 0 if the BlockBackend pointer is NULL.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: dfdd8a17dbe3288842840636d2cfe5bb895abcb0.1446475331.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:28 +01:00
Alberto Garcia
10f3cd15dd mirror: block all operations on the target image during the job
There's nothing preventing the target image from being used by other
operations during the 'drive-mirror' job, so we should block them all
until the job is done.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 82b88fd04cde918a08a6f9a4ab85626d7fd7b502.1446475331.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:28 +01:00
Jeff Cody
e6c17669eb qemu-iotests: fix -valgrind option for check
Commit 934659c switched the iotests to run qemu-io from a bash subshell,
in order to catch segfaults.  This method is incompatible with the
current valgrind_qemu_io() bash function.

Move the valgrind usage into the exec subshell in _qemu_io_wrapper(),
while making sure the original return value is passed back to the
caller.

Update test output for tests 039, 061, and 137 as it looks for the
specific subshell command when the process is terminated.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 0066fd85d26ca641a1c25135ff2479b7985701cf.1446232490.git.jcody@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:28 +01:00
Jeff Cody
f6c8c2e055 qemu-iotests: fix cleanup of background processes
Commit 934659c switched the iotests to run qemu and qemu-nbd from a bash
subshell, in order to catch segfaults.  Unfortunately, this means the
process PID cannot be captured via '$!'. We stopped killing qemu and
qemu-nbd processes, leaving a lot of orphaned, running qemu processes
after executing iotests.

Since the process is using exec in the subshell, the PID is the
same as the subshell PID.

Track these PIDs for cleanup using pidfiles in the $TEST_DIR. Only
track the qemu PID, however, if requested - not all usage requires
killing the process.

Reported-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 9e4f958b3895b7259b98d845bb46f000ba362869.1446232490.git.jcody@redhat.com
[mreitz@redhat.com: Replaced '! -z "..."' by '-n "..."']
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-11-11 16:55:28 +01:00
Paolo Bonzini
c833d1e8f5 gluster: allocate GlusterAIOCBs on the stack
This is simpler now that the driver has been converted to coroutines.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 10:45:39 -05:00
John Snow
a9ecfa004f qemu-io: Correct error messages
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:40:10 +01:00
John Snow
ef5a788527 qemu-io: Check for trailing chars
Make sure there's not trailing garbage, e.g.
"64k-whatever-i-want-here"

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:40:10 +01:00
John Snow
9b0beaf3de qemu-io: fix cvtnum lval types
cvtnum() returns int64_t: we should not be storing this
result inside of an int.

In a few cases, we need an extra sprinkling of error handling
where we expect to pass this number on towards a function that
expects something smaller than int64_t.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:39:59 +01:00
Alberto Garcia
3fa123d059 block: test 'blockdev-snapshot' using a file BDS as the overlay
This test checks that it is not possible to create a snapshot if the
requested overlay node is a BDS which does not support backing images.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:48 +01:00
Alberto Garcia
f2d7f16f94 block: Remove inner quotation marks in iotest 085
This patch removes the inner quotation marks in all cases like this:

   cmd=" ... "${variable}" ... "

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:48 +01:00
Alberto Garcia
08b24cfe37 block: Disallow snapshots if the overlay doesn't support backing files
This addresses scenarios like this one:

  { 'execute': 'blockdev-add', 'arguments':
    { 'options': { 'driver': 'qcow2',
                   'node-name': 'new0',
                   'file': { 'driver': 'file',
                             'filename': 'new.qcow2',
                             'node-name': 'file0' } } } }

  { 'execute': 'blockdev-snapshot', 'arguments':
    { 'node': 'virtio0',
      'overlay': 'file0' } }

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:48 +01:00
Alberto Garcia
a0d64a61db throttle: Use bs->throttle_state instead of bs->io_limits_enabled
There are two ways to check for I/O limits in a BlockDriverState:

- bs->throttle_state: if this pointer is not NULL, it means that this
  BDS is member of a throttling group, its ThrottleTimers structure
  has been initialized and its I/O limits are ready to be applied.

- bs->io_limits_enabled: if true it means that the throttle_state
  pointer is valid _and_ the limits are currently enabled.

The latter is used in several places to check whether a BDS has I/O
limits configured, but what it really checks is whether requests
are being throttled or not. For example, io_limits_enabled can be
temporarily set to false in cases like bdrv_read_unthrottled() without
otherwise touching the throtting configuration of that BDS.

This patch replaces bs->io_limits_enabled with bs->throttle_state in
all cases where what we really want to check is the existence of I/O
limits, not whether they are currently enabled or not.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
5ac724184c throttle: Check for pending requests in throttle_group_unregister_bs()
throttle_group_unregister_bs() removes a BlockDriverState from its
throttling group and destroys the timers. This means that there must
be no pending throttled requests at that point (because it would be
impossible to complete them), so the caller has to drain them first.

At the moment throttle_group_unregister_bs() is only called from
bdrv_io_limits_disable(), which already takes care of draining the
requests, so there's nothing to worry about, but this patch makes
this invariant explicit in the documentation and adds the relevant
assertions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
John Snow
62547b8a1c qemu-img: add check for zero-length job len
The mirror job doesn't update its total length until
it has already started running, so we should translate
a zero-length job-len as meaning 0%.

Otherwise, we may get divide-by-zero faults.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
John Snow
9533423063 qcow2: avoid misaligned 64bit bswap
If we create a buffer directly on the stack by using 12 bytes, there's
no guarantee the 64bit value we want to swap will be aligned, which
could cause errors with undefined behavior.

Spotted with clang -fsanitize=undefined and observed in iotests 15, 26,
44, 115 and 121.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
bcdce5a73c qemu-iotests: Test the reopening of overlay_bs in 'block-commit'
The 'block-commit' command needs the overlay image of 'top' to
be opened in read-write mode in order to update the backing file
string. If 'top' is not the active layer or its backing file then its
overlay needs to be reopened during the block job.

This is a test case for that scenario.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
3db2bd5508 commit: reopen overlay_bs before base
'block-commit' needs write access to two different nodes of the chain:

- 'base', because that's where the data is written to.
- the overlay of 'top', because it needs to update the backing file
  string to point to 'base' after the operation.

Both images have to be opened in read-write mode, and commit_start()
takes care of reopening them if necessary.

With the current implementation, however, when overlay_bs is reopened
in read-write mode it has the side effect of making 'base' read-only
again, eventually making 'block-commit' fail.

This needs to be fixed in bdrv_reopen(), but until we get to that it
can be worked around simply by swapping the order of base and
overlay_bs in the reopen queue.

In order to reproduce this bug, overlay_bs needs to be initially in
read-only mode. That is: the 'top' parameter of 'block-commit' cannot
be the active layer nor its immediate backing chain.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
89e3a2d86d block: add tests for the 'blockdev-snapshot' command
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
43de7e2de0 block: add a 'blockdev-snapshot' QMP command
One of the limitations of the 'blockdev-snapshot-sync' command is that
it does not allow passing BlockdevOptions to the newly created
snapshots, so they are always opened using the default values.

Extending the command to allow passing options is not a practical
solution because there is overlap between those options and some of
the existing parameters of the command.

This patch introduces a new 'blockdev-snapshot' command with a simpler
interface: it just takes two references to existing block devices that
will be used as the source and target for the snapshot.

Since the main difference between the two commands is that one of them
creates and opens the target image, while the other uses an already
opened one, the bulk of the implementation is shared.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
3e8c2e5705 block: support passing 'backing': '' to 'blockdev-add'
Passing an empty string allows opening an image but not its backing
file. This was already described in the API documentation, only the
implementation was missing.

This is useful for creating snapshots using images opened with
blockdev-add, since they are not supposed to have a backing image
before the operation.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
a911e6ae7c block: rename BlockdevSnapshot to BlockdevSnapshotSync
We will introduce the 'blockdev-snapshot' command that will require
its own struct for the parameters, so we need to rename this one in
order to avoid name clashes.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Alberto Garcia
a39a24fbb0 block: check for existing device IDs in external_snapshot_prepare()
The 'snapshot-node-name' parameter of blockdev-snapshot-sync allows
setting the node name of the image that is going to be created.

Before creating the image, external_snapshot_prepare() checks that the
name is not already being used. The check is however incomplete since
it only considers existing node names, but node names must not clash
with device IDs either because they share the same namespace.

If the user attempts to create a snapshot using the name of an
existing device for the 'snapshot-node-name' parameter the operation
will eventually fail, but only after the new image has been created.

This patch replaces bdrv_find_node() with bdrv_lookup_bs() to extend
the check to existing device IDs, and thus detect possible name
clashes before the new image is created.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Max Reitz
adfe20303f iotests: Add test for change-related QMP commands
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Max Reitz
baead0abef hmp: Add read-only-mode option to change command
Expose the new read-only-mode option of 'blockdev-change-medium' for the
'change' HMP command.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:25:47 +01:00
Max Reitz
39ff43d9e1 blockdev: read-only-mode for blockdev-change-medium
Add an option to qmp_blockdev_change_medium() which allows changing the
read-only status of the block device whose medium is changed.

Some drives do not have a inherently fixed read-only status; for
instance, floppy disks can be set read-only or writable independently of
the drive. Some users may find it useful to be able to therefore change
the read-only status of a block device when changing the medium.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:23:34 +01:00
Max Reitz
1068674927 hmp: Use blockdev-change-medium for change command
Use separate code paths for the two overloaded functions of the 'change'
HMP command, and invoke the 'blockdev-change-medium' QMP command if used
on a block device (by calling qmp_blockdev_change_medium()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
24fb413300 qmp: Introduce blockdev-change-medium
Introduce a new QMP command 'blockdev-change-medium' which is intended
to replace the 'change' command for block devices. The existing function
qmp_change_blockdev() is accordingly renamed to
qmp_blockdev_change_medium().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
f1f5706657 block: Inquire tray state before tray-moved events
blk_dev_change_media_cb() is called for all potential tray movements;
however, it is possible to request closing the tray but nothing actually
happening (on a floppy disk drive without a medium).

Thus, the actual tray status should be inquired before sending a
tray-moved event (and an event should be sent whenever the status
changed).

Checking @load is now superfluous; it was necessary because it was
possible to change a medium without having explicitly opened the tray
and closed it again (or it might have been possible, at least). This is
no longer possible, though.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
de2c6c0536 blockdev: Implement change with basic operations
Implement 'change' on block devices by calling blockdev-open-tray,
blockdev-remove-medium, blockdev-insert-medium (a variation of that
which does not need a node-name) and blockdev-close-tray.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
38f54bd1ee blockdev: Implement eject with basic operations
Implement 'eject' by calling blockdev-open-tray and
blockdev-remove-medium.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
d129988289 blockdev: Add blockdev-insert-medium
And a helper function for that, which directly takes a pointer to the
BDS to be inserted instead of its node-name (which will be used for
implementing 'change' using blockdev-insert-medium).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
2814f67271 blockdev: Add blockdev-remove-medium
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
abaaf59d24 blockdev: Add blockdev-close-tray
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
7d8a9f71b9 blockdev: Add blockdev-open-tray
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:47 +01:00
Max Reitz
38cb18f5b7 block: Add functions for inheriting a BBRS
In order to open a BDS which inherits a BB's root state,
blk_get_open_flags_from_root_state() is used to inquire the flags to be
passed to bdrv_open(), and blk_apply_root_state() is used to apply the
remaining state after the BDS has been opened.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Max Reitz
c69a4dd899 block: Make bdrv_states public
When inserting a BDS tree into a BB, we will need to add the root BDS to
this list. Since we will want to do that in the blockdev-insert-medium
implementation in blockdev.c, we will need access to it there.

This patch is not exactly elegant, but bdrv_states will be removed in
the future anyway because we no longer need it since we have BBs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Max Reitz
1c95f7e1af block: Add blk_remove_bs()
This function removes the BlockDriverState associated with the given
BlockBackend from that BB and sets the BDS pointer in the BB to NULL.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Alberto Garcia
9f4ed6fbd2 block: Don't call blk_bs() twice in bdrv_lookup_bs()
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-11-11 16:22:46 +01:00
Peter Maydell
3c07587d49 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-next-20151111' into staging
ppc patch queue - 2015-11-11

Highlights:
  - Updated SLOF version for "pseries machine
  - Bugfix / cleanup for KVM hash page table allocation

# gpg: Signature made Wed 11 Nov 2015 02:30:51 GMT using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-next-20151111:
  spapr: Handle failure of KVM_PPC_ALLOCATE_HTAB ioctl
  ppc: Let kvmppc_reset_htab() return 0 for !CONFIG_KVM
  pseries: Update SLOF firmware image to qemu-slof-20151103
  ppc: Add/Re-introduce MMU model definitions needed by PR KVM

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-11 09:34:18 +00:00
Bharata B Rao
b41d320fef spapr: Handle failure of KVM_PPC_ALLOCATE_HTAB ioctl
KVM_PPC_ALLOCATE_HTAB ioctl can return -ENOMEM for KVM guests and QEMU
never handled this correctly. But this didn't cause any problems till
now as KVM_PPC_ALLOCATE_HTAB ioctl returned with smaller than requested
HTAB when enough contiguous memory wasn't available in the host.
After the proposed kernel change: https://patchwork.ozlabs.org/patch/530501/,
KVM_PPC_ALLOCATE_HTAB ioctl will not fallback to lower sized HTAB
allocation and will fail if requested HTAB size can't be met.

Check for such failures in QEMU and abort appropriately. This will
prevent guest kernel from hanging/freezing during early boot by doing
graceful exit when host is unable to allocate requested HTAB.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11 13:29:04 +11:00
Bharata B Rao
a3166f8f6e ppc: Let kvmppc_reset_htab() return 0 for !CONFIG_KVM
The !CONFIG_KVM implementation of kvmppc_reset_htab() returns -1
by default. Change this to return 0 so that we fall back to user space
HTAB allocation for emulated guests.

This fixes the make check failures for ppc64 emulated target.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11 13:29:04 +11:00
Alexey Kardashevskiy
1210481958 pseries: Update SLOF firmware image to qemu-slof-20151103
The changes are:
1. supports recent binutils;
2. 64bit BARs behind PCI bridges supported;
3. Many fixes for USB keyboard support - keys, XHCI;
4. virtio-vga support.

This image was built with:
gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC)
GNU ld version 2.23.2

The full changelog is:
  > version: update to 20151103
  > documentation: Add a clause about signing off
  > qemu/js2x/client: Support binutils >= 2.25.1
  > Fix special keys on USB
  > Fix function keys on USB
  > pci-scan: program 64-bit mem bar range in pci-bridge bar
  > Allow to build SLOF on Little Endian host
  > usb-xhci: add keyboard support
  > usb-xhci: ready the link trb early
  > usb-xhci: scan usb high speed ports
  > usb-xhci: bulk improve event handling loop
  > usb-xhci: return on allocation failure
  > usb-xhci: add delay in shutdown path
  > usb-xhci: event trbs does not need link trb
  > usb-hid: refactor usb key reading
  > takeover: Fix header includes
  > board-js2x: Add missing file dma-function.fs
  > vga: Add support for virtio-vga
  > qemu-vga: Use MMIO BAR instead of legacy IO ports
  > slof: Change call_c() function to a proper assembler function

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11 13:28:45 +11:00
Bharata B Rao
ba3ecda05e ppc: Add/Re-introduce MMU model definitions needed by PR KVM
Commit aa4bb58752 (ppc: Add mmu_model defines for arch 2.03 and 2.07)
removed the mmu_model definition POWERPC_MMU_2_06a which is needed by
PR KVM. Reintroduce it and also add POWERPC_MMU_2_07a.

This fixes QEMU crash (qemu: fatal: Unknown MMU model) during booting
of PR KVM guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-11-11 11:05:30 +11:00
Peter Maydell
d93ae5b696 Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20151110.0' into staging
VFIO updates 2015-11-10

 - Make Windows happy with vfio-pci devices exposed on conventional
   PCI buses on q35 by hiding PCIe capability (Alex Williamson)
 - Convert to g_new() where appropriate (Markus Armbruster)

# gpg: Signature made Tue 10 Nov 2015 19:46:41 GMT using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20151110.0:
  vfio: Use g_new() & friends where that makes obvious sense
  vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 22:21:42 +00:00
Markus Armbruster
bdd81addf4 vfio: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).  Same Coccinelle semantic patch as in commit b45c03f.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-11-10 12:11:08 -07:00
Alex Williamson
0282abf078 vfio/pci: Hide device PCIe capability on non-express buses for PCIe VMs
When we have a PCIe VM, such as Q35, guests start to care more about
valid configurations of devices relative to the VM view of the PCI
topology.  Windows will error with a Code 10 for an assigned device if
a PCIe capability is found for a device on a conventional bus.  We
also have the possibility of IOMMUs, like VT-d, where the where the
guest may be acutely aware of valid express capabilities on physical
hardware.

Some devices, like tg3 are adversely affected by this due to driver
dependencies on the PCIe capability.  The only solution for such
devices is to attach them to an express capable bus in the VM.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-11-10 12:11:08 -07:00
Peter Maydell
a77067f6ac Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151110' into staging
migration/next for 20151110

# gpg: Signature made Tue 10 Nov 2015 14:23:26 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151110: (57 commits)
  migration: qemu_savevm_state_cleanup becomes mandatory operation
  Inhibit ballooning during postcopy
  Disable mlock around incoming postcopy
  End of migration for postcopy
  Postcopy: Mark nohugepage before discard
  postcopy: Wire up loadvm_postcopy_handle_ commands
  Start up a postcopy/listener thread ready for incoming page data
  Postcopy; Handle userfault requests
  Round up RAMBlock sizes to host page sizes
  Host page!=target page: Cleanup bitmaps
  Don't iterate on precopy-only devices during postcopy
  Don't sync dirty bitmaps in postcopy
  postcopy: Check order of received target pages
  Postcopy: Use helpers to map pages during migration
  postcopy_ram.c: place_page and helpers
  Page request: Consume pages off the post-copy queue
  Page request: Process incoming page request
  Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
  Postcopy: End of iteration
  Postcopy: Postcopy startup in migration thread
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 17:49:39 +00:00
Denis V. Lunev
15b3b8eaae migration: qemu_savevm_state_cleanup becomes mandatory operation
since commit
    commit 94f5a43704
    Author: Liang Li <liang.z.li@intel.com>
    Date:   Mon Nov 2 15:37:00 2015 +0800

    migration: defer migration_end & blk_mig_cleanup

when actual .cleanup callbacks calling was removed from complete operations.

The patch fixes regression introduced by the commit above results in
100% reliable assert for virtio-scsi VM with iothreads enabled during
'virsh create-snapshot' operation:
    assert(i != mr->ioeventfd_nb);
    memory_region_del_eventfd
    virtio_pci_set_host_notifier_internal
    virtio_pci_set_host_notifier
    virtio_scsi_dataplane_start
    virtio_scsi_handle_cmd
    virtio_queue_notify_vq
    virtio_queue_host_notifier_read
    aio_dispatch

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
371ff5a3f0 Inhibit ballooning during postcopy
Postcopy detects accesses to pages that haven't been transferred yet
using userfaultfd, and it causes exceptions on pages that are 'not
present'.
Ballooning also causes pages to be marked as 'not present' when the
guest inflates the balloon.
Potentially a balloon could be inflated to discard pages that are
currently inflight during postcopy and that may be arriving at about
the same time.

To avoid this confusion, disable ballooning during postcopy.

When disabled we drop balloon requests from the guest.  Since ballooning
is generally initiated by the host, the management system should avoid
initiating any balloon instructions to the guest during migration,
although it's not possible to know how long it would take a guest to
process a request made prior to the start of migration.
Guest initiated ballooning will not know if it's really freed a page
of host memory or not.

Queueing the requests until after migration would be nice, but is
non-trivial, since the set of inflate/deflate requests have to
be compared with the state of the page to know what the final
outcome is allowed to be.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
58b7c17e22 Disable mlock around incoming postcopy
Userfault doesn't work with mlock; mlock is designed to nail down pages
so they don't move, userfault is designed to tell you when they're not
there.

munlock the pages we userfault protect before postcopy.
mlock everything again at the end if mlock is enabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
e9bef235d9 End of migration for postcopy
Tweak the end of migration cleanup; we don't want to close stuff down
at the end of the main stream, since the postcopy is still sending pages
on the other thread.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
f952710757 Postcopy: Mark nohugepage before discard
Prior to servicing userfault requests we must ensure we've not got
huge pages in the area that might include non-transferred memory,
since a hugepage could incorrectly mark the whole huge page as present.

We mark the area as non-huge page (nhp) just before we perform
discards; the discard code now tells us to discard any areas
that haven't been sent (as well as any that are redirtied);
any already formed transparent-huge-pages get fragmented
by this discard process if they cotnain any discards.

Transparent huge pages that have been entirely transferred
and don't contain any discards are not broken by this mechanism;
they stay as huge pages.

By starting postcopy after a full precopy pass, many of the pages
then stay as huge pages; this is important for maintaining performance
after the end of the migration.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
27c6825bd3 postcopy: Wire up loadvm_postcopy_handle_ commands
Wire up more of the handlers for the commands on the destination side,
in particular loadvm_postcopy_handle_run now has enough to start the
guest running.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
c76201ab52 Start up a postcopy/listener thread ready for incoming page data
The loading of a device state (during postcopy) may access guest
memory that's still on the source machine and thus might need
a page fill; split off a separate thread that handles the incoming
page data so that the original incoming migration code can finish
off the device data.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
c4faeed231 Postcopy; Handle userfault requests
userfaultfd is a Linux syscall that gives an fd that receives a stream
of notifications of accesses to pages registered with it and allows
the program to acknowledge those stalls and tell the accessing
thread to carry on.

We convert the requests from the kernel into messages back to the
source asking for the pages.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
4ed023ce2a Round up RAMBlock sizes to host page sizes
RAMBlocks that are not a multiple of host pages in length
cause problems for postcopy (I've seen an ACPI table on aarch64
be 5k in length - i.e. 5x target-page), so round RAMBlock sizes
up to a host-page.

This potentially breaks migration compatibility due to changes
in RAMBlock sizes; however:
   1) x86 and s390 I think always have host=target page size
   2) When I've tried on Power the block sizes already seem aligned.
   3) I don't think there's anything else that maintains per-version
      machine-types for compatibility.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
99e314ebca Host page!=target page: Cleanup bitmaps
Prior to the start of postcopy, ensure that everything that will
be transferred later is a whole host-page in size.

This is accomplished by discarding partially transferred host pages
and marking any that are partially dirty as fully dirty.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
35ecd943e7 Don't iterate on precopy-only devices during postcopy
During the postcopy phase we must not call the iterate method on
precopy-only devices, since they may have done some cleanup during
the _complete call at the end of the precopy phase.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
663e6c1df8 Don't sync dirty bitmaps in postcopy
Once we're in postcopy the source processors are stopped and memory
shouldn't change any more, so there's no need to look at the dirty
map.

There are two notes to this:
  1) If we do resync and a page had changed then the page would get
     sent again, which the destination wouldn't allow (since it might
     have also modified the page)
  2) Before disabling this I'd seen very rare cases where a page had been
     marked dirtied although the memory contents are apparently identical

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:28 +01:00
Dr. David Alan Gilbert
c53b7ddc61 postcopy: Check order of received target pages
Ensure that target pages received within a host page are in order.
This shouldn't trigger, but in the cases where the sender goes
wrong and sends stuff out of order it produces a corruption that's
really nasty to debug.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
a71808772a Postcopy: Use helpers to map pages during migration
In postcopy, the destination guest is running at the same time
as it's receiving pages; as we receive new pages we must put
them into the guests address space atomically to avoid a running
CPU accessing a partially written page.

Use the helpers in postcopy-ram.c to map these pages.

qemu_get_buffer_in_place is used to avoid a copy out of qemu_file
in the case that postcopy is going to do a copy anyway.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
696ed9a9b3 postcopy_ram.c: place_page and helpers
postcopy_place_page (etc) provide a way for postcopy to place a page
into guests memory atomically (using the copy ioctl on the ufd).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
a82d593b61 Page request: Consume pages off the post-copy queue
When transmitting RAM pages, consume pages that have been queued by
MIG_RPCOMM_REQPAGE commands and send them ahead of normal page scanning.

Note:
  a) After a queued page the linear walk carries on from after the
unqueued page; there is a reasonable chance that the destination
was about to ask for other closeby pages anyway.

  b) We have to be careful of any assumptions that the page walking
code makes, in particular it does some short cuts on its first linear
walk that break as soon as we do a queued page.

  c) We have to be careful to not break up host-page size chunks, since
this makes it harder to place the pages on the destination.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
6c595cdee1 Page request: Process incoming page request
On receiving MIG_RPCOMM_REQ_PAGES look up the address and
queue the page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
1e2d90ebc5 Page request: Add MIG_RP_MSG_REQ_PAGES reverse command
Add MIG_RP_MSG_REQ_PAGES command on Return path for the postcopy
destination to request a page from the source.

Two versions exist:
   MIG_RP_MSG_REQ_PAGES_ID that includes a RAMBlock name and start/len
   MIG_RP_MSG_REQ_PAGES that just has start/len for use with the same
                        RAMBlock as a previous MIG_RP_MSG_REQ_PAGES_ID

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
b10ac0c42c Postcopy: End of iteration
The end of migration in postcopy is a bit different since some of
the things normally done at the end of migration have already been
done on the transition to postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
1d34e4bf6a Postcopy: Postcopy startup in migration thread
Rework the migration thread to setup and start postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
f0a227ade4 postcopy: ram_enable_notify to switch on userfault
Mark the area of RAM as 'userfault'
Start up a fault-thread to handle any userfaults we might receive
from it (to be filled in later)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
1caddf8a81 postcopy: Incoming initialisation
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
e0b266f01d migration_completion: Take current state
Soon we'll be in either ACTIVE or POSTCOPY_ACTIVE when we
complete migration, and we need to know which we expect to be
in to change state safely.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
f3f491fcd6 Postcopy: Maintain unsentmap
Maintain an 'unsentmap' of pages that have yet to be sent.
This is used in the following patches to discard some set of
the pages already sent as we enter postcopy mode.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
763c906b0e Add qemu_savevm_state_complete_postcopy
Add qemu_savevm_state_complete_postcopy to complement
qemu_savevm_state_complete_precopy together with a new
save_live_complete_postcopy method on devices.

The save_live_complete_precopy method is called on
all devices during a precopy migration, and all non-postcopy
devices during a postcopy migration at the transition.

The save_live_complete_postcopy method is called at
the end of postcopy for all postcopiable devices.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:27 +01:00
Dr. David Alan Gilbert
8421b205dd Avoid sending vmdescription during postcopy
VMDescription is normally sent at the end, after all
of the devices; however that's not the end for postcopy,
so just don't send it when in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
9ec055ae29 MIGRATION_STATUS_POSTCOPY_ACTIVE: Add new migration state
'MIGRATION_STATUS_POSTCOPY_ACTIVE' is entered after migrate_start_postcopy

'migration_in_postcopy' is provided for other sections to know if
they're in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
36f48567b8 migration_completion: Take current state
Soon we'll be in either ACTIVE or POSTCOPY_ACTIVE when we
complete migration, and we need to know which we expect to be
in to change state safely.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
4886a1bcb7 migrate_start_postcopy: Command to trigger transition to postcopy
Once postcopy is enabled (with migrate_set_capability), the migration
will still start on precopy mode.  To cause a transition into postcopy
the:

  migrate_start_postcopy

command must be issued.  Postcopy will start sometime after this
(when it's next checked in the migration loop).

Issuing the command before migration has started will error,
and issuing after it has finished is ignored.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
eb59db53a4 postcopy: OS support test
Provide a check to see if the OS we're running on has all the bits
needed for postcopy.

Creates postcopy-ram.c which will get most of the other helpers we need.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
c31b098f64 Modify save_live_pending for postcopy
Modify save_live_pending to return separate postcopiable and
non-postcopiable counts.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
11cf1d984b MIG_CMD_PACKAGED: Send a packaged chunk of migration stream
MIG_CMD_PACKAGED is a migration command that wraps a chunk of migration
stream inside a package whose length can be determined purely by reading
its header.  The destination guarantees that the whole MIG_CMD_PACKAGED
is read off the stream prior to parsing the contents.

This is used by postcopy to load device state (from the package)
while leaving the main stream free to receive memory pages.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
093e3c4296 Add wrappers and handlers for sending/receiving the postcopy-ram migration messages.
The state of the postcopy process is managed via a series of messages;
   * Add wrappers and handlers for sending/receiving these messages
   * Add state variable that track the current state of postcopy

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
53dd370ced Add migration-capability boolean for postcopy-ram.
The 'postcopy ram' capability allows postcopy migration of RAM;
note that the migration starts off in precopy mode until
postcopy mode is triggered (see the migrate_start_postcopy
patch later in the series).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
7b89bf279f Rework loadvm path for subloops
Postcopy needs to have two migration streams loading concurrently;
one from memory (with the device state) and the other from the fd
with the memory transactions.

Split the core of qemu_loadvm_state out so we can use it for both.

Allow the inner loadvm loop to quit and cause the parent loops to
exit as well.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
70b2047774 Return path: Source handling of return path
Open a return path, and handle messages that are received upon it.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
f6844b99ce migration_is_setup_or_active
Add 'migration_is_setup_or_active' utility function to check state.
(It gets postcopy added to it's list later on in the series)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
6decec9311 Return path: Send responses from destination to source
Add migrate_send_rp_message to send a message from destination to source along the return path.
  (It uses a mutex to let it be called from multiple threads)
Add migrate_send_rp_shut to send a 'shut' message to indicate
  the destination is finished with the RP.
Add migrate_send_rp_ack to send a 'PONG' message in response to a PING
  Use it in the MSG_RP_PING handler

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:26 +01:00
Dr. David Alan Gilbert
2e37701efd Return path: Control commands
Add two src->dest commands:
   * OPEN_RETURN_PATH - To request that the destination open the return path
   * PING - Request an acknowledge from the destination

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:25 +01:00
Dr. David Alan Gilbert
c76ca1888f Migration commands
Create QEMU_VM_COMMAND section type for sending commands from
source to destination.  These commands are not intended to convey
guest state but to control the migration process.

For use in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:25 +01:00
Dr. David Alan Gilbert
3e4097b564 Return path: socket_writev_buffer: Block even on non-blocking fd's
The destination sets the fd to non-blocking on incoming migrations;
this also affects the return path from the destination, and thus we
need to make sure we can safely write to the return path.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 15:00:25 +01:00
Peter Maydell
a1a88589dc Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20151110' into staging
target-arm queue:
 * fix bugs in gdb singlestep handling and breakpoints
 * minor code cleanup in arm_gic
 * clean up error messages in hw/arm/virt
 * fix highbank kernel booting by adding a board-setup blob

# gpg: Signature made Tue 10 Nov 2015 13:43:52 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20151110:
  target-arm: Clean up DISAS_UPDATE usage in AArch32 translation code
  hw/arm/virt: error_report cleanups
  arm: highbank: Implement PSCI and dummy monitor
  arm: highbank: Defeature CPU override
  arm: boot: Add secure_board_setup flag
  hw/intc/arm_gic: Remove the definition of NUM_CPU
  target-arm: Fix gdb singlestep handling in arm_debug_excp_handler()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:55:07 +00:00
Dr. David Alan Gilbert
adc468e9b9 Return path: Open a return path on QEMUFile for sockets
Postcopy needs a method to send messages from the destination back to
the source, this is the 'return path'.

Wire it up for 'socket' QEMUFile's.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:49 +01:00
Dr. David Alan Gilbert
c1fcf220c9 Add Linux userfaultfd.h header
Postcopy uses the userfaultfd.h feature in the Linux kernel; include
the header.

(In early versions of the patch series we had this, and then we dropped
this by only including it if the kernel headers defined the syscall
number; however 1842bdfd added the syscall definition to our
headers, which means we can't tell if the kernel has it or not)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:49 +01:00
Dr. David Alan Gilbert
a3e06c3d13 Rename save_live_complete to save_live_complete_precopy
In postcopy we're going to need to perform the complete phase
for postcopiable devices at a different point, start out by
renaming all of the 'complete's to make the difference obvious.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:49 +01:00
Dr. David Alan Gilbert
aefeb18bde migrate_init: Call from savevm
Suspend to file is very much like a migrate, and it makes life
easier if we have the Migration state available, so initialise it
in the savevm.c code for suspending.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewd-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:49 +01:00
Dr. David Alan Gilbert
a776aa15a7 ram_load: Factor out host_from_stream_offset call and check
The main RAM load loop has a call to host_from_stream_offset for
each page type that actually loads data with the same test;
factor it out before the switch.

The host = NULL is to silence a bogus gcc warning of
an unitialised in the RAM_SAVE_COMPRESS_PAGE case, it
doesn't seem to realise that host is always initialised by the if at
the top in the cases the switch takes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:49 +01:00
Dr. David Alan Gilbert
4f2e425267 ram_debug_dump_bitmap: Dump a migration bitmap as text
Useful for debugging the migration bitmap and other bitmaps
of the same format (including the sentmap in postcopy).

The bitmap is printed to stderr.
Lines that are all the expected value are excluded so the output
can be quite compact for many bitmaps.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
ebf811500b Add QEMU_MADV_NOHUGEPAGE
Add QEMU_MADV_NOHUGEPAGE as an OS-independent version of
MADV_NOHUGEPAGE.

We include sys/mman.h before making the test to ensure
that we pick up the system defines.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
a800cd5c38 Add wrapper for setting blocking status on a QEMUFile
Add a wrapper to change the blocking status on a QEMUFile
rather than having to use qemu_set_block(qemu_get_fd(f));
it seems best to avoid exposing the fd since not all QEMUFile's
really have one.  With this wrapper we could move the implementation
down to be different on different transports.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
9504fb510c Add qemu_get_buffer_in_place to avoid copies some of the time
qemu_get_buffer always copies the data it reads to a users buffer,
however in many cases the file buffer inside qemu_file could be given
back to the caller, avoiding the copy.  This isn't always possible
depending on the size and alignment of the data.

Thus 'qemu_get_buffer_in_place' either copies the data to a supplied
buffer or updates a pointer to the internal buffer if convenient.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
42e2aa5637 Rename mis->file to from_src_file
'file' becomes confusing when you have flows in each direction;
rename to make it clear.
This leaves just the main forward direction ms->file, which is used
in a lot of places and is probably not worth renaming given the churn.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
e3dd74934f qemu_ram_block_by_name
Add a function to find a RAMBlock by name; use it in two
of the places that already open code that loop; we've
got another use later in postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
422148d3e5 qemu_ram_block_from_host
Postcopy sends RAMBlock names and offsets over the wire (since it can't
rely on the order of ramaddr being the same), and it starts out with
HVA fault addresses from the kernel.

qemu_ram_block_from_host translates a HVA into a RAMBlock, an offset
in the RAMBlock and the global ram_addr_t value.

Rewrite qemu_ram_addr_from_host to use qemu_ram_block_from_host.

Provide qemu_ram_get_idstr since its the actual name text sent on the
wire.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
87f50caa30 Move page_size_init earlier
The HOST_PAGE_ALIGN macros don't work until the page size variables
have been set up; later in postcopy I use those macros in the RAM
code, and it can be triggered using -object.

Fix this by initialising page_size_init() earlier - it's currently
initialised inside the accelerators, move it up into vl.c.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
172dfd4faf Move configuration section writing
The vmstate_configuration is currently written
in 'qemu_savevm_state_begin', move it to
'qemu_savevm_state_header' since it's got a hard
requirement that it must be the 1st thing after
the header.
(In postcopy some 'command' sections get sent
early before the saving of the main sections
and hence before qemu_savevm_state_begin).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
038629a699 Provide runtime Target page information
The migration code generally is built target-independent, however
there are a few places where knowing the target page size would
avoid artificially moving stuff into migration/ram.c.

Provide 'qemu_target_page_bits()' that returns TARGET_PAGE_BITS
to other bits of code so that they can stay target-independent.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Dr. David Alan Gilbert
2bfdd1c8a6 Add postcopy documentation
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2015-11-10 14:51:48 +01:00
Sergey Fedorov
577bf80895 target-arm: Clean up DISAS_UPDATE usage in AArch32 translation code
AArch32 translation code does not distinguish between DISAS_UPDATE and
DISAS_JUMP. Thus, we cannot use any of them without first updating PC in
CPU state. Furthermore, it is too complicated to update PC in CPU state
before PC gets updated in disas context. So it is hardly possible to
correctly end TB early if is is not likely to be executed before calling
disas_*_insn(), e.g. just after calling breakpoint check helper.

Modify DISAS_UPDATE and DISAS_JUMP usage in AArch32 translation and
apply to them the same semantic as AArch64 translation does:
 - DISAS_UPDATE: update PC in CPU state when finishing translation
 - DISAS_JUMP:   preserve current PC value in CPU state when finishing
                 translation

This patch fixes a bug in AArch32 breakpoint handling: when
check_breakpoints helper does not generate an exception, ending the TB
early with DISAS_UPDATE couldn't update PC in CPU state and execution
hangs.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1447097859-586-1-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:33 +00:00
Andrew Jones
faa811f6de hw/arm/virt: error_report cleanups
Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1446909925-12201-1-git-send-email-drjones@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:33 +00:00
Peter Crosthwaite
40340e5f22 arm: highbank: Implement PSCI and dummy monitor
Firstly, enable monitor mode and PSCI, both of which are features of
this board.

In addition to PSCI, this board also uses SMC for cache maintenance
ops. This means we need a secure monitor to catch these and nop them.
Use the ARM boot board-setup feature to implement this. The SMC trap
implements the needed nop while all other traps will pen the CPU.

As a KVM CPU cannot run in secure mode, do not do the board-setup if
not running TCG. Report a warning explaining the limitation in this
case.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 0fd0d12f0fa666c86616c89447861a70dbe27312.1447007690.git.crosthwaite.peter@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:33 +00:00
Peter Crosthwaite
dca6eeed8c arm: highbank: Defeature CPU override
This board should not support CPU model override. This allows for
easier patching of the board with being able to rely on the CPU
type being correct.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 471a61e049c7ca6e82f5ef6668889a1d518c7e00.1447007690.git.crosthwaite.peter@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:33 +00:00
Peter Crosthwaite
baf6b6815b arm: boot: Add secure_board_setup flag
Add a flag that when set, will cause the primary CPU to start in secure
mode, even if the overall boot is non-secure. This is useful for when
there is a board-setup blob that needs to run from secure mode, but
device and secondary CPU init should still be done as-normal for a non-
secure boot.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: d1170774d5446d715fced7739edfc61a5be931f9.1447007690.git.crosthwaite.peter@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:33 +00:00
Wei Huang
b95690c9be hw/intc/arm_gic: Remove the definition of NUM_CPU
arm_gic.c retrieves CPU number using either NUM_CPU(s) or s->num_cpu.
Such mixed-uses make source code inconsistent. This patch removes
NUM_CPU(s), which was defined for MPCore tweak long ago, and instead
favors s->num_cpu. The source is more consistent after this small tweak.

Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-id: 1446744293-32365-1-git-send-email-wei@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:33 +00:00
Sergey Fedorov
5c629f4ff4 target-arm: Fix gdb singlestep handling in arm_debug_excp_handler()
Do not raise a CPU exception if no CPU breakpoint has fired, since
singlestep is also done by generating a debug internal exception. This
fixes a bug with singlestepping in gdbstub.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1446726361-18328-1-git-send-email-serge.fdrv@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 13:37:32 +00:00
Peter Maydell
a8b4f9585a Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2015-11-10' into staging
QAPI patches

# gpg: Signature made Tue 10 Nov 2015 07:12:25 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2015-11-10:
  qapi-introspect: Document lack of sorting
  qapi: Provide nicer array names in introspection
  qapi: More tests of input arrays
  qapi: Test failure in middle of array parse
  qapi: More tests of alternate output
  qapi: Simplify error cleanup in test-qmp-*
  qapi: Simplify non-error testing in test-qmp-*
  qapi: Plug leaks in test-qmp-*
  qapi: Share test_init code in test-qmp-input*
  qobject: Protect against use-after-free in qobject_decref()
  qapi: Strengthen test of TestStructList
  qapi: Use generated TestStruct machinery in tests

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-10 09:39:24 +00:00
Eric Blake
f545504420 qapi-introspect: Document lack of sorting
qapi-code-gen.txt already claims that types, commands, and
events share a common namespace; set this in stone by further
documenting that our introspection output will never have
collisions with the same name tied to more than one meta-type.

Our largest QMP enum currently has 125 values, our largest
object type has 27 members, and the mean for each is less than
10.  These sizes are small enough that the per-element overhead
of O(log n) binary searching probably outweighs the speed
possible with direct O(n) linear searching (a better algorithm
with more overhead will only beat a leaner naive algorithm only
as you scale to larger input sizes).

Arguably, the overall SchemaInfo array could be sorted by name;
there, we currently have 531 entities, large enough for a binary
search to be faster than linear.  However, remember that we have
mutually-recursive types, which means there is no topological
ordering that will allow clients to learn all information about
that type in a single linear pass; thus clients will want to do
random access over the data, and they will probably read the
introspection output into a hashtable for O(1) lookup rather
than O(log n) binary searching, at which point, pre-sorting our
introspection output doesn't help the client.

It doesn't help that sorting can be subjective if you introduce
locales into the mix (I'm not experienced enough with Python
to know for sure, but at least it looks like it defaults to
sorting in the C locale even when run under a different locale).
And while our current introspection output is deterministic
(because we visit entities in a sorted order), we may want
to change that order in the future (such as using OrderedDict
to stick to .json declaration order).

For these reasons, we simply document that clients should not
rely on any particular order of items in introspection output.
And since it is now a documented part of the contract, we have
the freedom to later rearrange output if needed, without
worrying about breaking well-written clients.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-13-git-send-email-eblake@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-10 08:10:28 +01:00
Eric Blake
ce5fcb472d qapi: Provide nicer array names in introspection
For the sake of humans reading introspection output, it is nice
to have the name of implicit array types be recognizable as
arrays of the underlying type.  However, while this patch allows
humans to skip from a command with return type "[123]" straight
to the definition of type "123" without having to first inspect
type "[123]", document that this shortcut should not be taken by
client apps.

This makes the resulting introspection string slightly larger by
default (just over 200 bytes), but it's in the noise (less than
0.3% of the overall 70k size of 'query-qmp-capabilities').

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-12-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-10 08:09:15 +01:00
Eric Blake
2533377c7b qapi: More tests of input arrays
Our testsuite had no coverage of empty arrays, nor of what
happens when the input does not match the expected type.
Useful to have, especially if we start changing the visitor
contracts.

I did not think it worth duplicating these additions to
test-qmp-input-strict; since all strict mode does is add
the ability to reject JSON input that has more keys than
what the visitor expects, yet the additions in this patch
error out earlier than that point regardless of whether
strict mode was requested.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-11-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-10 08:09:14 +01:00
Eric Blake
dd5ee2c2d3 qapi: Test failure in middle of array parse
Our generated list visitors have the same problem as has been
mentioned elsewhere (see commit 2f52e20): they allocate data
even on failure. An upcoming patch will correct things to
provide saner guarantees, but first we need to expose the
behavior in the testsuite to ensure we aren't introducing any
memory usage bugs.

There are more test cases throughout the test-qmp-input-* tests
that already deal with partial allocation; a later commit will
clean up all visit_type_FOO(), without marking all of the tests
with FIXME at this time.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-10-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-10 08:09:14 +01:00
Eric Blake
12fafd7ced qapi: More tests of alternate output
The testsuite was only covering that we could output the 'int'
branch of an alternate (no additional allocation/cleanup required).
Add a test of the 'str' branch, to make sure that things still
work even when a branch involves allocation.

Update to modern style of g_new0() over g_malloc0() while
touching it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-9-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-10 08:09:14 +01:00
Eric Blake
a12a5a1a01 qapi: Simplify error cleanup in test-qmp-*
We have several tests that perform multiple sub-actions that are
expected to fail.  Asserting that an error occurred, then clearing
it up to prepare for the next action, turned into enough
boilerplate that it was sometimes forgotten (for example, a number
of tests added to test-qmp-input-visitor.c in d88f5fd leaked err).
Worse, if an error is not reset to NULL, we risk invalidating
later use of that error (passing a non-NULL err into a function
is generally a bad idea).  Encapsulate the boilerplate into a
single helper function error_free_or_abort(), and consistently
use it.

The new function is added into error.c for use everywhere,
although it is anticipated that testsuites will be the main
client.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-10 08:08:21 +01:00
Peter Maydell
ce278618b0 configure: Don't disable optimization for non-fortify builds
Commit b553a04280 inadvertently disabled optimization
for all non-fortify builds. Fix this bug so we only do an
unoptimized build if we want debug.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1447082049-25099-1-git-send-email-peter.maydell@linaro.org
2015-11-09 16:28:09 +00:00
Peter Maydell
d17008bc29 hw/timer/hpet.c: Avoid signed integer overflow which results in bugs on OSX
Signed integer overflow in C is undefined behaviour, and the compiler
is at liberty to assume it can never happen and optimize accordingly.
In particular, the subtractions in hpet_time_after() and hpet_time_after64()
were causing OSX clang to optimize the code such that it was prone to
hangs and complaints about the main loop stalling (presumably because
we were spending all our time trying to service very high frequency
HPET timer callbacks). The clang sanitizer confirms the UB:

hw/timer/hpet.c:119:26: runtime error: signed integer overflow: -2146967296 - 2147003978 cannot be represented in type 'int'

Fix this by doing the subtraction as an unsigned operation and then
converting to signed for the comparison.

Reported-by: Aaron Elkins <threcius@yahoo.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1447080991-24995-1-git-send-email-peter.maydell@linaro.org
2015-11-09 15:48:21 +00:00
Eric Blake
3f66f764ee qapi: Simplify non-error testing in test-qmp-*
By using &error_abort, we can avoid a local err variable in
situations where we expect success.  It also has the nice
effect that if the test breaks, the error message from
error_abort tends to be nicer than that of g_assert().

This patch has an additional bonus of fixing several call sites that
were passing &err to two different functions without checking it in
between.  In general that is unsafe practice; because if the first
function sets an error, the second function could abort() if it tries to
set a different error. We got away with it because we were asserting
that err was NULL through the entire chain, but switching to
&error_abort avoids the questionable practice up front.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-09 16:45:05 +01:00
Eric Blake
b18f1141d0 qapi: Plug leaks in test-qmp-*
Make valgrind happy with the current state of the tests, so that
it is easier to see if future patches introduce new memory problems
without being drowned in noise.  Many of the leaks were due to
calling a second init without tearing down the data from an earlier
visit.  But since teardown is already idempotent, and we already
register teardown as part of input_visitor_test_add(), it is nicer
to just make init() safe to call multiple times than it is to have
to make all tests call teardown.

Another common leak was forgetting to clean up an error object,
after testing that an error was raised.

Another leak was in test_visitor_in_struct_nested(), failing to
clean the base member of UserDefTwo.  Cleaning that up left
check_and_free_str() as dead code (since using the qapi_free_*
takes care of recursion, and we don't want double frees).

A final leak was in test_visitor_out_any(), which was reassigning
the qobj local variable to a subset of the overall structure
needing freeing; it did not result in a use-after-free, but
was not cleaning up all the qdict.

test-qmp-event and test-qmp-commands were already clean.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-6-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-09 16:45:05 +01:00
Eric Blake
0920a17199 qapi: Share test_init code in test-qmp-input*
Rather than duplicate the body of two functions just to
decide between qobject_from_jsonv() and qobject_from_json(),
exploit the fact that qobject_from_jsonv() intentionally
takes 'va_list *' instead of the more common 'va_list', and
that qobject_from_json() just calls qobject_from_jsonv(,NULL).
For each file, our two existing init functions then become
thin wrappers around a new internal function, and future
updates to initialization don't have to be duplicated.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-5-git-send-email-eblake@redhat.com>
[Two old comment typos fixed]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-09 16:45:05 +01:00
Eric Blake
cc9f60d4a2 qobject: Protect against use-after-free in qobject_decref()
Adding an assertion to qobject_decref() will ensure that a
programming error causing use-after-free will result in
immediate failure (provided no other thread has started
using the memory) instead of silently attempting to wrap
refcnt around and leaving the problem to potentially bite
later at a harder point to diagnose.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-4-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-09 16:45:05 +01:00
Eric Blake
bd20588d19 qapi: Strengthen test of TestStructList
Make each list element different, to ensure that order is
preserved, and use the generated free function instead of
hand-rolling our own to ensure (under valgrind) that the
list is properly cleaned.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-3-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-09 16:45:05 +01:00
Eric Blake
748053c97b qapi: Use generated TestStruct machinery in tests
Commit d88f5fd and friends first introduced the various test-qmp-*
tests in 2011, with duplicated hand-rolled TestStruct machinery,
to make sure the qapi visitor interface was tested.  Later, commit
4f193e3 in 2013 added a .json file for further testing use by the
files, but without consolidating any of the existing hand-rolled
visitors.  And with four copies, subtle differences have crept in,
between the tests themselves (mainly whitespace differences, but
also a question of whether to use NULL or "TestStruct" when
calling visit_start_struct()) and from what the generator produces
(the hand-rolled versions did not cater to partially-allocated
objects, because they did not have a deallocation usage).

Of course, just because the visitor interface is tested does not
mean it is a sane interface; and future patches will be changing
some of the visitor contracts.  Rather than having to duplicate
the cleanup work in each copy of the TestStruct visitor, and keep
each hand-rolled copy in sync with what the generator supplies, we
might as well just test what the generator should give us in the
first place.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1446791754-23823-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-09 16:45:05 +01:00
Peter Maydell
9d5c1dc117 Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Mon 09 Nov 2015 10:08:17 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  blockdev: acquire AioContext in hmp_commit()
  monitor: add missed aio_context_acquire into vm_completion call
  aio: Introduce aio-epoll.c
  aio: Introduce aio_context_setup
  aio: Introduce aio_external_disabled
  dataplane: support non-contigious s/g
  dataplane: simplify indirect descriptor read

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-09 11:20:51 +00:00
Stefan Hajnoczi
84aa0140dd blockdev: acquire AioContext in hmp_commit()
This one slipped through.  Although we acquire AioContext when
committing all devices we don't for just a single device.

AioContext must be acquired before calling bdrv_*() functions to
synchronize access with other threads that may be using the AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 10:07:10 +00:00
Denis V. Lunev
6bf1faa848 monitor: add missed aio_context_acquire into vm_completion call
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Luiz Capitulino <lcapitulino@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 10:07:10 +00:00
Fam Zheng
fbe3fc5cb3 aio: Introduce aio-epoll.c
To minimize code duplication, epoll is hooked into aio-posix's
aio_poll() instead of rolling its own. This approach also has both
compile-time and run-time switchability.

1) When QEMU starts with a small number of fds in the event loop, ppoll
is used.

2) When QEMU starts with a big number of fds, or when more devices are
hot plugged, epoll kicks in when the number of fds hits the threshold.

3) Some fds may not support epoll, such as tty based stdio. In this
case, it falls back to ppoll.

A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets
enabled from 64 onward). Numbers are in MB/s.

===============================================
             |     master     |     epoll
             |                |
scsi disks # | read    randrw | read    randrw
-------------|----------------|----------------
1            | 86      36     | 92      45
8            | 87      43     | 86      41
64           | 71      32     | 70      38
128          | 48      24     | 58      31
256          | 37      19     | 57      28
===============================================

To comply with aio_{disable,enable}_external, we always use ppoll when
aio_external_disabled() is true.

[Removed #ifdef CONFIG_EPOLL around AioContext epollfd field declaration
since the field is also referenced outside CONFIG_EPOLL code.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:47 +00:00
Fam Zheng
37fcee5d11 aio: Introduce aio_context_setup
This is the place to initialize platform specific bits of AioContext.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:32 +00:00
Fam Zheng
5ceb9e3928 aio: Introduce aio_external_disabled
This allows AioContext users to check the enable/disable state of
external clients.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1446177989-6702-2-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:32 +00:00
Michael S. Tsirkin
8347c53243 dataplane: support non-contigious s/g
bring_map currently fails if one of the entries it's mapping is
contigious in GPA but not HVA address space.  Introduce a mapped_len
parameter so it can handle this, returning the actual mapped length.

This will still fail if there's no space left in the sg, but luckily max
queue size in use is currently 256, while max sg size is 1024, so we
should be OK even is all entries happen to cross a single DIMM boundary.

Won't work well with very small DIMM sizes, unfortunately:
e.g. this will fail with 4K DIMMs where a single
request might span a large number of DIMMs.

Let's hope these are uncommon - at least we are not breaking things.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 1446047243-3221-2-git-send-email-mst@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:32 +00:00
Michael S. Tsirkin
572ec519ed dataplane: simplify indirect descriptor read
Use address_space_read to make sure we handle the case of an indirect
descriptor crossing DIMM boundary correctly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 1446047243-3221-1-git-send-email-mst@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09 09:59:32 +00:00
Peter Maydell
b3a9e57d92 Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
target-i386: tcg: Handle clflushopt/clwb/pcommit instructions

A small update to TCG code so it can handle the new
clflushopt/clwb/pcommit instructions.

# gpg: Signature made Sat 07 Nov 2015 14:50:54 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  target-i386: Add clflushopt/clwb/pcommit to TCG_7_0_EBX_FEATURES
  target-i386: tcg: Check right CPUID bits for clflushopt/pcommit
  target-i386: tcg: Accept clwb instruction

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-07 21:41:33 +00:00
Peter Maydell
c4a7bf54e5 Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
# gpg: Signature made Fri 06 Nov 2015 20:01:44 GMT using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"

* remotes/jnsnow/tags/ide-pull-request:
  arm: allwinner-a10: Add SATA
  ahci: Add allwinner AHCI
  ahci: split realize and init
  ahci: Add some MMIO debug printfs
  ide: remove hardcoded 2GiB transactional limit

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-07 19:55:15 +00:00
Peter Crosthwaite
dca625768a arm: allwinner-a10: Add SATA
Add the Allwinner A10 AHCI controller module to the SoC.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 69d6962f2d14a218bd07e9ac4ccd1947737cc30f.1445917756.git.crosthwaite.peter@gmail.com
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-06 14:09:01 -05:00
Peter Crosthwaite
377e214539 ahci: Add allwinner AHCI
Add a Sysbus AHCI subclass for the Allwinner AHCI. It has a few extra
vendor specific registers which are used for phy and power init.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 833b5b05ed5ade38bf69656679b0a7575e79492b.1445917756.git.crosthwaite.peter@gmail.com
[resolved patch context on pull --js]
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-06 14:09:01 -05:00
Peter Crosthwaite
0487eea48e ahci: split realize and init
Do the init level tasks asap and the realize later (mainly when
num_ports is available). This allows sub-class realize routines
to work with the device post-init.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1a7c7b2b32e5ccf49373a5065da5ece89730d3ac.1445917756.git.crosthwaite.peter@gmail.com
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-06 14:09:00 -05:00
Peter Crosthwaite
802742670d ahci: Add some MMIO debug printfs
These are useful for bringup of AHCI.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 517ba413dce7deb4ab17c0cc1e8bbdaaace2a0db.1445917756.git.crosthwaite.peter@gmail.com
Signed-off-by: John Snow <jsnow@redhat.com>
2015-11-06 14:09:00 -05:00
John Snow
9fbf0fa81f ide: remove hardcoded 2GiB transactional limit
Not that you can request a >2GiB transaction, but that's why checking
for it makes no sense anymore.

With the newer 'limit' parameter to prepare_buf, we no longer need a
static limit. The maximum limit is still 2GiB, but the limit parameter
is set to the current transaction size, which cannot surpass 32MiB
(512 * 65536). If the PRDT surpasses the transactional size, then,
we'll just carry out the normative underflow handling pathways instead
of needing an extra, strange pathway that worries about hitting some
logistical cap for the largest sglist we can support -- we'll never
even attempt to build one that big anymore.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1445902682-20051-1-git-send-email-jsnow@redhat.com
2015-11-06 14:09:00 -05:00
Xiao Guangrong
0c47242b51 target-i386: Add clflushopt/clwb/pcommit to TCG_7_0_EBX_FEATURES
Now these instructions are handled by TCG and can be added to the
TCG_7_0_EBX_FEATURES macro.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-06 12:19:33 -02:00
Eduardo Habkost
891bc821a3 target-i386: tcg: Check right CPUID bits for clflushopt/pcommit
Detect the clflushopt and pcommit instructions and check their
corresponding feature flags, instead of checking CPUID_SSE and
CPUID_CLFLUSH.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-06 12:03:12 -02:00
Eduardo Habkost
5e1fac2dba target-i386: tcg: Accept clwb instruction
Accept the clwb instruction (66 0F AE /6) if its corresponding feature
flag is enabled on CPUID[7].

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-06 12:03:12 -02:00
Peter Maydell
4b59f39bc9 Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2015-11-06' into staging
trivial patches for 2015-11-06

# gpg: Signature made Fri 06 Nov 2015 12:42:43 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2015-11-06: (24 commits)
  tap-bsd: use user-specified tap device if it already exists
  qemu-sockets: do not test path with access() before unlinking
  taget-ppc: Fix read access to IBAT registers higher than IBAT3
  exec: avoid unnecessary cacheline bounce on ram_list.mru_block
  target-alpha: fix uninitialized variable
  ivshmem-server: fix possible OVERRUN
  pci-assign: do not test path with access() before opening
  qom/object: fix 2 comment typos
  configure: remove help string for 'vnc-tls' option
  usb: Use g_new() & friends where that makes obvious sense
  qxl: Use g_new() & friends where that makes obvious sense
  ui: Use g_new() & friends where that makes obvious sense
  bt: fix use of uninitialized variable seqlen
  hw/dma/pxa2xx: Remove superfluous memset
  linux-user/syscall: Replace g_malloc0 + memcpy with g_memdup
  tests/i44fx-test: No need for zeroing memory before memset
  hw/input/tsc210x: Remove superfluous memset
  xen: fix invalid assertion
  tests: ignore test-qga
  fix bad indentation in pcie_cap_slot_write_config()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-06 12:50:24 +00:00
Ed Maste
bd54a9f943 tap-bsd: use user-specified tap device if it already exists
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Ed Maste <emaste@freebsd.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Paolo Bonzini
a2f31f1804 qemu-sockets: do not test path with access() before unlinking
Using access() is a time-of-check/time-of-use race condition.  It is
okay to use them to provide better error messages, but that is pretty
much it.

This is not one such case; on the other hand, access() *will* skip
unlink() for a non-existent path, so ignore ENOENT return values from
the unlink() system call.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Julio Guerra
3ede8f6996 taget-ppc: Fix read access to IBAT registers higher than IBAT3
Fix the index used to read the IBAT's vector which results in IBAT0..3 instead
of IBAT4..N.

The bug appeared by saving/restoring contexts including IBATs values.

Signed-off-by: Julio Guerra <julio@farjump.io>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Paolo Bonzini
68851b98e5 exec: avoid unnecessary cacheline bounce on ram_list.mru_block
Whenever the MRU cache hits for the list of RAM blocks, qemu_get_ram_block
does an unnecessary write that causes a processor cache line to bounce
from one core to another.  This causes a performance hit.

Reported-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Paolo Bonzini
74de807f79 target-alpha: fix uninitialized variable
I am not sure why the compiler does not catch it.  There is no
semantic change since gen_excp returns EXIT_NORETURN, but the
old code is wrong.

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Gonglei
258133bda9 ivshmem-server: fix possible OVERRUN
>>>     CID 1337991:  Memory - illegal accesses  (OVERRUN)
>>>     Decrementing "i". The value of "i" is now 65534.
218         while (i--) {
219             event_notifier_cleanup(&peer->vectors[i]);
220         }

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Paolo Bonzini
6268520d7d pci-assign: do not test path with access() before opening
Using access() is a time-of-check/time-of-use race condition.  It is
okay to use them to provide better error messages, but that is pretty
much it.

In this case we can get the same error from fopen(), so just use
strerror and errno there---which actually improves the error
message most of the time.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Cao jin
b30d805464 qom/object: fix 2 comment typos
Also change the misleading definition of macro OBJECT_CLASS_CHECK

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Daniel P. Berrange
9f503153c7 configure: remove help string for 'vnc-tls' option
The '--enable-vnc-tls' option to configure was removed in

  commit 3e305e4a47
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Thu Aug 6 14:39:32 2015 +0100

    ui: convert VNC server to use QCryptoTLSSession

This removes the corresponding help string.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Markus Armbruster
98f343395e usb: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).  Same Coccinelle semantic patch as in commit b45c03f.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Markus Armbruster
9de68637df qxl: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).  Same Coccinelle semantic patch as in commit b45c03f.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Markus Armbruster
fedf0d35aa ui: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).  Same Coccinelle semantic patch as in commit b45c03f.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Paolo Bonzini
374ec0669a bt: fix use of uninitialized variable seqlen
sdp_svc_match, sdp_attr_match and sdp_svc_attr_match read the last
argument.  The only sensible way to change the code is to make that last
argument "len" instead of "seqlen" which is the length of a subsequence
in the previous "if" branch.

To make the structure of the code clearer, use "else" instead of
"else if".

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Thomas Huth
1a13b27273 hw/dma/pxa2xx: Remove superfluous memset
g_malloc0 already clears the memory, so no need for
the additional memset here. And while we're at it,
also convert the g_malloc0 to the preferred g_new0.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Thomas Huth
e9d49d518d linux-user/syscall: Replace g_malloc0 + memcpy with g_memdup
No need to use g_malloc0 to zero the memory if we memcpy to
the whole buffer afterwards anyway. Actually, there is even
a function which combines both steps, g_memdup, so let's use
this function here instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Thomas Huth
112317867d tests/i44fx-test: No need for zeroing memory before memset
Change a g_malloc0 into g_malloc since the following
memset fills the whole buffer anyway.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Thomas Huth
a6c6d82760 hw/input/tsc210x: Remove superfluous memset
g_malloc0 already clears the memory, so no need for additional
memsets here. And while we're at it, let's also remove the
superfluous typecasts for the return values of g_malloc0
and use the type-safe g_new0 instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Paolo Bonzini
2c21ec3d18 xen: fix invalid assertion
Asserting "true" is not that useful.

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Eric Blake
3cd01b6ed8 tests: ignore test-qga
Commit 62c39b30 added a new test, but did not mark it for
exclusion in .gitignore.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Cao jin
6ba9fe8695 fix bad indentation in pcie_cap_slot_write_config()
bad indentation conflicts with CODING_STYLE doc

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Eric Blake
53d47be25a maint: Ignore ivshmem binaries
Commit a75eb03b added ivshmem-client and ivshmem-server binaries,
but did not mark them for exclusion in .gitignore.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Thomas Huth
b21de19992 hw/display/tcx: Remove superfluous OBJECT() typecasts
The tcx_initfn() function is already supplied with an
Object *obj pointer, so there is no need to cast the
state pointer back to an Object pointer all over the
place. And while we're at it, also remove the superfluous
"return;" statement in this function.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:38 +03:00
Kevin Wolf
5accecb3a6 gdbstub: Fix buffer overflows in gdb_handle_packet()
Some places in gdb_handle_packet() can get an arbitrary length (most
times directly from the client) and either didn't check it at all or
checked against the wrong value, potentially causing buffer overflows.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:37 +03:00
Marc-André Lureau
3c15d3a450 hw/acpi/aml-build: remove useless glib version check
2.22 is the minimum version required

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-11-06 15:42:37 +03:00
Peter Maydell
9319738080 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-replay' into staging
So here it is, let's see what happens.

# gpg: Signature made Fri 06 Nov 2015 09:30:34 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream-replay:
  replay: recording of the user input
  replay: command line options
  replay: replay blockers for devices
  replay: initialization and deinitialization
  replay: ptimer
  bottom halves: introduce bh call function
  replay: checkpoints
  icount: improve counting for record/replay
  replay: shutdown event
  replay: recording and replaying clock ticks
  replay: asynchronous events infrastructure
  replay: interrupts and exceptions
  cpu: replay instructions sequence
  cpu-exec: allow temporary disabling icount
  replay: introduce icount event
  replay: introduce mutex to protect the replay log
  replay: internal functions for replay log
  replay: global variables and function stubs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-06 11:31:40 +00:00
Stefan Hajnoczi
3aa88b3129 configure: add missing --disable-modules option
According to ./configure all options should have both --enable-foo and
--disable-foo:

  # Always add --enable-foo and --disable-foo command line args.
  # Distributions want to ensure that several features are compiled in, and it
  # is impossible without a --enable-foo that exits if a feature is not found.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1446473183-24250-1-git-send-email-stefanha@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-06 11:07:14 +00:00
Peter Maydell
5744181323 Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 queue, 2015-11-05

# gpg: Signature made Thu 05 Nov 2015 19:35:31 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  target-i386: Enable clflushopt/clwb/pcommit instructions
  target-i386: Remove POPCNT from qemu64 and qemu32 CPU models
  target-i386: Remove ABM from qemu64 CPU model
  target-i386: Remove SSE4a from qemu64 CPU model
  target-i386: Set "check=off" by default on pc-*-2.4 and older

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-06 10:10:15 +00:00
Pavel Dovgalyuk
ee312992a3 replay: recording of the user input
This records user input (keyboard and mouse events) in record mode and replays
these input events in replay mode.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162524.8676.11696.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
4c27b85972 replay: command line options
This patch introduces command line options for enabling recording or replaying
virtual machine behavior. These options are added to icount command line
parameter. They include 'rr' which switches between record and replay
and 'rrfile' for specifying the filename for replay log.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162518.8676.70792.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
0194749ac4 replay: replay blockers for devices
Some devices are not supported by record/replay subsystem.
This patch introduces replay blocker which denies starting record/replay
if such devices are included into the configuration.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162512.8676.11367.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
7615936ebf replay: initialization and deinitialization
This patch introduces the functions for enabling the record/replay and for
freeing the resources when simulator closes.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162507.8676.90232.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
8a354bd935 replay: ptimer
This patch adds deterministic replay for hardware periodic countdown timers.
ptimer uses bottom halves layer to execute such an asynchronous callback.
We put this callback into the replay queue instead of bottom halves one.
When checkpoint is met by main loop thread, the replay queue is processed
and callback is executed. Binding callback moment to one of the checkpoints
makes it deterministic.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162456.8676.83366.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
df281b80b9 bottom halves: introduce bh call function
This patch introduces aio_bh_call function. It is used to execute
bottom halves as callbacks without adding them to the queue.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162450.8676.56980.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
8bd7f71d79 replay: checkpoints
This patch introduces checkpoints that synchronize cpu thread and iothread.
When checkpoint is met in the code all asynchronous events from the queue
are executed.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162444.8676.52916.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:03 +01:00
Pavel Dovgalyuk
efab87cf79 icount: improve counting for record/replay
icount_warp_rt function is called by qemu_clock_warp and as
callback of icount_warp timer. This patch adds call to qemu_clock_warp
into main_loop_wait function, because icount warp may be missed
in record/replay mode, when CPU is sleeping.
This patch also disables of calling this function by timer, because
it is not needed after making modifications of main_loop_wait.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162439.8676.38290.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:02 +01:00
Pavel Dovgalyuk
b60c48a701 replay: shutdown event
This patch records and replays simulator shutdown event.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162433.8676.32262.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:02 +01:00
Pavel Dovgalyuk
8eda206e09 replay: recording and replaying clock ticks
Clock ticks are considered as the sources of non-deterministic data for
virtual machine. This patch implements saving the clock values when they
are acquired (virtual, host clock).
When replaying the execution corresponding values are read from log and
transfered to the module, which wants to read the values.
Such a design required the clock polling to be synchronized. Sometimes
it is not true - e.g. when timeouts for timer lists are checked. In this case
we use a cached value of the clock, passing it to the client code.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162427.8676.36558.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:02 +01:00
Pavel Dovgalyuk
c0c071d052 replay: asynchronous events infrastructure
This patch adds module for saving and replaying asynchronous events.
These events include network packets, keyboard and mouse input,
USB packets, thread pool and bottom halves callbacks.
All events are stored in the queue to be processed at synchronization points
such as beginning of TB execution, or checkpoint in the iothread.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162422.8676.88696.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
2015-11-06 10:16:02 +01:00
Pavel Dovgalyuk
6f0609697f replay: interrupts and exceptions
This patch includes modifications of common cpu files. All interrupts and
exceptions occured during recording are written into the replay log.
These events allow correct replaying the execution by kicking cpu thread
when one of these events is found in the log.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162416.8676.57647.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-06 10:16:00 +01:00
Xiao Guangrong
f7fda28094 target-i386: Enable clflushopt/clwb/pcommit instructions
These instructions are used by NVDIMM drivers and the specification is
located at:
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

There instructions are available on Skylake Server.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-05 17:35:04 -02:00
Eduardo Habkost
6aa91e4a02 target-i386: Remove POPCNT from qemu64 and qemu32 CPU models
POPCNT is not available on Penryn and older and on Opteron_G2 and older,
and we want to make the default CPU runnable in most hosts, so it won't
be enabled by default in KVM mode.

We should eventually have all features supported by TCG enabled by
default in TCG mode, but as we don't have a good mechanism today to
ensure we have different defaults in KVM and TCG mode, disable POPCNT in
the qemu64 and qemu32 CPU models entirely.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-05 16:27:59 -02:00
Eduardo Habkost
711956722c target-i386: Remove ABM from qemu64 CPU model
ABM is not available on Sandy Bridge and older, and we want to make the
default CPU runnable in most hosts, so it won't be enabled by default in
KVM mode.

We should eventually have all features supported by TCG enabled by
default in TCG mode, but as we don't have a good mechanism today to
ensure we have different defaults in KVM and TCG mode, disable ABM in
the qemu64 CPU model entirely.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-05 16:27:59 -02:00
Eduardo Habkost
0909ad24b2 target-i386: Remove SSE4a from qemu64 CPU model
SSE4a is not available in any Intel CPU, and we want to make the default
CPU runnable in most hosts, so it doesn't make sense to enable it by
default in KVM mode.

We should eventually have all features supported by TCG enabled by
default in TCG mode, but as we don't have a good mechanism today to
ensure we have different defaults in KVM and TCG mode, disable SSE4a in
the qemu64 CPU model entirely.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-05 16:27:59 -02:00
Eduardo Habkost
3e68482224 target-i386: Set "check=off" by default on pc-*-2.4 and older
The default CPU model (qemu64) have some issues today: it enables some
features (ABM and SSE4a) that are not present in many host CPUs. That
means many hosts (but not all of them) had those features silently
disabled in the default configuration in QEMU 2.4 and older.

With the new "check=on" default, this causes warnings to be printed in
the default configuration, because of the lack of SSE4A on all Intel
hosts, and the lack of ABM on Sandy Bridge and older hosts:

  $ qemu-system-x86_64 -machine pc,accel=kvm
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
  warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]

Those issues will be fixed in pc-*-2.5 and newer. But as we can't change
the guest ABI in pc-*-2.4, disable "check" mode by default in pc-*-2.4
and older so we don't print spurious warnings.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-11-05 16:27:59 -02:00
Gerd Hoffmann
382e1737d3 vnc: fix mismerge
Commit "4d77b1f vnc: fix bug: vnc server can't start when 'to' is
specified" was rebased incorrectly, fix it.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
Message-id: 1446714738-22400-1-git-send-email-kraxel@redhat.com
2015-11-05 16:01:37 +01:00
Peter Maydell
496c1b19fa Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Guest ABI fixes for PC machines (hw_version)
* Fixes for recent Perl
* John Snow's configure fixes
* file-backed RAM improvements (Igor, Pavel)
* -Werror=clobbered fixes (Stefan)
* Kill -d ioport
* Fix qemu-system-s390x
* Performance improvement for kvmclock migration

# gpg: Signature made Thu 05 Nov 2015 13:42:27 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  iscsi: Translate scsi sense into error code
  Revert "Introduce cpu_clean_all_dirty"
  kvmclock: add a new function to update env->tsc.
  configure: disable FORTIFY_SOURCE under clang
  backends/hostmem-file: Allow to specify full pathname for backing file
  configure: disallow ccache during compile tests
  cpu-exec: Fix compiler warning (-Werror=clobbered)
  memory: call begin, log_start and commit when registering a new listener
  megasas: Use qemu_hw_version() instead of QEMU_VERSION
  osdep: Rename qemu_{get, set}_version() to qemu_{, set_}hw_version()
  pc: Set hw_version on all machine classes
  qemu-log: remove -d ioport
  ioport: do not use CPU_LOG_IOPORT
  target-i386: fix pcmpxstrx equal-ordered (strstr) mode
  scripts/text2pod.pl: Escape left brace
  file_ram_alloc: propagate error to caller instead of terminating QEMU

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-05 14:31:24 +00:00
Fam Zheng
e01dd3da5c iscsi: Translate scsi sense into error code
Previously we return -EIO blindly when anything goes wrong. Add a helper
function to parse sense fields and try to make the return code more
meaningful.

This also fixes the default werror configuration (enospc) when we're
using qcow2 on an iscsi lun. The old -EIO not being treated as out of
space error failed to trigger vm stop.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1446699609-11376-1-git-send-email-famz@redhat.com>
[libiscsi 1.9 compatibility - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 14:42:19 +01:00
Pavel Dovgalyuk
8b42704441 cpu: replay instructions sequence
This patch adds calls to replay functions into the icount setup block.
In record mode number of executed instructions is written to the log.
In replay mode number of istructions to execute is taken from the replay log.
When replayed instructions counter is expired qemu_notify_event()
function is called to wake up the iothread.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162405.8676.31890.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 12:19:09 +01:00
Pavel Dovgalyuk
56c0269a9e cpu-exec: allow temporary disabling icount
This patch is required for deterministic replay to generate an exception
by trying executing an instruction without changing icount.
It adds new flag to TB for disabling icount while translating it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162359.8676.77011.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 12:19:09 +01:00
Pavel Dovgalyuk
26bc60ac82 replay: introduce icount event
This patch adds icount event to the replay subsystem. This event corresponds
to execution of several instructions and used to synchronize input events
in the replay phase.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162354.8676.31351.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 12:19:09 +01:00
Pavel Dovgalyuk
c16861ef1b replay: introduce mutex to protect the replay log
This mutex will protect read/write operations for replay log.
Using mutex is necessary because most of the events consist of
several fields stored in the log. The mutex will help to avoid races.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162348.8676.8628.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 12:19:09 +01:00
Pavel Dovgalyuk
c92079f45f replay: internal functions for replay log
This patch adds functions to perform read and write operations
with replay log.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162342.8676.29445.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 12:19:09 +01:00
Pavel Dovgalyuk
d73abd6dcc replay: global variables and function stubs
This patch adds global variables, defines, function declarations,
and function stubs for deterministic VM replay used by external modules.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20150917162337.8676.41538.stgit@PASHA-ISP.def.inno>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 12:19:08 +01:00
Peter Maydell
8835b9df3b Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-11-04-tag' into staging
qemu-ga patch queue

* fix file handle cleanup on w32
* use non-blocking mode for file handles on w32 to avoid
  hangs on guest-file-read/guest-file-write to pipes

# gpg: Signature made Wed 04 Nov 2015 19:36:16 GMT using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2015-11-04-tag:
  qga: set file descriptor in qmp_guest_file_open non-blocking on Win32
  qga: fixed CloseHandle in qmp_guest_file_open
  qga: drop hand-made guest_file_toggle_flags helper

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-05 10:52:35 +00:00
Liang Li
6388acc853 Revert "Introduce cpu_clean_all_dirty"
This reverts commit de9d61e83d.

Now 'cpu_clean_all_dirty' is useless, we can revert the related code.

Conflicts:
	include/sysemu/kvm.h

Signed-off-by: Liang Li <liang.z.li@intel.com>
Message-Id: <1446695464-27116-3-git-send-email-liang.z.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 11:28:23 +01:00
Liang Li
0fd7e098db kvmclock: add a new function to update env->tsc.
The commit 317b0a6d8 fixed an issue which caused by the outdated
env->tsc value, but the fix lead to 'cpu_synchronize_all_states()'
called twice during live migration. The 'cpu_synchronize_all_states()'
takes about 130us for a VM which has 4 vcpus, it's a bit expensive.

Synchronize the whole CPU context just for updating env->tsc is too
wasting, this patch use a new function to update the env->tsc.
Comparing to 'cpu_synchronize_all_states()', it only takes about 20us.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Message-Id: <1446695464-27116-2-git-send-email-liang.z.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 11:28:10 +01:00
John Snow
b553a04280 configure: disable FORTIFY_SOURCE under clang
Some versions of clang may have difficulty compiling glibc headers when
-D_FORTIFY_SOURCE is used. For example, Clang++ 3.5.0-9.fc22 cannot
compile glibc's stdio headers when -D_FORTIFY_SOURCE=2 is used. This
manifests currently as build failures with clang and any arm target.

According to LLVM dev Richard Smith, clang does not target or support
FORTIFY_SOURCE + glibc, and it should not be relied on.
"It's still an unsupported combination, and while it might compile, some
of the checks are unlikely to work because they require a frontend
inliner to be useful"

See: http://lists.llvm.org/pipermail/cfe-dev/2015-November/045846.html

Conclusion: disable fortify-source if we appear to be using clang instead
of testing for compile success or failure, which may be incidental or not
indicative of proper support of the feature.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <1446583422-10153-1-git-send-email-jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-05 11:28:02 +01:00
Peter Maydell
6c5f30cad2 Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20151104' into staging
migration/next for 20151104

# gpg: Signature made Wed 04 Nov 2015 12:45:19 GMT using RSA key ID 5872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"

* remotes/juanquintela/tags/migration/20151104:
  migration: fix analyze-migration.py script
  migration: code clean up
  migration: rename cancel to cleanup in SaveVMHandles
  migration: rename qemu_savevm_state_cancel
  migration: defer migration_end & blk_mig_cleanup

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-05 10:10:57 +00:00
Peter Lieven
f14c3d85b0 buffer: allow a buffer to shrink gracefully
the idea behind this patch is to allow the buffer to shrink, but
make this a seldom operation. The buffers average size is measured
exponentionally smoothed with am alpha of 1/128.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-20-git-send-email-kraxel@redhat.com
2015-11-05 09:09:58 +01:00
Peter Lieven
4ec5ba151f buffer: factor out buffer_adj_size
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-19-git-send-email-kraxel@redhat.com
2015-11-05 09:09:55 +01:00
Peter Lieven
fd95243372 buffer: factor out buffer_req_size
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-18-git-send-email-kraxel@redhat.com
2015-11-05 09:09:48 +01:00
Peter Lieven
c3d6899c5e vnc: recycle empty vs->output buffer
If the vs->output buffer is empty it will be dropped
by the next qio_buffer_move_empty in vnc_jobs_consume_buffer
anyway. So reuse the allocated buffer from this buffer
in the worker thread where we otherwise would start with
an empty (unallocated buffer).

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-17-git-send-email-kraxel@redhat.com

[ added a comment describing the non-obvious optimization ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-05 09:09:31 +01:00
Gerd Hoffmann
2e0c90af0a vnc: fix local state init
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-16-git-send-email-kraxel@redhat.com
2015-11-05 09:09:27 +01:00
Gerd Hoffmann
c7628bff41 vnc: only alloc server surface with clients connected
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-15-git-send-email-kraxel@redhat.com
2015-11-05 09:09:23 +01:00
Gerd Hoffmann
f7b3d68c95 vnc: use vnc_{width,height} in vnc_set_area_dirty
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-14-git-send-email-kraxel@redhat.com
2015-11-05 09:09:18 +01:00
Gerd Hoffmann
453f842bc4 vnc: factor out vnc_update_server_surface
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-13-git-send-email-kraxel@redhat.com
2015-11-05 09:09:14 +01:00
Gerd Hoffmann
d05959c2e1 vnc: add vnc_width+vnc_height helpers
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-12-git-send-email-kraxel@redhat.com
2015-11-05 09:09:10 +01:00
Gerd Hoffmann
e081aae5ae vnc: zap dead code
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-11-git-send-email-kraxel@redhat.com
2015-11-05 09:09:05 +01:00
Gerd Hoffmann
d90340115a vnc-jobs: move buffer reset, use new buffer move
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-10-git-send-email-kraxel@redhat.com
2015-11-05 09:08:59 +01:00
Gerd Hoffmann
8305f917c1 vnc: kill jobs queue buffer
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-9-git-send-email-kraxel@redhat.com
2015-11-05 09:08:56 +01:00
Gerd Hoffmann
543b95801f vnc: attach names to buffers
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-8-git-send-email-kraxel@redhat.com
2015-11-05 09:08:52 +01:00
Gerd Hoffmann
d2b90718d2 buffer: add tracing
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-7-git-send-email-kraxel@redhat.com
2015-11-05 09:08:48 +01:00
Gerd Hoffmann
1ff36b5d4d buffer: add buffer_shrink
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-6-git-send-email-kraxel@redhat.com
2015-11-05 09:08:41 +01:00
Gerd Hoffmann
830a958320 buffer: add buffer_move
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-5-git-send-email-kraxel@redhat.com
2015-11-05 09:08:39 +01:00
Gerd Hoffmann
4d1eb5fdb1 buffer: add buffer_move_empty
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-4-git-send-email-kraxel@redhat.com
2015-11-05 09:08:36 +01:00
Gerd Hoffmann
810082d15c buffer: add buffer_init
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1446203414-4013-3-git-send-email-kraxel@redhat.com
2015-11-05 09:08:33 +01:00
Peter Lieven
5c10dbb7b5 buffer: make the Buffer capacity increase in powers of two
This makes sure the number of reallocs is in O(log N).

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1446203414-4013-2-git-send-email-kraxel@redhat.com

[ rebased to util/buffer.c ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-05 09:08:29 +01:00
Peter Maydell
2b5a79f1d9 Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2015-11-03' into staging
vl.c: Error message rework

# gpg: Signature made Tue 03 Nov 2015 08:40:50 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-error-2015-11-03:
  vl.c: Use "%s support is disabled" error messages consistently
  vl.c: Improve warnings on use of deprecated options
  vl.c: Touch up error messages
  vl.c: Remove unnecessary uppercase in error messages
  vl.c: Use "warning:" prefix consistently on warnings
  vl.c: Remove periods and exclamation points from error messages
  vl.c: Replace fprintf(stderr) with error_report()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-04 18:20:31 +00:00
Pavel Fedin
8d31d6b65a backends/hostmem-file: Allow to specify full pathname for backing file
This allows to explicitly specify file name to use with the backend. This
is important when using it together with ivshmem in order to make it backed
by hugetlbfs. By default filename is autogenerated using mkstemp(), and the
file is unlink()ed after creation, effectively making it anonymous. This is
not very useful with ivshmem because it ends up in a memory which cannot be
accessed by something else.

Distinction between directory and file name is done by stat() check. If an
existing directory is given, the code keeps old behavior. Otherwise it
creates or opens a file with the given pathname.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Tested-by: Igor Skalkin <i.skalkin@samsung.com>
Message-Id: <004301d11166$9672fe30$c358fa90$@samsung.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:05 +01:00
John Snow
5e4dfd3d4e configure: disallow ccache during compile tests
If the user is using ccache during the configuration step,
it may interfere with some of the configuration tests,
particularly the "Is ccache interfering with macro analysis" step,
which is a bit of a poetic problem.

1) Disallow ccache from reading from the cache during configure,
   but don't disable it entirely to allow us to see if it causes other
   problems.

2) Force off CCACHE_CPP2 during the ccache test to get a deterministic
   answer over whether or not we need to enable that feature later.

Signed-off-by: John Snow <jsnow@redhat.com>
Message-Id: <1446055000-29150-1-git-send-email-jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:04 +01:00
Stefan Weil
0448f5f8b8 cpu-exec: Fix compiler warning (-Werror=clobbered)
Reloading of local variables after sigsetjmp is only needed for some
buggy compilers.

The code which should reload these variables causes compiler warnings
with gcc 4.7 when compiler optimizations are enabled:

cpu-exec.c:204:15: error:
 variable ‘cpu’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
cpu-exec.c:207:15: error:
 variable ‘cc’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]
cpu-exec.c:202:28: error:
 argument ‘env’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered]

Now this code is only used for compilers which need it
(and gcc 4.5.x, x > 0 which does not need it but won't give warnings).

There were bug reports for clang and gcc 4.5.0, while gcc 4.5.1
was reported to work fine without the reload code. For clang it
is not clear which versions are affected, so simply keep the status quo
for all clang compilations. This can be improved later.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <1443266606-21400-1-git-send-email-sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:04 +01:00
Paolo Bonzini
680a4783dc memory: call begin, log_start and commit when registering a new listener
This ensures that cpu_reload_memory_map() is called as soon as
tcg_cpu_address_space_init() is called, and before cpu->memory_dispatch
is used.  qemu-system-s390x never changes the address spaces after
tcg_cpu_address_space_init() is called, and thus tcg_commit() is never
called.  This causes a SIGSEGV.

Because memory_map_init() will now call mem_commit(), we have to
initialize io_mem_* before address_space_memory and friends.

Reported-by: Philipp Kern <pkern@debian.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 0a1c71cec6
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:56:01 +01:00
Eduardo Habkost
69fbd0ea25 megasas: Use qemu_hw_version() instead of QEMU_VERSION
Guest visible data shouldn't change with a simple QEMU upgrade, so use
qemu_hw_version() to ensure it won't change (as long as the machine
class being used has hw_version set).

Cc: Hannes Reinecke <hare@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-block@nongnu.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446233769-7892-4-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:02:31 +01:00
Eduardo Habkost
35c2c8dc8c osdep: Rename qemu_{get, set}_version() to qemu_{, set_}hw_version()
This makes the purpose of the function clearer: it is not about the
version of QEMU that's running, but the version string exposed in the
emulated hardware.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: John Snow <jsnow@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446233769-7892-3-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:02:31 +01:00
Eduardo Habkost
de796d93f5 pc: Set hw_version on all machine classes
In 2012, QEMU had a bug where it exposed QEMU version information to the
guest, meaning a QEMU upgrade would expose different hardware to the
guest OS even if the same machine-type is being used.

The bug was fixed by commit 93bfef4c6e, on
all machines up to pc-1.0. But we kept introducing the same bug on all
newer machines since then. That means we are breaking guest ABI every
time QEMU was upgraded.

Fix this by setting the hw_version on all PC machines, making sure the
hardware won't change when upgrading QEMU.

Note that QEMU_VERSION was "1.0" in QEMU 1.0, but starting on QEMU
1.1.0, it started following the "x.y.0" pattern. We have to follow it,
to make sure we use the right QEMU_VERSION string from each QEMU
release.

The 2.5 machine classes could have hw_version unset, because the default
value for qemu_get_version() is QEMU_VERSION. But I decided to set it
explicitly to QEMU_VERSION so we don't forget to update it to "2.5.0"
after we release 2.5.0 and create a 2.6 machine class.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446233769-7892-2-git-send-email-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:02:30 +01:00
Paolo Bonzini
ddcc8e9d51 qemu-log: remove -d ioport
It was disabled at compile-time, and is now replaced by tracepoints.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:02:30 +01:00
Paolo Bonzini
6f94b7d97f ioport: do not use CPU_LOG_IOPORT
These messages are disabled by default; a perfect usecase for tracepoints,
which in fact already exist.  Add the missing information to them and
stop using qemu_log_mask.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:02:30 +01:00
Paolo Bonzini
54c54f8b56 target-i386: fix pcmpxstrx equal-ordered (strstr) mode
In this mode, referring an invalid element of the source forces the
result to false (table 4-7, last column) but referring an invalid
element of the destination forces the result to true, so the outer
loop should still be run even if some elements of the destination
will be invalid.  They will be avoided in the inner loop, which
correctly bounds "i" to validd, but they will still contribute to a
positive outcome of the search.

This fixes tst_strstr in glibc 2.17.

Reported-by: Florian Weimer <fweimer@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 15:02:30 +01:00
Olga Krishtal
fb68777312 qga: set file descriptor in qmp_guest_file_open non-blocking on Win32
Set fd non-blocking to avoid common use cases (like reading from a
named pipe) from hanging the agent. This was missed in the original
code.

The patch introduces qemu_set_handle_nonoblocking, the local analog
of qemu_set_nonblock for HANDLES.
The usage of handles in qemu_set_non/block is impossible, because for
win32 there is a difference between file discriptors and file handles,
and all file ops are made via Win32 api.

Signed-off-by: Olga Krishtal <okrishtal@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
CC: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-04 07:37:56 -06:00
Olga Krishtal
c87d0964ef qga: fixed CloseHandle in qmp_guest_file_open
CloseHandle use HANDLE as an argument, but not *HANDLE

Signed-off-by: Olga Krishtal <okrishtal@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-04 07:37:56 -06:00
Denis V. Lunev
125053965b qga: drop hand-made guest_file_toggle_flags helper
We'd better use generic qemu_set_nonblock directly.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-11-04 07:37:56 -06:00
Mark Cave-Ayland
96e5c9bc77 migration: fix analyze-migration.py script
Commit 61964 "Add configuration section" broke the analyze-migration.py script
which terminates due to the unrecognised section. Fix the script by parsing
the contents of the configuration section directly into a new
ConfigurationSection object (although nothing is done with it yet).

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Juan Quintela <quintela@redhat.com>al3
Signed-off-by: Juan Quintela <quintela@redhat.com>al3
2015-11-04 13:40:13 +01:00
Liang Li
6ad2a215e7 migration: code clean up
Just clean up code, no behavior change.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>al3
Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
Signed-off-by: Juan Quintela <quintela@redhat.com>al3
2015-11-04 13:40:13 +01:00
Liang Li
d1a8548c10 migration: rename cancel to cleanup in SaveVMHandles
'cleanup' seems more appropriate than 'cancel'.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>al3
Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
Signed-off-by: Juan Quintela <quintela@redhat.com>al3
2015-11-04 13:40:13 +01:00
Liang Li
ea7415fac6 migration: rename qemu_savevm_state_cancel
The function qemu_savevm_state_cancel is called after the migration
in migration_thread, it seems strange to 'cancel' it after completion,
rename it to qemu_savevm_state_cleanup looks better.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>al3
Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
Signed-off-by: Juan Quintela <quintela@redhat.com>al3
2015-11-04 13:40:13 +01:00
Liang Li
94f5a43704 migration: defer migration_end & blk_mig_cleanup
Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
lazy collapsing of small sptes into large sptes mechanism, now
migration_end() is a time consuming operation because it calls
memroy_global_dirty_log_stop(), which will trigger the dropping of small
sptes operation and takes about dozens of milliseconds, so call
migration_end() before all the vmsate data has already been transferred
to the destination will prolong VM downtime. This operation should be
deferred after all the data has been transferred to the destination.

blk_mig_cleanup() can be deferred too.

For a VM with 8G RAM, this patch can reduce the VM downtime about 30 ms.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>al3
Reviewed-by: Amit Shah <amit.shah@redhat.com>al3
Signed-off-by: Juan Quintela <quintela@redhat.com>al3
2015-11-04 13:40:13 +01:00
Peter Maydell
79cf9fad34 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20151103' into staging
target-arm queue:
 * code cleanup to use symbolic constants for register bank numbers
 * fix direct booting of modern Linux kernels on xilinx_zynq by setting
   SCLR values to what the kernel expects firmware to have done
 * implement SYSRESETREQ for ARMv7M CPU (stellaris boards)
 * update MAINTAINERS to mention new qemu-arm mailing list
 * clean up display of PSTATE in AArch64 debug logs
 * report Secure/Nonsecure status in CPU debug logs
 * fix a missing _CCA attribute in ACPI tables
 * add support for GICv3 to ACPI tables

# gpg: Signature made Tue 03 Nov 2015 13:58:46 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20151103:
  ARM: ACPI: Fix MPIDR value in ACPI table
  hw/arm/virt-acpi-build: Add GICC ACPI subtable for GICv3
  hw/arm/virt-acpi-build: _CCA attribute is compulsory
  target-arm: Report S/NS status in the CPU debug logs
  target-arm: Bring AArch64 debug CPU display of PSTATE into line with AArch32
  MAINTAINERS: Add new qemu-arm mailing list to ARM related entries
  arm: stellaris: exit on external reset request
  armv7-m: Implement SYSRESETREQ
  armv7-m: Return DeviceState* from armv7m_init()
  arm: xilinx_zynq: Add linux pre-boot
  arm: boot: Add board specific setup code API
  arm: boot: Adjust indentation of FIXUP comments
  target-arm: Add and use symbolic names for register banks

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 14:54:40 +00:00
Peter Maydell
19bb546713 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20151103-1' into staging
usb: two bugfixes for ehci & usb-host.

# gpg: Signature made Tue 03 Nov 2015 10:57:28 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20151103-1:
  usb-host: fix usb3ep0quirk test
  ehci: clear suspend bit on detach

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 14:09:59 +00:00
Shannon Zhao
5d9c175614 ARM: ACPI: Fix MPIDR value in ACPI table
Use mp_affinity of ARMCPU as the CPU MPIDR instead of the CPU index.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1446285001-7316-1-git-send-email-zhaoshenglong@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:42 +00:00
Shannon Zhao
f2fbfacec0 hw/arm/virt-acpi-build: Add GICC ACPI subtable for GICv3
When booting VM with GICv3, the kernel needs GICC ACPI subtable to
initialize the CPUs, e.g. MPIDR information. This adds GICC ACPI
subtable for GICv3, but set GICC base address only when gic_version == 2
since it donesn't need GICC base address for GICv3.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1446131773-5018-1-git-send-email-shannon.zhao@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:42 +00:00
Graeme Gregory
bc64b96c98 hw/arm/virt-acpi-build: _CCA attribute is compulsory
According to ACPI specification 6.2.17 _CCA (Cache Coherency Attribute)
this attribute is compulsory on ARM systems. Add this attribute to
the PCI host bridges as required.

Without this the kernel will produce the error
[Firmware Bug]: PCI device 0000:00:00.0 fail to setup DMA.

Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Message-id: 1446460786-13663-1-git-send-email-graeme.gregory@linaro.org
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:42 +00:00
Peter Maydell
06e5cf7acd target-arm: Report S/NS status in the CPU debug logs
If this CPU supports EL3, enhance the printing of the current
CPU mode in debug logging to distinguish S from NS modes as
appropriate.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1445883178-576-3-git-send-email-peter.maydell@linaro.org
2015-11-03 13:49:42 +00:00
Peter Maydell
08b8e0f527 target-arm: Bring AArch64 debug CPU display of PSTATE into line with AArch32
The AArch64 debug CPU display of PSTATE as "PSTATE=200003c5 (flags --C-)"
on the end of the same line as the last of the general purpose registers
is unnecessarily different from the AArch32 display of PSR as
"PSR=200001d3 --C- A svc32" on its own line. Update the AArch64
code to put PSTATE in its own line and in the same format, including
printing the exception level (mode).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1445883178-576-2-git-send-email-peter.maydell@linaro.org
2015-11-03 13:49:42 +00:00
Peter Maydell
b4f2bd1ce8 MAINTAINERS: Add new qemu-arm mailing list to ARM related entries
We now have a qemu-arm mailing list for ARM patches and discussion,
so add an L: entry for it to the various ARM related entries in
MAINTAINERS.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1446129661-5239-1-git-send-email-peter.maydell@linaro.org
2015-11-03 13:49:42 +00:00
Michael Davidsaver
d69ffb5b48 arm: stellaris: exit on external reset request
Add GPIO in for the stellaris board which calls
qemu_system_reset_request() on reset request.

Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Michael Davidsaver
e192becdc7 armv7-m: Implement SYSRESETREQ
Implement the SYSRESETREQ bit of the AIRCR register
for armv7-m (ie. cortex-m3) to trigger a GPIO out.

Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Michael Davidsaver
20c59c3892 armv7-m: Return DeviceState* from armv7m_init()
Change armv7m_init to return the DeviceState* for the NVIC.
This allows access to all GPIO blocks, not just the IRQ inputs.
Move qdev_get_gpio_in() calls out of armv7m_init() into
board code for stellaris and stm32f205 boards.

Signed-off-by: Michael Davidsaver <mdavidsaver@gmail.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Peter Crosthwaite
c3a9a689c6 arm: xilinx_zynq: Add linux pre-boot
Add a Linux-specific pre-boot routine that matches the device-
specific bootloaders behaviour. This is needed for modern Linux that
expects the ARM PLL in SLCR to be a more even value (not 26).

Cc: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 9a9025ea65572586b50dca4e5819032e3c436d64.1446182614.git.crosthwaite.peter@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Peter Crosthwaite
10b8ec73e6 arm: boot: Add board specific setup code API
Add an API for boards to inject their own preboot software (or
firmware) sequence.

The software then returns to the bootloader via the link register. This
allows boards to do their own little bits of firmware setup without
needed to replace the bootloader completely (which is the requirement
for existing firmware support).

The blob is loaded by a callback if and only if doing a linux boot
(similar to the existing write_secondary support).

Rewrite the comment for the primary boot blob.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 070295644c6ac84696d743913296e8cfefb48c15.1446182614.git.crosthwaite.peter@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Peter Crosthwaite
84e5939779 arm: boot: Adjust indentation of FIXUP comments
These comments start immediately after the current longest name in the
list. Tab them out to the next tab stop to give a little breathing room
and prepare for FIXUP_BOARD_SETUP which will require more indent.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: b9b9bb8f1c307c1ef8a3f26ff1f34fabb34b332e.1446182614.git.crosthwaite.peter@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Soren Brinkmann
99a99c1fc8 target-arm: Add and use symbolic names for register banks
Add BANK_<cpumode> #defines to index banked registers.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 13:49:41 +00:00
Gerd Hoffmann
a9be4e7c48 usb-host: fix usb3ep0quirk test
usb->speed is the usb speed the device is actually running on in the
qemu emulation (i.e. from the guests point of view).  So when plugging
usb3 devices into ehci hostadapter this is HIGH not SUPER.

To figure whenever the host talks to the device with superspeed we
have to check speedmask instead and see whenever the superspeed bit
is set there.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1445603230-11840-1-git-send-email-kraxel@redhat.com
2015-11-03 11:56:23 +01:00
Gerd Hoffmann
cbf82fa01e ehci: clear suspend bit on detach
When a device is detached, clear the suspend bit (PORTSC_SUSPEND)
in the port status register.

The specs are not *that* clear what is supposed to happen in case
a suspended device is unplugged.  But the enable bit (PORTSC_PED)
is cleared, and the specs mention setting suspend with enable being
unset is undefined behavior.  So clearing them both looks reasonable,
and it actually fixes the reported bug.

https://bugzilla.redhat.com/show_bug.cgi?id=1268879

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Message-id: 1445413462-18004-1-git-send-email-kraxel@redhat.com
2015-11-03 11:55:51 +01:00
Peter Maydell
130d0bc659 Merge remote-tracking branch 'remotes/kraxel/tags/pull-ui-20151103-1' into staging
ui: fixes for vnc, opengl and curses.

# gpg: Signature made Tue 03 Nov 2015 09:53:24 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-ui-20151103-1:
  vnc: fix bug: vnc server can't start when 'to' is specified
  vnc: allow fall back to RAW encoding
  ui/opengl: Reduce build required libraries for opengl
  ui/curses: Fix pageup/pagedown on -curses
  ui/curses: Support line graphics chars on -curses mode
  ui/curses: Fix monitor color with -curses when 256 colors

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-03 10:20:04 +00:00
Yang Hongyang
4d77b1f238 vnc: fix bug: vnc server can't start when 'to' is specified
commit e0d03b8ceb converted VNC startup to use SocketAddress,
the interface socket_listen don't have a port_offset param, so
we need to add the port offset (5900) to both 'port' and 'to' opts.
currently only 'port' is added by offset.
This patch add the port offset to 'to' opts.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1445926252-14830-1-git-send-email-hongyang.yang@easystack.cn
Cc: Daniel P. Berrange <berrange@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-03 10:21:49 +01:00
Peter Lieven
de3f7de7f4 vnc: allow fall back to RAW encoding
I have observed that depending on the contents and the encoding it happens
that sending data as RAW sometimes would take less space than the encoded data.
This is especially the case for small updates or areas with high color images.
If sending RAW encoded data is beneficial allow a fall back to RAW encoding
for the framebuffer update.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-03 10:21:49 +01:00
OGAWA Hirofumi
fb71956367 ui/opengl: Reduce build required libraries for opengl
We now use epoxy to load opengl libraries. This means we don't need to
link opengl libraries directly if interfaces handled by epoxy. With
this, we just need epoxy headers and epoxy's *.so to build.

Tested with epoxy-1.3.1.

- sdl2/gtk/console egl stuff doesn't require other than epoxy
- milkymist-tmu2 glx stuff doesn't require other than epoxy

(lm32 test is limited, because can't find mmone-bios.bin, so just test
to load libGL with "./lm32-softmmu/qemu-system-lm32 -M milkymist,accel=qtest")

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

[ lm32 tested by kraxel ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-03 10:13:42 +01:00
OGAWA Hirofumi
e72df72a55 ui/curses: Fix pageup/pagedown on -curses
Current KEY_NPAGE/KEY_PPAGE handling is broken on -curses. Those uses
"GREY", but "KEY_MASK" masked out "GREY".

To fix, we have to use correct mask value - SCANCODE_KEYMASK.

Then, this adds support of "shift + pageup/pagedown". With this,
-curses mode can use scroll-up/down as usual like other display modes.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-03 10:12:46 +01:00
OGAWA Hirofumi
e2368dc968 ui/curses: Support line graphics chars on -curses mode
This converts vga code to curses code in console_write_bh().

With this changes, we can see line graphics (for example, dialog uses)
correctly.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-03 10:12:46 +01:00
OGAWA Hirofumi
615220ddaf ui/curses: Fix monitor color with -curses when 256 colors
If TERM=xterm-256color, COLOR_PAIRS==256 and monitor passes chtype
like 0x74xx. Then, the code uses uninitialized color pair. As result,
monitor uses black for both of fg and bg color, i.e. terminal is
filled by black.

To fix, this initialize above than 64 with default color (fg=white,bg=black).

FIXME: on 256 color, curses may be possible better vga color emulation.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-11-03 10:12:45 +01:00
Eduardo Habkost
5dfdae8143 vl.c: Use "%s support is disabled" error messages consistently
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-12-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:38:32 +01:00
Eduardo Habkost
515cb8ef27 vl.c: Improve warnings on use of deprecated options
Simplify warnings about deprecated options by rewriting them as
"warning: ignoring deprecated option".

Reword -no-kvm-pit-reinjection deprecation warning.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-10-git-send-email-ehabkost@redhat.com>
[Squashed in
Message-Id: <1446217682-24421-11-git-send-email-ehabkost@redhat.com>
and updated commit message]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:38:26 +01:00
Eduardo Habkost
4cd70f34af vl.c: Touch up error messages
Several small improvements:

* Use "cannot" instead of "can not"
* Use 'quotes' instead of `quotes'
* Change "fail to parse" error message to "failed to parse"

Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-6-git-send-email-ehabkost@redhat.com>
[Squashed in
Message-Id: <1446217682-24421-7-git-send-email-ehabkost@redhat.com>
Message-Id: <1446217682-24421-9-git-send-email-ehabkost@redhat.com>
and updated commit message]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:38:20 +01:00
Eduardo Habkost
3e5153732e vl.c: Remove unnecessary uppercase in error messages
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-8-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:38:16 +01:00
Eduardo Habkost
eb177ec184 vl.c: Use "warning:" prefix consistently on warnings
Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-5-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:38:13 +01:00
Eduardo Habkost
8afb900030 vl.c: Remove periods and exclamation points from error messages
Except for removing periods and exclamation points, no other changes
were made to the error messages (yet).

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-4-git-send-email-ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:38:02 +01:00
Eduardo Habkost
f61eddcb2b vl.c: Replace fprintf(stderr) with error_report()
Straightforward replacement, except for qemu_kill_report(), which
printed a common part of its error message first, then the applicable
special part.  Print each complete message with a single
error_report() instead.

Multi-line messages were replaced by error_report() followed by
error_printf().

The following changes were made to the error messages:

* The "invalid date format" message was reworded to better fit
  the new error_report()+error_printf() pattern.
* On the remaining messages, only the trailing newlines, "qemu:" and
  "error:" message prefixes were removed.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1446217682-24421-2-git-send-email-ehabkost@redhat.com>
[Squashed in
Message-Id: <1446217682-24421-3-git-send-email-ehabkost@redhat.com>
and updated commit message]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-03 09:37:50 +01:00
Fam Zheng
aa5ccadcca scripts/text2pod.pl: Escape left brace
Latest perl now deprecates "{" literal in regex and print warnings like
"unescaped left brace in regex is deprecated".  Add escapes to keep it
happy.

Signed-off-by: Fam Zheng <famz@redhat.com>

Message-Id: <1445326726-16031-1-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-02 14:50:27 +01:00
Igor Mammedov
cc57501dee file_ram_alloc: propagate error to caller instead of terminating QEMU
QEMU shouldn't exits from file_ram_alloc() if -mem-prealloc option is specified
and "object_add memory-backend-file,..." fails allocation during memory hotplug.

Propagate error to a caller and let it decide what to do with allocation failure.
That leaves QEMU alive if it can't create backend during hotplug time and
kills QEMU at startup time if backends or initial memory were misconfigured/
too large.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1445274671-17704-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-02 14:50:27 +01:00
Peter Maydell
3d861a0109 Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2015-11-02' into staging
QAPI patches

# gpg: Signature made Mon 02 Nov 2015 09:07:23 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2015-11-02: (25 commits)
  qapi: Simplify gen_struct_field()
  qapi: Reserve 'u' member name
  qapi: Finish converting to new qapi union layout
  tpm: Convert to new qapi union layout
  memory: Convert to new qapi union layout
  input: Convert to new qapi union layout
  char: Convert to new qapi union layout
  net: Convert to new qapi union layout
  sockets: Convert to new qapi union layout
  block: Convert to new qapi union layout
  tests: Convert to new qapi union layout
  qapi-visit: Convert to new qapi union layout
  qapi: Start converting to new qapi union layout
  qapi-visit: Remove redundant functions for flat union base
  qapi: Unbox base members
  qapi: Prefer typesafe upcasts to qapi base classes
  qapi-types: Refactor base fields output
  qapi-visit: Split off visit_type_FOO_fields forward decl
  vnc: Hoist allocation of VncBasicInfo to callers
  qapi: Reserve 'q_*' and 'has_*' member names
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-11-02 11:11:39 +00:00
Eric Blake
32bc6879be qapi: Simplify gen_struct_field()
Rather than having all callers pass a name, type, and optional
flag, have them instead pass a QAPISchemaObjectTypeMember which
already has all that information.

No change to generated code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-25-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Eric Blake
5e59baf90a qapi: Reserve 'u' member name
Now that we have separated union tag values from colliding with
non-variant C names, by naming the union 'u', we should reserve
this name for our use.  Note that we want to forbid 'u' even in
a struct with no variants, because it is possible for a future
qemu release to extend QMP in a backwards-compatible manner while
converting from a struct to a flat union.  Fortunately, no
existing clients were using this member name.  If we ever find
the need for QMP to have a member 'u', we could at that time
relax things, perhaps by having c_name() munge the QMP member to
'q_u'.

Note that we cannot forbid 'u' everywhere (by adding the
rejection code to check_name()), because the existing QKeyCode
enum already uses it; therefore we only reserve it as a struct
type member name.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-24-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Eric Blake
e4ba22b319 qapi: Finish converting to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

This patch is the back end for a series that converts to a
saner qapi union layout.  Now that all clients have been
converted to use 'type' and 'obj->u.value', we can drop the
temporary parallel support for 'kind' and 'obj->value'.

Given a simple union qapi type:

{ 'union':'Foo', 'data': { 'a':'int', 'b':'bool' } }

this is the overall effect, when compared to the state before
this series of patches:

| struct Foo {
|-    FooKind kind;
|-    union { /* union tag is @kind */
|+    FooKind type;
|+    union { /* union tag is @type */
|         void *data;
|         int64_t a;
|         bool b;
|-    };
|+    } u;
| };

The testsuite still contains some examples of artificial restrictions
(see flat-union-clash-type.json, for example) that are no longer
technically necessary, now that there is no longer a collision between
enum tag values and non-variant member names; but fixing this will be
done in later patches, in part because some further changes are required
to keep QAPISchema*.check() from asserting.  Also, a later patch will
add a reservation for the member name 'u' to avoid a collision between a
user's non-variant names and our internal choice of C union name.

Note, however, that we do not rename the generated enum, which
is still 'FooKind'.  A further patch could generate implicit
enums as 'FooType', but while the generator already reserved
the '*Kind' namespace (commit 4dc2e69), there are already QMP
constructs with '*Type' naming, which means changing our
reservation namespace would have lots of churn to C code to
deal with a forced name change.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-23-git-send-email-eblake@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Eric Blake
ce21131a0b tpm: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for TPM-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-22-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Eric Blake
1fd5d4fea4 memory: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for memory-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-21-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Eric Blake
568c73a478 input: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for input-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-20-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:28 +01:00
Eric Blake
130257dc44 char: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for character-related
code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-19-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
8d0bcba837 net: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for net-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-18-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
2d32addae7 sockets: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for socket-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-17-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
6a8f9661dc block: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for block-related code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-16-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
c363acef77 tests: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for testsuite code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-15-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
150d0564a4 qapi-visit: Convert to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

Make the conversion to the new layout for qapi-visit.py.

Generated code changes look like:

|@@ -4912,16 +4912,16 @@ void visit_type_MemoryDeviceInfo(Visitor
|     if (!*obj) {
|         goto out_obj;
|     }
|-    visit_type_MemoryDeviceInfoKind(v, &(*obj)->kind, "type", &err);
|+    visit_type_MemoryDeviceInfoKind(v, &(*obj)->type, "type", &err);
|     if (err) {
|         goto out_obj;
|     }
|-    if (!visit_start_union(v, !!(*obj)->data, &err) || err) {
|+    if (!visit_start_union(v, !!(*obj)->u.data, &err) || err) {
|         goto out_obj;
|     }
|-    switch ((*obj)->kind) {
|+    switch ((*obj)->type) {
|     case MEMORY_DEVICE_INFO_KIND_DIMM:
|-        visit_type_PCDIMMDeviceInfo(v, &(*obj)->dimm, "data", &err);
|+        visit_type_PCDIMMDeviceInfo(v, &(*obj)->u.dimm, "data", &err);
|         break;
|     default:
|         abort();
|@@ -4930,7 +4930,7 @@ out_obj:
|     error_propagate(errp, err);
|     err = NULL;
|     if (*obj) {
|-        visit_end_union(v, !!(*obj)->data, &err);
|+        visit_end_union(v, !!(*obj)->u.data, &err);
|     }
|     error_propagate(errp, err);
|     err = NULL;

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-14-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
f51d8fab44 qapi: Start converting to new qapi union layout
We have two issues with our qapi union layout:
1) Even though the QMP wire format spells the tag 'type', the
C code spells it 'kind', requiring some hacks in the generator.
2) The C struct uses an anonymous union, which places all tag
values in the same namespace as all non-variant members. This
leads to spurious collisions if a tag value matches a non-variant
member's name.

This patch is the front end for a series that converts to a
saner qapi union layout.  By the end of the series, we will no
longer have the type/kind mismatch, and all tag values will be
under a named union, which requires clients to access
'obj->u.value' instead of 'obj->value'.  But since the
conversion touches a number of files, it is easiest if we
temporarily support BOTH layouts simultaneously.

Given a simple union qapi type:

{ 'union':'Foo', 'data': { 'a':'int', 'b':'bool' } }

make the following changes in generated qapi-types.h:

| struct Foo {
|-    FooKind kind;
|-    union { /* union tag is @kind */
|+    union {
|+        FooKind kind;
|+        FooKind type;
|+    };
|+    union { /* union tag is @type */
|         void *data;
|         int64_t a;
|         bool b;
|+        union { /* union tag is @type */
|+            void *data;
|+            int64_t a;
|+            bool b;
|+        } u;
|     };
| };

Flat unions do not need the anonymous union for the tag member,
as we already fixed that to use the member name instead of 'kind'
back in commit 0f61af3e.

One additional change is needed in qapi.py: check_union() now
needs to check for collisions with 'type' in addition to those
with 'kind'.

Later, when the conversions are complete, we will remove the
duplication hacks, and also drop the check_union() restrictions.

Note, however, that we do not rename the generated enum, which
is still 'FooKind'.  A further patch could generate implicit
enums as 'FooType', but while the generator already reserved
the '*Kind' namespace (commit 4dc2e69), there are already QMP
constructs with '*Type' naming, which means changing our
reservation namespace would have lots of churn to C code to
deal with a forced name change.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-13-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
5c5e51a05b qapi-visit: Remove redundant functions for flat union base
The code for visiting the base class of a child struct created
visit_type_Base_fields() which covers all fields of Base; while
the code for visiting the base class of a flat union created
visit_type_Union_fields() covering all fields of the base
except the discriminator.  But since the base class includes
the discriminator of a flat union, we can just visit the entire
base, without needing a separate visit of the discriminator.
Not only is consistently visiting all fields easier to
understand, it lets us share code.

The generated code in qapi-visit.c loses several now-unused
visit_type_UNION_fields(), along with changes like:

|@@ -1654,11 +1557,7 @@ void visit_type_BlockdevOptions(Visitor
|     if (!*obj) {
|         goto out_obj;
|     }
|-    visit_type_BlockdevOptions_fields(v, obj, &err);
|-    if (err) {
|-        goto out_obj;
|-    }
|-    visit_type_BlockdevDriver(v, &(*obj)->driver, "driver", &err);
|+    visit_type_BlockdevOptionsBase_fields(v, (BlockdevOptionsBase **)obj, &err);
|     if (err) {
|         goto out_obj;
|     }

and forward declarations where needed.  Note that the cast of obj
to BASE ** is necessary to call visit_type_BASE_fields() (and we
can't use our upcast wrappers, because those work on pointers while
we have a pointer-to-pointer).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-12-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:27 +01:00
Eric Blake
ddf2190896 qapi: Unbox base members
Rather than storing a base class as a pointer to a box, just
store the fields of that base class in the same order, so that
a child struct can be directly cast to its parent.  This gives
less malloc overhead, less pointer dereferencing, and even less
generated code.  Compare to the earlier commit 1e6c1616a "qapi:
Generate a nicer struct for flat unions" (although that patch
had fewer places to change, as less of qemu was directly using
qapi structs for flat unions).  It also allows us to turn on
automatic type-safe wrappers for upcasting to the base class
of a struct.

Changes to the generated code look like this in qapi-types.h:

| struct SpiceChannel {
|-    SpiceBasicInfo *base;
|+    /* Members inherited from SpiceBasicInfo: */
|+    char *host;
|+    char *port;
|+    NetworkAddressFamily family;
|+    /* Own members: */
|     int64_t connection_id;

as well as additional upcast functions like qapi_SpiceChannel_base().
Meanwhile, changes to qapi-visit.c look like:

| static void visit_type_SpiceChannel_fields(Visitor *v, SpiceChannel **obj, Error **errp)
| {
|     Error *err = NULL;
|
|-    visit_type_implicit_SpiceBasicInfo(v, &(*obj)->base, &err);
|+    visit_type_SpiceBasicInfo_fields(v, (SpiceBasicInfo **)obj, &err);
|     if (err) {

(the cast is necessary, since our upcast wrappers only deal with a
single pointer, not pointer-to-pointer); plus the wholesale
elimination of some now-unused visit_type_implicit_FOO() functions.

Without boxing, the corner case of one empty struct having
another empty struct as its base type now requires inserting a
dummy member (previously, the 'Base *base' member sufficed).

And now that we no longer consume a 'base' member in the generated
C struct, we can delete the former negative struct-base-clash-base
test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-11-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
30594fe1cd qapi: Prefer typesafe upcasts to qapi base classes
A previous patch (commit 1e6c1616) made it possible to
directly cast from a qapi flat union type to its base type.
However, it requires the use of a C cast, which turns off
compiler type-safety checks.  Fortunately, no such casts
exist, just yet.

Regardless, add inline type-safe wrappers named
qapi_FOO_base() for any union type FOO that has a base,
which can be used for a safer upcast, and enhance the
testsuite to cover the new functionality.

A future patch will extend the upcast support to structs,
where such conversions do exist already.

Note that C makes const-correct upcasts annoying because
it lacks overloads; these functions cast away const so that
they can accept user pointers whether const or not, and the
result in turn can be assigned to normal or const pointers.
Alternatively, this could have been done with macros, but
type-safe macros are hairy, and not worthwhile here.

This patch just adds upcasts.  None of our code needed to
downcast from a base qapi class to a child.  Also, in the
case of grandchildren (such as BlockdevOptionsQcow2), the
caller will need to call two functions to get to the inner
base (although it wouldn't be too hard to generate a
qapi_FOO_base_base() if desired).  If a user changes qapi
to alter the base class hierarchy, such as going from
'A -> C' to 'A -> B -> C', it will change the type of
'qapi_C_base()', and the compiler will point out the places
that are affected by the new base.

One alternative was proposed, but was deemed too ugly to use
in practice: the generators could output redundant
information using anonymous types:
| struct Child {
|     union {
|         struct {
|             Type1 parent_member1;
|             Type2 parent_member2;
|         };
|         Parent base;
|     };
| };
With that ugly proposal, for a given qapi type, obj->member
and obj->base.member would refer to the same storage; allowing
convenience in working with members without needing 'base.'
allowing typesafe upcast without needing a C cast by accessing
'&obj->base', and allowing downcasts from the parent back to
the child possible through container_of(obj, Child, base).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-10-git-send-email-eblake@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
f87ab7f9bd qapi-types: Refactor base fields output
Move code from gen_union() into gen_struct_fields() in order for
a later patch to share code when enumerating inherited fields
for struct types.

No change to generated code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-9-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
d02cf37766 qapi-visit: Split off visit_type_FOO_fields forward decl
We generate a static visit_type_FOO_fields() for every type
FOO.  However, sometimes we need a forward declaration. Split
the code to generate the forward declaration out of
gen_visit_implicit_struct() into a new gen_visit_fields_decl(),
and also prepare for a forward declaration to be emitted
during gen_visit_struct(), so that a future patch can switch
from using visit_type_FOO_implicit() to the simpler
visit_type_FOO_fields() as part of unboxing the base class
of a struct.

No change to generated code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-8-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
98481bfcd6 vnc: Hoist allocation of VncBasicInfo to callers
A future qapi patch will rework generated structs with a base
class to be unboxed.  In preparation for that, change the code
that allocates then populates an info struct to instead merely
populate the fields of an info field passed in as a parameter
(renaming vnc_basic_info_get* to vnc_init_basic_info*). Add
rudimentary Error handling at the lowest levels for cases
where the old code returned NULL; but rather than plumb Error
all the way through the stack, the callers drop the error and
return NULL as before.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
9fb081e0b9 qapi: Reserve 'q_*' and 'has_*' member names
c_name() produces names starting with 'q_' when protecting a
dictionary member name that would fail to directly compile, but
in doing so can cause clashes with any member name already
beginning with 'q-' or 'q_'.  Likewise, we create a C name 'has_'
for any optional member that can clash with any member name
beginning with 'has-' or 'has_'.

Technically, rather than blindly reserving the namespace,
we could try to complain about user names only when an actual
collision occurs, or even teach c_name() how to munge names
to avoid collisions.  But it is not trivial, especially when
collisions can occur across multiple types (such as via
inheritance or flat unions).  Besides, no existing .json
files are trying to use these names.  So it's easier to just
outright forbid the potential for collision.  We can always
relax things in the future if a real need arises for QMP to
express member names that have been forbidden here.

'has_' only has to be reserved for struct/union member names,
while 'q_' is reserved everywhere (matching the fact that
only members can be optional, while we use c_name() for munging
both members and entities).  Note that we could relax 'q_'
restrictions on entities independently from member names; for
example, c_name('qmp_' + 'unix') would result in a different
function name than our current 'qmp_' + c_name('unix').

Update and add tests to cover the new error messages.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-6-git-send-email-eblake@redhat.com>
[Consistently pass protect=False to c_name(); commit message tweaked
slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
255960dd37 qapi: Reserve '*List' type names for list types
Type names ending in 'List' can clash with qapi list types in
generated C.  We don't currently use such names. It is easier to
outlaw them now than to worry about how to resolve such a clash
in the future. For precedence, see commit 4dc2e69, which did the
same for names ending in 'Kind' versus implicit enum types for
qapi unions.

Update the testsuite to match.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-5-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
f9e6102b48 qapi: More robust conditions for when labels are needed
We were using regular expressions to see if ret included
any earlier text that emitted a 'goto out;' line, to decide
whether we needed to output an 'out:' label.  But this is
fragile, if the ret text can possibly combine more than one
generated function body, where the first function used a
goto but the second does not.  Change the code to just check
for the known conditions which cause an error check to be
needed.  Besides, it's slightly more efficient to use plain
checks than regular expression searching.

No change to generated code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-4-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
8712fa5333 qapi: More idiomatic string operations
Rather than slicing the end of a string, we can use python's
endswith().  And rather than creating a set of characters,
we can search for a character within a string.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-3-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:26 +01:00
Eric Blake
1976708321 tests/qapi-schema: Test for reserved names, empty struct
Add some testsuite coverage to ensure future patches are on
the right track:

Our current C representation of qapi arrays is done by appending
'List' to the element name; but we are not preventing the
creation of an object type with the same name.  Add
reserved-type-list.json to test this.  Then rename
enum-union-clash.json to reserved-type-kind.json to cover the
reservation that we DO detect, and shorten it to match the fact
that the name is reserved even if there is no clash.

We are failing to detect a collision between a dictionary member
and the implicit 'has_*' flag for another optional member. The
easiest fix would be for a future patch to reserve the entire
"has[-_]" namespace for member names (the collision is also
possible for branch names within flat unions, but only as long as
branch names can collide with (non-variant) members; however,
since future patches are about to remove that, it is not worth
testing here). Add reserved-member-has.json to test this.

A similar collision exists between a dictionary member where
c_name() munges what might otherwise be a reserved name to start
with 'q_', and another member explicitly starts with "q[-_]".
Again, the easiest solution for a future patch will be reserving
the entire namespace, but here for commands as well as members.
Add reserved-member-q.json and reserved-command-q.json to test
this; separate tests since arguably our munging of command 'unix'
to 'qmp_q_unix()' could be done without a q_, which is different
than the munging of a member 'unix' to 'foo.q_unix'.

Finally, our testsuite does not have any compilation coverage
of struct inheritance with empty qapi structs.  Update
qapi-schema-test.json to test this.

Note that there is currently no technical reason to forbid type
name patterns from member names, or member name patterns from
types, since the two are not in the same namespace in C and
won't collide; but it's not worth adding positive tests of these
corner cases at this time, especially while there is other churn
pending in patches that rearrange which collisions actually
happen.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1445898903-12082-2-git-send-email-eblake@redhat.com>
[Commit message tweaked slightly]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:25 +01:00
Daniel P. Berrange
2ea1793bd9 qapi-schema: mark InetSocketAddress as mandatory again
Revert the qapi-schema.json change done in:

  commit 0983f5e6af
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Tue Sep 1 14:46:50 2015 +0100

    sockets: allow port to be NULL when listening on IP address

Switching "port" from mandatory to optional causes the QAPI
code generator to add a 'has_port' field to the InetSocketAddress
struct. No code that created InetSocketAddress objects was updated
to set 'has_port = true', which caused the non-NULL port strings
to be silently dropped when copying InetSocketAddress objects.

Reported-by: Knut Omang <knuto@ifi.uio.no>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1445509543-30679-1-git-send-email-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-11-02 08:30:25 +01:00
Peter Maydell
24f4a0f5c9 Merge remote-tracking branch 'remotes/rth/tags/pull-tile-20151030' into staging
Prefetch in y2 pipe

# gpg: Signature made Fri 30 Oct 2015 20:40:02 GMT using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tile-20151030:
  target-tilegx: Implement prefetch instructions in pipe y2

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-30 21:59:48 +00:00
Peter Maydell
3a958f559e Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Thu 29 Oct 2015 18:09:16 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  block: Consider all child nodes in bdrv_requests_pending()
  target-arm: xlnx-zynqmp: Add sdhci support.
  sdhci: Split sdhci.h for public and internal device usage
  sd.h: Move sd.h to include/hw/sd/
  virtio: sync the dataplane vring state to the virtqueue before virtio_save
  gdb command: qemu handlers
  virtio-blk: switch off scsi-passthrough by default
  ppc/spapr: add 2.4 compat props
  s390x: include HW_COMPAT_* props
  qemu-gdb: add $qemu_coroutine_sp and $qemu_coroutine_pc
  qemu-gdb: extract parts of "qemu coroutine" implementation
  qemu-gdb: allow using glibc_pointer_guard() on core dumps

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-30 19:47:47 +00:00
Peter Maydell
e79ea9e424 Merge remote-tracking branch 'remotes/lalrae/tags/mips-20151030' into staging
MIPS patches 2015-10-30

Changes:
* R6 CPU can be woken up by non-enabled interrupts
* PC fix in KVM
* Coprocessor 0 XContext calculation fix
* various MIPS R6 updates

# gpg: Signature made Fri 30 Oct 2015 14:51:56 GMT using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20151030:
  target-mips: fix updating XContext on mmu exception
  target-mips: add SIGRIE instruction
  target-mips: Set Config5.XNP for R6 cores
  target-mips: add PC, XNP reg numbers to RDHWR
  hw/mips_malta: Fix KVM PC initialisation
  target-mips: Add enum for BREAK32
  target-mips: update writing to CP0.Status.KX/SX/UX in MIPS Release R6
  target-mips: implement the CPU wake-up on non-enabled interrupts in R6
  target-mips: move the test for enabled interrupts to a separate function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-30 16:30:25 +00:00
Yongbok Kim
60270f85cc target-mips: fix updating XContext on mmu exception
Correct updating XContext.Region field on mmu exceptions.
If Config3.CTXTC = 0 then the R field of XContext has to be updated
with the value of bits 63..62 of the virtual address upon a TLB
exception.
Also fixed the below line which overs 80 characters.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-30 14:36:19 +00:00
Yongbok Kim
bb238210bb target-mips: add SIGRIE instruction
Add SIGRIE (Signal Reserved Instruction Exception) for both MIPS and
microMIPS.
The instruction allows to use the 16-bit code field for software use.
This instruction is introduced by and required as of Release 6.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-30 14:36:19 +00:00
Yongbok Kim
35ac9e342e target-mips: Set Config5.XNP for R6 cores
Set Config5.XNP for R6 cores to indicate the extended LL/SC family
of instructions NOT present.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-30 14:36:19 +00:00
Yongbok Kim
b00c72180c target-mips: add PC, XNP reg numbers to RDHWR
Add Performance Counter (4) and XNP (5) register numbers to RDHWR.
Add check_hwrena() to simplify access control checkings.
Add RDHWR support to microMIPS R6.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-30 14:35:52 +00:00
James Hogan
ca2f6bbbce hw/mips_malta: Fix KVM PC initialisation
Commit 71c199c81d ("mips_malta: provide ememsize env variable to
kernels") changed the meaning of loaderparams.ram_size to be the whole
of RAM rather than just the low part below where the boot code is placed
for KVM, but it didn't update the PC initialisation for KVM to use
ram_low_size. Fix that now.

Fixes: 71c199c81d ("mips_malta: provide ememsize env variable to kernels")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-30 13:30:14 +00:00
Peter Maydell
fdf927621a Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2015-10-30' into staging
QMP and QObject patches

# gpg: Signature made Fri 30 Oct 2015 08:06:26 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-monitor-2015-10-30:
  docs: Document QMP event rate limiting
  monitor: Throttle event VSERPORT_CHANGE separately by "id"
  monitor: Turn monitor_qapi_event_state[] into a hash table
  glib: add compatibility interface for g_hash_table_add()
  monitor: Split MonitorQAPIEventConf off MonitorQAPIEventState
  monitor: Switch from timer_new() to timer_new_ns()
  monitor: Simplify event throttling
  monitor: Reduce casting of QAPI event QDict
  qstring: Make conversion from QObject * accept null
  qlist: Make conversion from QObject * accept null
  qfloat qint: Make conversion from QObject * accept null
  qdict: Make conversion from QObject * accept null
  qbool: Make conversion from QObject * accept null
  qobject: Drop QObject_HEAD

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-30 09:41:15 +00:00
Markus Armbruster
7f1e7b23d5 docs: Document QMP event rate limiting
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444921716-9511-8-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-10-30 09:05:38 +01:00
Markus Armbruster
7de0be6573 monitor: Throttle event VSERPORT_CHANGE separately by "id"
VSERPORT_CHANGE is emitted when the guest opens or closes a
virtio-serial port.  The event's member "id" identifies the port.

When several events arrive quickly, throttling drops all but the last
of them.  Because of that, a QMP client must assume that *any* port
may have changed state when it receives a VSERPORT_CHANGE event and
throttling may have happened.

Make the event more useful by throttling it for each port separately.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1444921716-9511-7-git-send-email-armbru@redhat.com>
2015-10-30 09:05:38 +01:00
Markus Armbruster
a24712af54 monitor: Turn monitor_qapi_event_state[] into a hash table
In preparation of finer grained throttling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1444921716-9511-6-git-send-email-armbru@redhat.com>
2015-10-30 09:05:38 +01:00
Markus Armbruster
8681dffa91 glib: add compatibility interface for g_hash_table_add()
The next commit will use it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-10-30 09:01:03 +01:00
Kevin Wolf
37a639a7fb block: Consider all child nodes in bdrv_requests_pending()
The function manually recursed into bs->file and bs->backing to check
whether there were any requests pending, but it ignored other children.

There's no need to special case file and backing here, so just replace
these two explicit recursions by a loop recursing for all child nodes.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1446029211-27148-1-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Sai Pavan Boddu
33108e9f33 target-arm: xlnx-zynqmp: Add sdhci support.
Add two SYSBUS_SDHCI devices for xlnx-zynqmp

Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Sai Pavan Boddu
637d23beb6 sdhci: Split sdhci.h for public and internal device usage
Split sdhci.h into pubilc version (i.e include/hw/sd/sdhci.h) and
internal version (i.e hw/sd/sdhci-interna.h) based on register
declarations and object declaration.

Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Sai Pavan Boddu
e3382ef0ea sd.h: Move sd.h to include/hw/sd/
Create a sd directory under include/hw/ and move sd.h to
include/hw/sd/

Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Pavel Butsykin
10a06fd65f virtio: sync the dataplane vring state to the virtqueue before virtio_save
When creating snapshot with the dataplane enabled, the snapshot file gets
not the actual state of virtqueue, because the current state is stored in
VirtIOBlockDataPlane. Therefore, before saving snapshot need to sync
the dataplane vring state to the virtqueue. The dataplane will resume its
work at the next notify virtqueue.

When snapshot loads with loadvm we get a message:
VQ 0 size 0x80 Guest index 0x15f5 inconsistent with Host index 0x0:
    delta 0x15f5
error while loading state for instance 0x0 of device
    '0000:00:08.0/virtio-blk'
Error -1 while loading VM state

to reproduce the error I used the following hmp commands:
savevm snap1
loadvm snap1

qemu parameters:
--enable-kvm -smp 4 -m 1024 -drive file=/var/lib/libvirt/images/centos6.4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0 -set device.virtio-disk0.x-data-plane=on

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1445859777-2982-1-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Dr. David Alan Gilbert
c900ef86c5 gdb command: qemu handlers
A new gdb commands are added:

  qemu handlers

     That dumps an AioContext list (by default qemu_aio_context)
     possibly including a backtrace for cases it knows about
     (with the verbose option).  Intended to help find why something
     is hanging waiting for IO.

  Use 'qemu handlers --verbose iohandler_ctx'  to find out why
your incoming migration is stuck.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 1445951385-11924-1-git-send-email-dgilbert@redhat.com

V2:
  Merge into one command with optional handlers arg, and only do
    backtrace in verbose mode

 (gdb) qemu handlers
 ----
 {pfd = {fd = 6, events = 25, revents = 0}, io_read = 0x55869656ffd0
 <event_notifier_dummy_cb>, io_write = 0x0, deleted = 0, opaque =
 0x558698c4ce08, node = {le_next = 0x0, le_prev = 0x558698c4cdc0}}

 (gdb) qemu handlers iohandler_ctx
 ----
 {pfd = {fd = 9, events = 25, revents = 0}, io_read = 0x558696581380
 <fd_coroutine_enter>, io_write = 0x0, deleted = 0, opaque =
 0x558698dc99d0, node = {le_next = 0x558698c4cca0, le_prev =
 0x558698c4c1d0}}
 ----
 {pfd = {fd = 4, events = 25, revents = 0}, io_read = 0x55869657b330
 <sigfd_handler>, io_write = 0x0, deleted = 0, opaque = 0x4, node =
 {le_next = 0x558698c4c260, le_prev = 0x558699f72508}}
 ----
 {pfd = {fd = 5, events = 25, revents = 0}, io_read = 0x55869656ffd0
 <event_notifier_dummy_cb>, io_write = 0x0, deleted = 0, opaque =
 0x558698c4c218, node = {le_next = 0x0, le_prev = 0x558698c4ccc8}}
 ----
 (gdb) qemu handlers --verbose iohandler_ctx
 ----
 {pfd = {fd = 9, events = 25, revents = 0}, io_read = 0x558696581380
 <fd_coroutine_enter>, io_write = 0x0, deleted = 0, opaque =
 0x558698dc99d0, node = {le_next = 0x558698c4cca0, le_prev =
 0x558698c4c1d0}}
 #0  0x0000558696581820 in qemu_coroutine_switch
 (from_=from_@entry=0x558698cb3cf0, to_=to_@entry=0x7f421c37eac8,
 action=action@entry=COROUTINE_YIELD) at
 /home/dgilbert/git/qemu/coroutine-ucontext.c:177
 #1  0x0000558696580c00 in qemu_coroutine_yield () at
 /home/dgilbert/git/qemu/qemu-coroutine.c:145
 #2  0x00005586965814f5 in yield_until_fd_readable (fd=9) at
 /home/dgilbert/git/qemu/qemu-coroutine-io.c:90
 #3  0x0000558696523937 in socket_get_buffer (opaque=0x55869a3dc620,
 buf=0x558698c505a0 "", pos=<optimized out>, size=32768) at
 /home/dgilbert/git/qemu/migration/qemu-file-unix.c:101
 #4  0x0000558696521fac in qemu_fill_buffer (f=0x558698c50570) at
 /home/dgilbert/git/qemu/migration/qemu-file.c:227
 #5  0x0000558696522989 in qemu_peek_byte (f=0x558698c50570, offset=0)
     at /home/dgilbert/git/qemu/migration/qemu-file.c:507
 #6  0x0000558696522bf4 in qemu_get_be32 (f=0x558698c50570) at
 /home/dgilbert/git/qemu/migration/qemu-file.c:520
 #7  0x0000558696522bf4 in qemu_get_be32 (f=f@entry=0x558698c50570)
     at /home/dgilbert/git/qemu/migration/qemu-file.c:604
 #8  0x0000558696347e5c in qemu_loadvm_state (f=f@entry=0x558698c50570)
     at /home/dgilbert/git/qemu/migration/savevm.c:1821
 #9  0x000055869651de8c in process_incoming_migration_co
 (opaque=0x558698c50570)
     at /home/dgilbert/git/qemu/migration/migration.c:336
 #10 0x000055869658188a in coroutine_trampoline (i0=<optimized out>,
 i1=<optimized out>)
     at /home/dgilbert/git/qemu/coroutine-ucontext.c:80
 #11 0x00007f420f05df10 in __start_context () at /lib64/libc.so.6
 #12 0x00007ffc40815f50 in  ()
 #13 0x0000000000000000 in  ()

  ----
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Cornelia Huck
ed65fd1a27 virtio-blk: switch off scsi-passthrough by default
Devices that are compliant with virtio-1 do not support scsi
passthrough any more (and it has not been a recommended setup
anyway for quite some time). To avoid having to switch it off
explicitly in newer qemus that turn on virtio-1 by default, let's
switch the default to scsi=false for 2.5.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-id: 1444991154-79217-4-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:27 +00:00
Cornelia Huck
80fd50f96b ppc/spapr: add 2.4 compat props
HW_COMPAT_2_4 will become non-empty: prepare for it.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-id: 1444991154-79217-3-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:26 +00:00
Cornelia Huck
54d8ec84fa s390x: include HW_COMPAT_* props
We want to inherit generic hw compat as well.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1444991154-79217-2-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:26 +00:00
Paolo Bonzini
a201b0ff28 qemu-gdb: add $qemu_coroutine_sp and $qemu_coroutine_pc
These can be useful to manually get a stack trace of a coroutine inside
a core dump.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1444636974-19950-4-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:26 +00:00
Paolo Bonzini
80ab31b257 qemu-gdb: extract parts of "qemu coroutine" implementation
Provide useful Python functions to reach and decipher a jmpbuf.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1444636974-19950-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:26 +00:00
Paolo Bonzini
1138f24645 qemu-gdb: allow using glibc_pointer_guard() on core dumps
get_fs_base() cannot be run on a core dump, because it uses the arch_prctl
system call.  The fs base is the value that is returned by pthread_self(),
and it would be nice to just glean it from the "info threads" output:

* 1    Thread 0x7f16a3fff700 (LWP 33642) pthread_cond_wait@@GLIBC_2.3.2 ()
              ^^^^^^^^^^^^^^

but unfortunately the gdb API does not provide that.  Instead, we can
look for the "arg" argument of the start_thread function if glibc debug
information are available.  If not, fall back to the old mechanism.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1444636974-19950-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-29 17:59:26 +00:00
Yongbok Kim
dbd8af9824 target-mips: Add enum for BREAK32
Add enum for BREAK32

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-29 16:16:44 +00:00
Leon Alrae
2dcf7908d9 target-mips: update writing to CP0.Status.KX/SX/UX in MIPS Release R6
Implement the relationship between CP0.Status.KX, SX and UX. It should not
be possible to set UX bit if SX is 0, the same applies for setting SX if
KX is 0.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-29 16:16:44 +00:00
Leon Alrae
7540a43a1d target-mips: implement the CPU wake-up on non-enabled interrupts in R6
In Release 6, the behaviour of WAIT has been modified to make it a
requirement that a processor that has disabled operation as a result of
executing a WAIT will resume operation on arrival of an interrupt even if
interrupts are not enabled.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-29 16:16:44 +00:00
Leon Alrae
71ca034a0d target-mips: move the test for enabled interrupts to a separate function
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2015-10-29 16:16:44 +00:00
Markus Armbruster
b9b03ab0d4 monitor: Split MonitorQAPIEventConf off MonitorQAPIEventState
In preparation of turning monitor_qapi_event_state[] into a hash table
for finer grained throttling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1444921716-9511-5-git-send-email-armbru@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
1824c41a62 monitor: Switch from timer_new() to timer_new_ns()
We don't actually care for the scale, so we can just as well use the
simpler interface.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444921716-9511-4-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
93f8f982fe monitor: Simplify event throttling
The event throttling state machine is hard to understand.  I'm not
sure it's entirely correct.  Rewrite it in a more straightforward
manner:

State 1: No event sent recently (less than evconf->rate ns ago)

    Invariant: evstate->timer is not pending, evstate->qdict is null

    On event: send event, arm timer, goto state 2

State 2: Event sent recently, no additional event being delayed

    Invariant: evstate->timer is pending, evstate->qdict is null

    On event: store it in evstate->qdict, goto state 3

    On timer: goto state 1

State 3: Event sent recently, additional event being delayed

    Invariant: evstate->timer is pending, evstate->qdict is non-null

    On event: store it in evstate->qdict, goto state 3

    On timer: send evstate->qdict, clear evstate->qdict,
              arm timer, goto state 2

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444921716-9511-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
688b4b7de7 monitor: Reduce casting of QAPI event QDict
Make the variables holding the event QDict instead of QObject.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1444921716-9511-2-git-send-email-armbru@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
7f0278435d qstring: Make conversion from QObject * accept null
qobject_to_qstring() crashes on null, which is a trap for the unwary.
Return null instead, and simplify a few callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444918537-18107-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
2d6421a900 qlist: Make conversion from QObject * accept null
qobject_to_qlist() crashes on null, which is a trap for the unwary.
Return null instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444918537-18107-6-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
fcf73f66a6 qfloat qint: Make conversion from QObject * accept null
qobject_to_qfloat() and qobject_to_qint() crash on null, which is a
trap for the unwary.  Return null instead, and simplify a few callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444918537-18107-5-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
89cad9f3ec qdict: Make conversion from QObject * accept null
qobject_to_qdict() crashes on null, which is a trap for the unwary.
Return null instead, and simplify a few callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444918537-18107-4-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-10-29 14:34:45 +01:00
Markus Armbruster
14b6160099 qbool: Make conversion from QObject * accept null
qobject_to_qbool() crashes on null, which is a trap for the unwary.
Return null instead, and simplify a few callers.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444918537-18107-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-10-29 14:34:44 +01:00
Markus Armbruster
c7c462123c qobject: Drop QObject_HEAD
QObject_HEAD is a macro expanding into the common part of structs that
are sub-types of QObject.  It's always been just QObject base, and
unlikely to change.  Drop the macro, because the code is clearer with
out it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1444918537-18107-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-10-29 14:34:44 +01:00
Peter Maydell
7bc8e0c967 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio, pc, memory: fixes+features for 2.5

New features:
    This enables hotplug for multifunction devices.
    Patches are very small, so I think it's OK to merge
    at this stage.

    There's also some new infrastructure for vhost-user testing
    not enabled yet so it's harmless to merge.

I've reverted the "gap between DIMMs" workaround, as it seems too risky, and
applied my own patch in virtio, but not in dataplane code.  This means that
dataplane is broken for some complex DIMM configurations for now.  Waiting for
Stefan to review the dataplane fix.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 29 Oct 2015 09:36:16 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  enable multi-function hot-add
  remove function during multi-function hot-add
  tests/vhost-user-bridge: add vhost-user bridge application
  Revert "memhp: extend address auto assignment to support gaps"
  Revert "pc: memhp: force gaps between DIMM's GPA"
  virtio: drop virtqueue_map_sg
  virtio-scsi: convert to virtqueue_map
  virtio-serial: convert to virtio_map
  virtio-blk: convert to virtqueue_map
  virtio: switch to virtio_map
  virtio: introduce virtio_map
  mmap-alloc: fix error handling
  pc: memhp: do not emit inserting event for coldplugged DIMMs
  vhost-user-test: fix up rhel6 build
  vhost-user: cleanup msg size math
  vhost-user: cleanup struct size math

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-29 09:49:52 +00:00
Cao jin
3f1e1478db enable multi-function hot-add
Enable PCIe device multi-function hot-add, just ensure function 0 is added
last, then driver will get the notification to scan the slot.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:17:53 +02:00
Cao jin
0d1c7d88ad remove function during multi-function hot-add
In case user want to cancel the hot-add operation, should roll back,
device_del the added function that still don`t work.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:17:52 +02:00
Victor Kaplansky
3595e2eb0a tests/vhost-user-bridge: add vhost-user bridge application
The test existing in QEMU for vhost-user feature is good for
testing the management protocol, but does not allow actual
traffic. This patch proposes Vhost-User Bridge application, which
can serve the QEMU community as a comprehensive test by running
real internet traffic by means of vhost-user interface.

Essentially the Vhost-User Bridge is a very basic vhost-user
backend for QEMU. It runs as a standalone user-level process.
For packet processing Vhost-User Bridge uses an additional QEMU
instance with a backend configured by "-net socket" as a shared
VLAN.  This way another QEMU virtual machine can effectively
serve as a shared bus by means of UDP communication.

For a more simple setup, the another QEMU instance running the
SLiRP backend can be the same QEMU instance running vhost-user
client.

This Vhost-User Bridge implementation is very preliminary.  It is
missing many features. I has been studying vhost-user protocol
internals, so I've written vhost-user-bridge bit by bit as I
progressed through the protocol.  Most probably its internal
architecture will change significantly.

To run Vhost-User Bridge application:

1. Build vhost-user-bridge with a regular procedure. This will
create a vhost-user-bridge executable under tests directory:

    $ configure; make tests/vhost-user-bridge

2. Ensure the machine has hugepages enabled in kernel with
command line like:

    default_hugepagesz=2M hugepagesz=2M hugepages=2048

3. Run Vhost-User Bridge with:

    $ tests/vhost-user-bridge

The above will run vhost-user server listening for connections
on UNIX domain socket /tmp/vubr.sock, and will try to connect
by UDP to VLAN bridge to localhost:5555, while listening on
localhost:4444

Run qemu with a virtio-net backed by vhost-user:

    $ qemu \
        -enable-kvm -m 512 -smp 2 \
        -object memory-backend-file,id=mem,size=512M,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem -mem-prealloc \
        -chardev socket,id=char0,path=/tmp/vubr.sock \
        -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
        -device virtio-net-pci,netdev=mynet1 \
        -net none \
        -net socket,vlan=0,udp=localhost:4444,localaddr=localhost:5555 \
        -net user,vlan=0 \
        disk.img

vhost-user-bridge was tested very lightly: it's able to bringup a
linux on client VM with the virtio-net driver, and execute transmits
and receives to the internet. I tested with "wget redhat.com",
"dig redhat.com".

PS. I've consulted DPDK's code for vhost-user during Vhost-User
Bridge implementation.

Signed-off-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:11:07 +02:00
Michael S. Tsirkin
d6a9b0b89d Revert "memhp: extend address auto assignment to support gaps"
This reverts commit df0acded19.

There's no point to it now that the only user has been reverted.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:11:07 +02:00
Michael S. Tsirkin
340065e5a1 Revert "pc: memhp: force gaps between DIMM's GPA"
This reverts commit aa8580cddf.

As described in
http://article.gmane.org/gmane.comp.emulators.qemu/371432
that commit causes linux guests to crash on memory hot-unplug.

The original problem it's trying to solve has now
been addressed within virtio.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:11:07 +02:00
Michael S. Tsirkin
3945ecf1ec virtio: drop virtqueue_map_sg
Deprecated in favor of virtqueue_map.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
4ada533189 virtio-scsi: convert to virtqueue_map
Note: virtqueue_map already validates input
so virtio-scsi does not have to.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
bff712dc22 virtio-serial: convert to virtio_map
This also fixes a minor bug:
-                virtqueue_map_sg(port->elem.out_sg, port->elem.out_addr,
-                                 port->elem.out_num, 1);
is wrong: out_sg is not written so should not be marked dirty.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
3d8db153b4 virtio-blk: convert to virtqueue_map
Drop deprecated use of virtqueue_map_sg.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
13972ac5e2 virtio: switch to virtio_map
Drop use of the deprecated virtio_map_sg in virtio core.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
8059feee00 virtio: introduce virtio_map
virtio_map_sg currently fails if one of the entries it's mapping is
contigious in GPA but not HVA address space.  Introduce virtio_map which
handles this by splitting sg entries.

This new API generally turns out to be a good idea since it's harder to
misuse: at least in one case the existing one was used incorrectly.

This will still fail if there's no space left in the sg, but luckily max
queue size in use is currently 256, while max sg size is 1024, so we
should be OK even is all entries happen to cross a single DIMM boundary.

Won't work well with very small DIMM sizes, unfortunately:
e.g. this will fail with 4K DIMMs where a single
request might span a large number of DIMMs.

Let's hope these are uncommon - at least we are not breaking things.

Note: virtio-scsi calls virtio_map_sg on data loaded from network, and
validates input, asserting on failure.  Copy the validating code here -
it will be dropped from virtio-scsi in a follow-up patch.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
9d4ec9370a mmap-alloc: fix error handling
Existing callers are checking for MAP_FAILED,
so we should return that on error.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:05:24 +02:00
Igor Mammedov
4828b10bda pc: memhp: do not emit inserting event for coldplugged DIMMs
currently acpi_memory_plug_cb() sets is_inserting for
cold- and hot-plugged DIMMs as result ASL MHPD.MSCN()
method issues device check even for every coldplugged
DIMM. There isn't much harm in it but if we try to
unplug such DIMM, OSPM will issue device check
intstead of device eject event. So OSPM won't eject
memory module as expected and it will try to eject it
only when another memory device is hot-(un)plugged.

As a fix do not set 'is_inserting' event and do not
issue SCI for cold-plugged DIMMs as they are
enumerated and activated by OSPM during guest's boot.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
12ebf69083 vhost-user-test: fix up rhel6 build
Build on RHEL6 fails:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42875

Apparently unnamed unions couldn't use C99  named field initializers.
Let's just name the payload union field.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
7fc0246c07 vhost-user: cleanup msg size math
We are sending msg fields, use sizeof on these
and not on local variables which happen to
have a matching type.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:05:24 +02:00
Michael S. Tsirkin
86abad0fed vhost-user: cleanup struct size math
We are using local msg structures everywhere, use them
for sizeof as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-29 11:05:24 +02:00
Peter Maydell
331c5e2091 Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20151028' into staging
Breakpoint fixes

# gpg: Signature made Wed 28 Oct 2015 17:58:52 GMT using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20151028:
  target-*: Advance pc after recognizing a breakpoint

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-28 20:10:22 +00:00
Richard Henderson
522a0d4e3c target-*: Advance pc after recognizing a breakpoint
Some targets already had this within their logic, but make sure
it's present for all targets.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-10-28 10:57:16 -07:00
Peter Maydell
496fedddce Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
target-i386: finally enable "check" mode by default

# gpg: Signature made Wed 28 Oct 2015 14:13:10 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  target-i386: Enable "check" mode by default
  target-i386: Don't left shift negative constant

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-28 15:08:36 +00:00
Peter Maydell
739680da59 Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging
Update OpenBIOS images

# gpg: Signature made Wed 28 Oct 2015 00:02:46 GMT using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-28 14:02:27 +00:00
Mark Cave-Ayland
637016c260 Update OpenBIOS images
Update OpenBIOS images to SVN r1353 built from submodule.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2015-10-28 00:01:28 +00:00
Eduardo Habkost
15e4134590 target-i386: Enable "check" mode by default
Current default behavior of QEMU is to silently disable features that
are not supported by the host when a CPU model is requested in the
command-line. This means that in addition to risking breaking guest ABI
by default, we are silent about it.

I would like to enable "enforce" by default, but this can easily break
existing production systems because of the way libvirt makes assumptions
about CPU models today (this will change in the future, once QEMU
provide a proper interface for checking if a CPU model is runnable).

But there's no reason we should be silent about it. So, change
target-i386 to enable "check" mode by default so at least we have some
warning printed to stderr (and hopefully logged somewhere) when QEMU
disables a feature that is not supported by the host system.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-27 16:12:15 -02:00
Eduardo Habkost
712b4243c7 target-i386: Don't left shift negative constant
Left shift of negative values is undefined behavior. Detected by clang:
  qemu/target-i386/translate.c:2423:26: runtime error:
    left shift of negative value -8

This changes the code to reverse the sign after the left shift.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-27 15:52:11 -02:00
Peter Maydell
c012e1b7ad Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20151027-1' into staging
target-arm queue:
 * more EL2 preparation: handling for stage 2 translations
 * standardize debug macros in i.MX devices
 * improve error message in a corner case for virt board
 * disable live migration of KVM GIC if the kernel can't handle it
 * add SPSR_(ABT|UND|IRQ|FIQ) registers
 * handle non-executable page-straddling Thumb instructions
 * fix a "no 64-bit EL2" assumption in arm_excp_unmasked()

# gpg: Signature made Tue 27 Oct 2015 16:03:31 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20151027-1: (27 commits)
  target-arm: Add support for S1 + S2 MMU translations
  target-arm: Route S2 MMU faults to EL2
  target-arm: Add S2 translation to 32bit S1 PTWs
  target-arm: Add S2 translation to 64bit S1 PTWs
  target-arm: Add ARMMMUFaultInfo
  target-arm: Avoid inline for get_phys_addr
  target-arm: Add support for S2 page-table protection bits
  target-arm: Add computation of starting level for S2 PTW
  target-arm: lpae: Rename granule_sz to stride
  target-arm: lpae: Replace tsz with computed inputsize
  target-arm: Add support for AArch32 S2 negative t0sz
  target-arm: lpae: Move declaration of t0sz and t1sz
  target-arm: lpae: Make t0sz and t1sz signed integers
  target-arm: Add HPFAR_EL2
  i.MX: Standardize i.MX GPT debug
  i.MX: Standardize i.MX EPIT debug
  i.MX: Standardize i.MX FEC debug
  i.MX: Standardize i.MX CCM debug
  i.MX: Standardize i.MX AVIC debug
  i.MX: Standardize i.MX I2C debug
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 16:17:55 +00:00
Edgar E. Iglesias
9b539263fa target-arm: Add support for S1 + S2 MMU translations
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-15-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
d759a457a1 target-arm: Route S2 MMU faults to EL2
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-14-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
a614e69854 target-arm: Add S2 translation to 32bit S1 PTWs
Add support for applying S2 translation to 32bit S1
page-table walks.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-13-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
3778597762 target-arm: Add S2 translation to 64bit S1 PTWs
Add support for applying S2 translation to 64bit S1
page-table walks.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-12-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
e14b5a23d8 target-arm: Add ARMMMUFaultInfo
Introduce ARMMMUFaultInfo to propagate MMU Fault information
across the MMU translation code path. This is in preparation for
adding Stage-2 translation.

No functional changes.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-11-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
af51f566ec target-arm: Avoid inline for get_phys_addr
Avoid inline for get_phys_addr() to prepare for future recursive use.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-10-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
6ab1a5ee1c target-arm: Add support for S2 page-table protection bits
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-9-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
1853d5a9dc target-arm: Add computation of starting level for S2 PTW
The starting level for S2 pagetable walks is computed
differently from the S1 starting level. Implement the S2
variant.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-8-git-send-email-edgar.iglesias@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:47 +00:00
Edgar E. Iglesias
973a543482 target-arm: lpae: Rename granule_sz to stride
Rename granule_sz to stride to better match the reference manuals.

No functional change.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-7-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Edgar E. Iglesias
4ca6a05175 target-arm: lpae: Replace tsz with computed inputsize
Remove the tsz variable and introduce inputsize.
This simplifies the code a little and makes it easier to
compare with the reference manuals.

No functional change.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-6-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Edgar E. Iglesias
4ee3809801 target-arm: Add support for AArch32 S2 negative t0sz
Add support for AArch32 S2 negative t0sz. In preparation for
using 40bit IPAs on AArch32.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-5-git-send-email-edgar.iglesias@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Edgar E. Iglesias
1f4c8c18a5 target-arm: lpae: Move declaration of t0sz and t1sz
Move declaration of t0sz and t1sz to the top of the function
avoiding a mix of code and variable declarations.

No functional change.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-4-git-send-email-edgar.iglesias@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Edgar E. Iglesias
5c31a10d16 target-arm: lpae: Make t0sz and t1sz signed integers
Make t0sz and t1sz signed integers to match tsz and to make
it easier to implement support for AArch32 negative t0sz.
t1sz is changed for consistensy.

No functional change.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-3-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Edgar E. Iglesias
59e0553073 target-arm: Add HPFAR_EL2
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1445864527-14520-2-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
054535262f i.MX: Standardize i.MX GPT debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

We also replace IPRINTF with qemu_log_mask(). The qemu_log_mask() output
is following the same format as the above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: b7ce7e98a051479453744aded122789531d80a44.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
4929f6563c i.MX: Standardize i.MX EPIT debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

We also replace IPRINTF with qemu_log_mask(). The qemu_log_mask() output
is following the same format as the above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 5bbad71517ca728d8865f7b9f998baa0df022794.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
b72d8d257c i.MX: Standardize i.MX FEC debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

The qemu_log_mask() output is following the same format as the
above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 57e565982db94fb433c32dfa17608888464d21de.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
4a6aa0af85 i.MX: Standardize i.MX CCM debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

The qemu_log_mask() output is following the same format as the
above debug.

Adding some missing qemu_log_mask call for bad registers.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 293e08f31cbb4df84d58f693243e61e770c73b3a.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
f50ed7853a i.MX: Standardize i.MX AVIC debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

We also replace IPRINTF with qemu_log_mask(). The qemu_log_mask() output
is following the same format as the above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 29885ffea2577eaf2288c1d17fd87ee951748b49.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
3afcbb01bc i.MX: Standardize i.MX I2C debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

The qemu_log_mask() output is following the same format as
the above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 328acfe6fc09a5afdbfbfd5220e0869fd5082660.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
5641112574 i.MX: Standardize i.MX GPIO debug
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

The qemu_log_mask() outputis following the same format as
the above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 4f2007adcf0f579864bb4dd8a825824e0e9098b8.1445781957.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 15:59:46 +00:00
Jean-Christophe Dubois
8ccce77c04 i.MX: Standardize i.MX serial debug.
The goal is to have debug code always compiled during build.

We standardize all debug output on the following format:

[QOM_TYPE_NAME]reporting_function: debug message

We also replace IPRINTF with qemu_log_mask(). The qemu_log_mask() output
is following the same format as the above debug.

Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 47b8759b251d356c633faf7ea34f897f340aea4e.1445781957.git.jcd@tribudubois.net
[PMM: Drop attempt to print the ram_addr of a memory region in
 one DPRINTF, which (a) was using the wrong format string so
 didn't build on 32-bit and (b) was incorrectly looking at a
 private field of a MemoryRegion struct]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 13:16:21 +00:00
Andrew Jones
4b280b726a hw/arm/virt: don't use a15memmap directly
We should always go through VirtBoardInfo when we need the memmap.
To avoid using a15memmap directly, in this case, we need to defer
the max-cpus check from class init time to instance init time. In
class init we now use MAX_CPUMASK_BITS for max_cpus initialization,
which is the maximum QEMU supports, and also, incidentally, the
maximum KVM/gicv3 currently supports. Also, a nice side-effect of
delaying the max-cpus check is that we now get more appropriate
error messages for gicv2 machines that try to configure more than
123 cpus. Before this patch it would complain that the requested
number of cpus was greater than 123, but for gicv2 configs, it
should complain that the number is greater than 8.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-id: 1445189728-860-3-git-send-email-drjones@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 12:00:50 +00:00
Pavel Fedin
24182fbc19 arm_gic_kvm: Disable live migration if not supported
Currently, if the kernel does not have live migration API, the migration
will still be attempted, but vGIC save/restore functions will just not do
anything. This will result in a broken machine state.

This patch fixes the problem by adding migration blocker if kernel API is
not supported.

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 12:00:50 +00:00
Soren Brinkmann
b876452507 target-arm: Add support for SPSR_(ABT|UND|IRQ|FIQ)
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 12:00:50 +00:00
Peter Maydell
541ebcd401 target-arm/translate.c: Handle non-executable page-straddling Thumb insns
When the memory we're trying to translate code from is not executable we have
to turn this into a guest fault. In order to report the correct PC for this
fault, and to make sure it is not reported until after any other possible
faults for instructions earlier in execution, we must terminate TBs at
the end of a page, in case the next instruction is in a non-executable page.
This is simple for T16, A32 and A64 instructions, which are always aligned
to their size. However T32 instructions may be 32-bits but only 16-aligned,
so they can straddle a page boundary.

Correct the condition that checks whether the next instruction will touch
the following page, to ensure that if we're 2 bytes before the boundary
and this insn is T32 then we end the TB.

Reported-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 12:00:50 +00:00
Peter Maydell
7cd6de3bb1 target-arm: Fix "no 64-bit EL2" assumption in arm_excp_unmasked()
The code in arm_excp_unmasked() suppresses the ability of PSTATE.AIF
to mask exceptions from a lower EL targeting EL2 or EL3 if the
CPU is 64-bit. This is correct for a target of EL3, but not correct
for targeting EL2. Further, we go to some effort to calculate
scr and hcr values which are not used at all for the 64-bit CPU
case.

Rearrange the code to correctly implement the 64-bit CPU logic
and keep the hcr/scr calculations in the 32-bit CPU codepath.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1444327729-4120-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2015-10-27 12:00:50 +00:00
Peter Maydell
7e038b94e7 Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Tue 27 Oct 2015 05:47:28 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  net: free the string returned by object_get_canonical_path_component
  net: make iov_to_buf take right size argument in nc_sendv_compat()
  net: Remove duplicate data from query-rx-filter on multiqueue net devices
  vmxnet3: Do not fill stats if device is inactive
  options: Add documentation for filter-dump
  net/dump: Provide the dumping facility as a net-filter
  net/dump: Separate the NetClientState from the DumpState
  net/dump: Rework net-dump init functions
  net/dump: Add support for receive_iov function
  net: cadence_gem: Set initial MAC address

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-27 10:10:46 +00:00
Yang Hongyang
a3e8a3f382 net: free the string returned by object_get_canonical_path_component
The value returned from object_get_canonical_path_component
must be freed.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:42 +08:00
Yang Hongyang
edc981443d net: make iov_to_buf take right size argument in nc_sendv_compat()
We want "buf, sizeof(buf)" here.  sizeof(buffer) is the size of a
pointer, which is wrong.
Thanks to Paolo for pointing it out.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:40 +08:00
Vladislav Yasevich
5320c2caf4 net: Remove duplicate data from query-rx-filter on multiqueue net devices
When responding to a query-rx-filter command on a multiqueue
netdev, qemu reports the data for each queue.  The data, however,
is not per-queue, but per device and the same data is reported
multiple times.  This causes confusion and may also cause extra
unnecessary processing when looking at the data.

Commit 638fb14169 (net: Make qmp_query_rx_filter() with name argument
more obvious) partially addresses this issue, by limiting the output
when the name is specified.  However, when the name is not specified,
the issue still persists.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:39 +08:00
Shmulik Ladkani
eedeeeffd4 vmxnet3: Do not fill stats if device is inactive
Guest OS may issue VMXNET3_CMD_GET_STATS even before device was
activated (for example in linux, after insmod but prior net-dev open).

Accessing shared descriptors prior device activation is illegal as the
VMXNET3State structures have not been fully initialized.

As a result, guest memory gets corrupted and may lead to guest OS
crashes.

Fix, by not filling the stats descriptors if device is inactive.

Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Acked-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Dana Rubin <dana.rubin@ravellosystems.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:38 +08:00
Thomas Huth
d3e0c032f5 options: Add documentation for filter-dump
Add a short description for the filter-dump command line options.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:36 +08:00
Thomas Huth
9d3e12e881 net/dump: Provide the dumping facility as a net-filter
Use the net-filter infrastructure to provide the dumping
functions for netdev devices, too.

Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:35 +08:00
Thomas Huth
75310e3486 net/dump: Separate the NetClientState from the DumpState
With the upcoming dumping-via-netfilter patch, the DumpState
should not be related to NetClientState anymore, so move the
related information to a new struct called DumpNetClient.

Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:34 +08:00
Thomas Huth
7bc3074c27 net/dump: Rework net-dump init functions
Move the creation of the dump client from net_dump_init() into
net_init_dump(), so we can later use the former function for
dump via netfilter, too. Also rename net_dump_init() to
net_dump_state_init() to make it easier distinguishable from
net_init_dump().

Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:32 +08:00
Thomas Huth
43192fcc1a net/dump: Add support for receive_iov function
Adding a proper receive_iov function to the net dump module.
This will make it easier to support the dump filter feature for
the -netdev option in later patches.

Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:31 +08:00
Sebastian Huber
afb4c51fad net: cadence_gem: Set initial MAC address
Set initial MAC address to the one specified by the command line.

Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2015-10-27 10:30:30 +08:00
Peter Maydell
9666248a85 Merge remote-tracking branch 'remotes/sstabellini/tags/xen-2015-10-26' into staging
Xen 2015-10-26

# gpg: Signature made Mon 26 Oct 2015 11:32:50 GMT using RSA key ID 70E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"

* remotes/sstabellini/tags/xen-2015-10-26:
  xen-platform: Replace assert() with appropriate error reporting
  xen_platform: switch to realize
  Qemu/Xen: Fix early freeing MSIX MMIO memory region

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-26 13:13:38 +00:00
Eduardo Habkost
b1ecd51bdb xen-platform: Replace assert() with appropriate error reporting
Commit dbb7405d8c made it possible to
trigger an assert using "-device xen-platform". Replace it with
appropriate error reporting.

Before:

  $ qemu-system-x86_64 -device xen-platform
  qemu-system-x86_64: hw/i386/xen/xen_platform.c:391: xen_platform_initfn: Assertion `xen_enabled()' failed.
  Aborted (core dumped)
  $

After:

  $ qemu-system-x86_64 -device xen-platform
  qemu-system-x86_64: -device xen-platform: xen-platform device requires the Xen accelerator
  $

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-10-26 11:32:24 +00:00
Stefano Stabellini
4098d49db5 xen_platform: switch to realize
Use realize to initialize the xen_platform device

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-26 11:32:24 +00:00
Peter Maydell
251d7e6014 Merge remote-tracking branch 'remotes/elmarco/tags/ivshmem-pull-request' into staging
ivshmem series

# gpg: Signature made Mon 26 Oct 2015 09:27:46 GMT using RSA key ID 75969CE5
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>"
# gpg:                 aka "Marc-André Lureau <marcandre.lureau@gmail.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276  F62D DAE8 E109 7596 9CE5

* remotes/elmarco/tags/ivshmem-pull-request: (51 commits)
  doc: document ivshmem & hugepages
  ivshmem: use little-endian int64_t for the protocol
  ivshmem: use kvm irqfd for msi notifications
  ivshmem: rename MSI eventfd_table
  ivshmem: remove EventfdEntry.vector
  ivshmem: add hostmem backend
  ivshmem: use qemu_strtosz()
  ivshmem: do not keep shm_fd open
  tests: add ivshmem qtest
  qtest: add qtest_add_abrt_handler()
  msix: implement pba write (but read-only)
  contrib: remove unnecessary strdup()
  ivshmem: add check on protocol version in QEMU
  docs: update ivshmem device spec
  ivshmem-server: fix hugetlbfs support
  ivshmem-server: use a uint16 for client ID
  ivshmem-client: check the number of vectors
  contrib: add ivshmem client and server
  util: const event_notifier_get_fd() argument
  ivshmem: reset mask on device reset
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-26 11:32:20 +00:00
Lan Tianyu
4e494de668 Qemu/Xen: Fix early freeing MSIX MMIO memory region
msix->mmio is added to XenPCIPassthroughState's object as property.
object_finalize_child_property is called for XenPCIPassthroughState's
object, which calls object_property_del_all, which is going to try to
delete msix->mmio. object_finalize_child_property() will access
msix->mmio's obj. But the whole msix struct has already been freed
by xen_pt_msix_delete. This will cause segment fault when msix->mmio
has been overwritten.

This patch is to fix the issue.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2015-10-26 11:32:18 +00:00
Marc-André Lureau
7d4f4bdaf7 doc: document ivshmem & hugepages
Document and give some examples of hugepages support with ivshmem device
and server.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
f7a199b2b4 ivshmem: use little-endian int64_t for the protocol
The current ivshmem protocol uses 'long' for integers. But the
sizeof(long) depends on the host and the endianess is not defined, which
may cause portability troubles.

Instead, switch to using little-endian int64_t. This breaks the
protocol, except on x64 little-endian host where this change
should be compatible.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
660c97eef6 ivshmem: use kvm irqfd for msi notifications
Use irqfd for improving context switch when notifying the guest.
If the host doesn't support kvm irqfd, regular msi notifications are
still supported.

Note: the ivshmem implementation doesn't allow switching between MSI and
IO interrupts, this patch doesn't either.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
0f57350e5c ivshmem: rename MSI eventfd_table
The array is used to have vector specific data, so use a more
descriptive name.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
d160f3f791 ivshmem: remove EventfdEntry.vector
No need to store an extra int for the vector number when it can be
computed easily by looking at the position in the array.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
d9453c93fe ivshmem: add hostmem backend
Instead of handling allocation, teach ivshmem to use a memory backend.
This allows to use hugetlbfs backed memory now.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
2c04752cc8 ivshmem: use qemu_strtosz()
Use the common qemu utility function to parse the memory size.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
f689d2811a ivshmem: do not keep shm_fd open
Remove shm_fd from device state, closing it as early as possible to avoid leaks.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-26 10:19:53 +01:00
Marc-André Lureau
ddef6a0d68 tests: add ivshmem qtest
Adds 4 ivshmemtests:
- single qemu instance and basic IO
- pair of instances, check memory sharing
- pair of instances with server, and MSIX
- hot plug/unplug

A temporary shm is created as well as a directory to place server
socket, both should be clear on exit and abort.

Cc: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-26 10:19:48 +01:00
Marc-André Lureau
063c23d909 qtest: add qtest_add_abrt_handler()
Allow a test to add abort handlers, use GHook for all handlers.

There is currently no way to remove a handler, but it could be
later added if needed.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:03:18 +02:00
Marc-André Lureau
43b11a91dd msix: implement pba write (but read-only)
qpci_msix_pending() writes on pba region, causing qemu to SEGV:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0x7ffff7fba8c0 (LWP 25882)]
  0x0000000000000000 in ?? ()
  (gdb) bt
  #0  0x0000000000000000 in  ()
  #1  0x00005555556556c5 in memory_region_oldmmio_write_accessor (mr=0x5555579f3f80, addr=0, value=0x7fffffffbf68, size=4, shift=0, mask=4294967295, attrs=...) at /home/elmarco/src/qemu/memory.c:434
  #2  0x00005555556558e1 in access_with_adjusted_size (addr=0, value=0x7fffffffbf68, size=4, access_size_min=1, access_size_max=4, access=0x55555565563e <memory_region_oldmmio_write_accessor>, mr=0x5555579f3f80, attrs=...) at /home/elmarco/src/qemu/memory.c:506
  #3  0x00005555556581eb in memory_region_dispatch_write (mr=0x5555579f3f80, addr=0, data=0, size=4, attrs=...) at /home/elmarco/src/qemu/memory.c:1176
  #4  0x000055555560b6f9 in address_space_rw (as=0x555555eff4e0 <address_space_memory>, addr=3759147008, attrs=..., buf=0x7fffffffc1b0 "", len=4, is_write=true) at /home/elmarco/src/qemu/exec.c:2439
  #5  0x000055555560baa2 in cpu_physical_memory_rw (addr=3759147008, buf=0x7fffffffc1b0 "", len=4, is_write=1) at /home/elmarco/src/qemu/exec.c:2534
  #6  0x000055555564c005 in cpu_physical_memory_write (addr=3759147008, buf=0x7fffffffc1b0, len=4) at /home/elmarco/src/qemu/include/exec/cpu-common.h:80
  #7  0x000055555564cd9c in qtest_process_command (chr=0x55555642b890, words=0x5555578de4b0) at /home/elmarco/src/qemu/qtest.c:378
  #8  0x000055555564db77 in qtest_process_inbuf (chr=0x55555642b890, inbuf=0x55555641b340) at /home/elmarco/src/qemu/qtest.c:569
  #9  0x000055555564dc07 in qtest_read (opaque=0x55555642b890, buf=0x7fffffffc2e0 "writel 0xe0100800 0x0\n", size=22) at /home/elmarco/src/qemu/qtest.c:581
  #10 0x000055555574ce3e in qemu_chr_be_write (s=0x55555642b890, buf=0x7fffffffc2e0 "writel 0xe0100800 0x0\n", len=22) at qemu-char.c:306
  #11 0x0000555555751263 in tcp_chr_read (chan=0x55555642bcf0, cond=G_IO_IN, opaque=0x55555642b890) at qemu-char.c:2876
  #12 0x00007ffff64c9a8a in g_main_context_dispatch (context=0x55555641c400) at gmain.c:3122

(without this patch, this can be reproduced with the ivshmem qtest)

Implement an empty mmio write to avoid the crash.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-24 18:03:18 +02:00
Marc-André Lureau
45b00c44ce contrib: remove unnecessary strdup()
getopt() optarg points to argv memory, no need to dup those values,
fixes small leaks detected by clang-analyzer.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2015-10-24 18:03:18 +02:00
David Marchand
5105b1d8c2 ivshmem: add check on protocol version in QEMU
Send a protocol version as the first message from server, clients must
close communication if they don't support this protocol version.  Older
QEMUs should be fine with this change in the protocol since they
overrides their own vm_id on reception of an id associated to no
eventfd.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[use fifo_update_and_get()]
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:03:18 +02:00
David Marchand
8c4ef202b9 docs: update ivshmem device spec
Add some notes on the parts needed to use ivshmem devices: more specifically,
explain the purpose of an ivshmem server and the basic concept to use the
ivshmem devices in guests.
Move some parts of the documentation and re-organise it.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-10-24 18:03:17 +02:00
Marc-André Lureau
1e21feb628 ivshmem-server: fix hugetlbfs support
As pointed out on the ML by Andrew Jones, glibc no longer permits
creating POSIX shm on hugetlbfs directly. When given a hugetlbfs path,
create a shareable file there.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2015-10-24 18:03:16 +02:00
Marc-André Lureau
022cffe313 ivshmem-server: use a uint16 for client ID
In practice, the number of VM is limited to MAXUINT16 in ivshmem, so use
the same limit on the server (removes a theorical infinite loop)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:03:16 +02:00
Marc-André Lureau
95204aa951 ivshmem-client: check the number of vectors
Check the number of vectors received from the server, to avoid
out of bound array access.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:03:16 +02:00
David Marchand
a75eb03b9f contrib: add ivshmem client and server
When using ivshmem devices, notifications between guests can be sent as
interrupts using a ivshmem-server (typical use described in documentation).
The client is provided as a debug tool.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David Marchand <david.marchand@6wind.com>
[fix a valgrind warning, option and server_close() segvs, extra server
headers includes, getopt() return type, out-of-tree build, use qemu
event_notifier instead of eventfd, fix x86/osx warnings - Marc-André]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-10-24 18:03:16 +02:00
Marc-André Lureau
12f0b68c82 util: const event_notifier_get_fd() argument
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
972ad21553 ivshmem: reset mask on device reset
The interrupt mask is a state value, it should be reset, like the
interrupt status.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
1ee57de444 ivshmem: error on too many eventfd received
The number of eventfd that can be handled per peer is limited by the
number of vectors. Return an error when receiving too many of them.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
f456179fae ivshmem: replace 'guest' for 'peer' appropriately
The terms 'guest' and 'peer' are used sometime interchangeably which may
be confusing. Instead, use 'peer' for the remote instances of ivshmem
clients, and 'guest' for the local VM.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
f64a078d45 ivshmem: fix pci_ivshmem_exit()
Free all objects owned by the device, making sure the device is free,
fixing hot-unplug.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
d383537d01 ivshmem: add device description
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
945001a1af ivshmem: check shm isn't already initialized
The server should not change the shm, and this isn't handled by qemu and
we should should verify this in qemu.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
86d471bfa4 ivshmem: shmfd can be 0
0 is a valid fd value, so change conditions and set -1 value early

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
1f8552df2c ivshmem: migrate with VMStateDescription
load_state_old() is used to keep compatibility with version 0.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
e309366337 ivshmem: use common is_power_of_2()
The common version correctly checks for 0 value case.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
6f8a16d55d ivshmem: use common return
Both if branches return, move this out to common end.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
9a2f0e64ae ivshmem: simplify a bit the code
Use some more explicit variables to simplify the code.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
ffa99afd6e ivshmem: print error on invalid peer id
The server shouldn't send invalid peer id, so print an error if it's the
case.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
36617792b4 ivshmem: improve error handling
The test whether the chardev is an AF_UNIX socket rejects
"-chardev socket,id=chr0,path=/tmp/foo,server,nowait -device
ivshmem,chardev=chr0", but fails to explain why.

Use an explicit error on why a chardev may be rejected.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
f59bb37898 ivshmem: improve debug messages
Some misc improvements to ivshmem debug.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:49 +02:00
Marc-André Lureau
95c8425cc3 ivshmem: remove max_peer field
max_peer isn't really useful, it tracks the maximum received VM id, but
that quickly matches nb_peers, the size of the peers array. Since VM
come and go, there might be sparse peers so it doesn't help much in
general to have this value around.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
95e7c8a0f6 ivshmem: initialize max_peer to -1
There is no peer when device is initialized, do not let doorbell for
inexisting peer 0.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
d8a5da075a ivshmem: remove useless ivshmem_update_irq() val argument
val isn't used in ivshmem_update_irq() function.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
81e507f0bc ivshmem: allocate eventfds in resize_peers()
It simplifies a bit the code to allocate the array when setting the
number of peers instead of lazily when receiving the first vector.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
1300b2733a ivshmem: simplify around increase_dynamic_storage()
Set the number of peers and array allocation in a single place. Rename
to better reflect the function content.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
61ea2d8648 ivshmem: limit maximum number of peers to G_MAXUINT16
Limit the maximum number of peers to MAXUINT16. This is more realistic
and better matches the limit of the doorbell register.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
03977ad552 ivshmem: remove last exit(1)
Failing to create a chardev shouldn't be fatal.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
d58d7e848e ivshmem: more qdev conversion
Use the latest qemu device modeling API, in particular, convert to
realize to fix the error handling; right now a botched device_add
ivhsmem command kills the VM.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
49b2951f84 ivshmem: remove useless doorbell field
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
9113e3f394 ivshmem: remove superflous ivshmem_attr field
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
dee2151e72 ivshmem: remove unnecessary dup()
qemu_chr_fe_get_msgfd() transfers ownership, there is no need to dup the
fd.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
0f14fd71c1 ivshmem: factor out the incoming fifo handling
Make a new function fifo_update_and_get() that can be reused by other
functions (in next commits).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
951dada665 ivshmem: fix number of bytes to push to fifo
If the fifo has 0 bytes, and the read is of size 1, the call to
fifo8_push_all() will copy off boundary data.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
b8ab854b27 ivhsmem: read do not accept more than sizeof(long)
ivshmem_read() only reads sizeof(long) from the input buffer.  Accepting
more could lead to fifo8 abort() on 32bit systems if fifo is not empty.

A following patch will change the protocol to 64-bit little-endian
instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
c246a62f26 msix: add VMSTATE_MSIX_TEST
ivshmem is going to use MSIX state conditionally.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
1ad78ea51a char: add qemu_chr_free()
If a chardev is allowed to be created outside of QMP, then it must be
also possible to free it. This is useful for ivshmem that creates
chardev anonymously and must be able to free them.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2015-10-24 18:02:48 +02:00
Andreas Färber
bbfc2efefe tests: Add ivshmem qtest
Note that it launches two instances, as sharing memory is the purpose of
ivshmem.

Cc: Cam Macdonell <cam@cs.ualberta.ca>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[ Remove Nahanni codename, add test to pci set - Marc-André ]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-10-24 18:02:48 +02:00
Marc-André Lureau
dd054be35f config: enable ivshmem on POSIX
ivshmem doesn't actually require kvm, so enable it when POSIX is
enabled. (it is required however when ioeventfd is enabled)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2015-10-24 18:02:47 +02:00
Peter Maydell
af25e7277d Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Fri 23 Oct 2015 17:59:56 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (37 commits)
  tests: Add test case for aio_disable_external
  block: Add "drained begin/end" for internal snapshot
  block: Add "drained begin/end" for transactional blockdev-backup
  block: Add "drained begin/end" for transactional backup
  block: Add "drained begin/end" for transactional external snapshot
  block: Introduce "drained begin/end" API
  aio: introduce aio_{disable,enable}_external
  dataplane: Mark host notifiers' client type as "external"
  nbd: Mark fd handlers client type as "external"
  aio: Add "is_external" flag for event handlers
  throttle: Remove throttle_group_lock/unlock()
  blockdev: Allow more options for BB-less BDS tree
  blockdev: Pull out blockdev option extraction
  blockdev: Do not create BDS for empty drive
  block: Prepare for NULL BDS
  block: Add blk_insert_bs()
  block: Prepare remaining BB functions for NULL BDS
  block: Fail requests to empty BlockBackend
  block: Make some BB functions fall back to BBRS
  block: Add BlockBackendRootState
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-23 18:14:42 +01:00
Fam Zheng
c07bc2c165 tests: Add test case for aio_disable_external
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
507306cc8e block: Add "drained begin/end" for internal snapshot
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

state->bs is assigned right after bdrv_drained_begin. Because it was
used as the flag for deletion or not in abort, now we need a separate
flag - InternalSnapshotState.created.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
ff52bf36a3 block: Add "drained begin/end" for transactional blockdev-backup
Similar to the previous patch, make sure that external events are not
dispatched during transaction operations.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
1fdd4b7be3 block: Add "drained begin/end" for transactional backup
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

Move the assignment to state->bs up right after bdrv_drained_begin, so
that we can use it in the clean callback. The abort callback will still
check bs->job and state->job, so it's OK.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
da763e8301 block: Add "drained begin/end" for transactional external snapshot
This ensures the atomicity of the transaction by avoiding processing of
external requests such as those from ioeventfd.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
51288d7917 block: Introduce "drained begin/end" API
The semantics is that after bdrv_drained_begin(bs), bs will not get new external
requests until the matching bdrv_drained_end(bs).

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
c1e1e5fa8f aio: introduce aio_{disable,enable}_external
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
3a1e8074d7 dataplane: Mark host notifiers' client type as "external"
They will be excluded by type in the nested event loops in block layer,
so that unwanted events won't be processed there.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:24 +02:00
Fam Zheng
172cc129a5 nbd: Mark fd handlers client type as "external"
So we could distinguish it from internal used fds, thus avoid handling
unwanted events in nested aio polls.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Fam Zheng
dca21ef23b aio: Add "is_external" flag for event handlers
All callers pass in false, and the real external ones will switch to
true in coming patches.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Alberto Garcia
d87d01e16a throttle: Remove throttle_group_lock/unlock()
The group throttling code was always meant to handle its locking
internally. However, bdrv_swap() was touching the ThrottleGroup
structure directly and therefore needed an API for that.

Now that bdrv_swap() no longer exists there's no need for the
throttle_group_lock() API anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
bd745e238b blockdev: Allow more options for BB-less BDS tree
Most of the options which blockdev_init() parses for both the
BlockBackend and the root BDS are valid for just the root BDS as well
(e.g. read-only). This patch allows specifying these options even if not
creating a BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
fbf8175eac blockdev: Pull out blockdev option extraction
Extract some of the blockdev option extraction code from blockdev_init()
into its own function. This simplifies blockdev_init() and will allow
reusing the code in a different function added in a follow-up patch.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
5ec18f8c83 blockdev: Do not create BDS for empty drive
Do not use "rudimentary" BDSs for empty drives any longer (for
freshly created drives).

After a follow-up patch, empty drives will generally use a NULL BDS, not
only the freshly created drives.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
5433c24f0f block: Prepare for NULL BDS
blk_bs() will not necessarily return a non-NULL value any more (unless
blk_is_available() is true or it can be assumed to otherwise, e.g.
because it is called immediately after a successful blk_new_with_bs() or
blk_new_open()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
0c3c36d651 block: Add blk_insert_bs()
This function associates the given BlockDriverState with the given
BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
a46fc9c950 block: Prepare remaining BB functions for NULL BDS
There are several BlockBackend functions which, in theory, cannot fail.
This patch makes them cope with the BlockDriverState pointer being NULL
by making them fall back to some default action like ignoring the value
in setters and returning the default in getters.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
c09ba36c9a block: Fail requests to empty BlockBackend
If there is no BlockDriverState in a BlockBackend or if the tray of the
guest device is open, fail all requests (where that is possible) with
-ENOMEDIUM.

The reason the status of the guest device is taken into account is
because once the guest device's tray is opened, any request on the same
BlockBackend as the guest uses should fail. If the BDS tree is supposed
to be usable even after ejecting it from the guest, a different
BlockBackend must be used.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
061959e8da block: Make some BB functions fall back to BBRS
If there is no BDS tree attached to a BlockBackend, functions that can
do so should fall back to the BlockBackendRootState structure.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
281d22d86c block: Add BlockBackendRootState
This structure will store some of the state of the root BDS if the BDS
tree is removed, so that state can be restored once a new BDS tree is
inserted.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
973f2ddf7b block/throttle-groups: Make incref/decref public
Throttle groups are not necessarily referenced by BDSs alone; a later
patch will essentially allow BBs to reference them, too. Make the
ref/unref functions public so that reference can be properly accounted
for.

Their interface is slightly adjusted in that they return and take a
ThrottleState pointer, respectively, instead of a ThrottleGroup pointer.
Functionally, they are equivalent, but since ThrottleGroup is not meant
to be used outside of block/throttle-groups.c, ThrottleState is easier
to handle.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
373340b26c block: Move I/O status and error actions into BB
These options are only relevant for the user of a whole BDS tree (like a
guest device or a block job) and should thus be moved into the
BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
7f0e9da6f1 block: Move BlockAcctStats into BlockBackend
As the comment above bdrv_get_stats() says, BlockAcctStats is something
which belongs to the device instead of each BlockDriverState. This patch
therefore moves it into the BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
53d8f9d8fb block: Remove wr_highest_sector from BlockAcctStats
BlockAcctStats contains statistics about the data transferred from and
to the device; wr_highest_sector does not fit in with the rest.

Furthermore, those statistics are supposed to be specific for a certain
device and not necessarily for a BDS (see the comment above
bdrv_get_stats()); on the other hand, wr_highest_sector may be a rather
important information to know for each BDS. When BlockAcctStats is
finally removed from the BDS, we will want to keep wr_highest_sector in
the BDS.

Finally, wr_highest_sector is renamed to wr_highest_offset and given the
appropriate meaning. Externally, it is represented as an offset so there
is no point in doing something different internally. Its definition is
changed to match that in qapi/block-core.json which is "the offset after
the greatest byte written to". Doing so should not cause any harm since
if external programs tried to calculate the volume usage by
(wr_highest_offset + 512) / volume_size, after this patch they will just
assume the volume to be full slightly earlier than before.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
68e9ec017b block: Move guest_block_size into BlockBackend
guest_block_size is a guest device property so it should be moved into
the interface between block layer and guest devices, which is the
BlockBackend.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
4981bdec0d block: Fix BB AIOCB AioContext without BDS
Fix the BlockBackend's AIOCB AioContext for aborting AIO in case there
is no BDS. If there is no implementation of AIOCBInfo::get_aio_context()
the AioContext is derived from the BDS the AIOCB belongs to. If that BDS
is NULL (because it has been removed from the BB) this will not work.

This patch makes blk_get_aio_context() fall back to the main loop
context if the BDS pointer is NULL and implements
AIOCBInfo::get_aio_context() (blk_aiocb_get_aio_context()) which invokes
blk_get_aio_context().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
7d3467d903 hw/usb-storage: Check whether BB is inserted
Only call bdrv_add_key() on the BlockDriverState if it is not NULL.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
2e1280e8ff hw/block/fdc: Implement tray status
The tray of an FDD is open iff there is no medium inserted (there are
only two states for an FDD: "medium inserted" or "no medium inserted").

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
b4d02820d9 block: Invoke change media CB before NULLing drv
In order to handle host device passthrough, some guest device models
may call blk_is_inserted() to check whether the medium is inserted on
the host, when checking the guest tray status.

This tray status is inquired by blk_dev_change_media_cb(); because
bdrv_is_inserted() (invoked by blk_is_inserted()) always returns false
for BDS with drv set to NULL, blk_dev_change_media_cb() should therefore
be called before drv is set to NULL.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
1354c47378 block/raw_bsd: Drop raw_is_inserted()
With the new automatically-recursive implementation of
bdrv_is_inserted() checking by default whether all the children of a BDS
are inserted, we can drop raw's own implementation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:23 +02:00
Max Reitz
28d7a78996 block: Make bdrv_is_inserted() recursive
If bdrv_is_inserted() is called on the top level BDS, it should make
sure all nodes in the BDS tree are actually inserted.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
db0284f86a block: Add blk_is_available()
blk_is_available() returns true iff the BDS is inserted (which means
blk_bs() is not NULL and bdrv_is_inserted() returns true) and if the
tray of the guest device is closed.

blk_is_inserted() is changed to return true only if blk_bs() is not
NULL.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
e031f75048 block: Make bdrv_is_inserted() return a bool
Make bdrv_is_inserted(), blk_is_inserted(), and the callback
BlockDriver.bdrv_is_inserted() return a bool.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
8e9e653038 iotests: Only create BB if necessary
Tests 071 and 081 test giving references in blockdev-add. It is not
necessary to create a BlockBackend here, so omit it.

While at it, fix up some blockdev-add invocations in the vicinity
(s/raw/$IMGFMT/ in 081, drop the format BDS for blkverify's raw child in
071).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
be4b67bc7d blockdev: Allow creation of BDS trees without BB
If the "id" field is missing from the options given to blockdev-add,
just omit the BlockBackend and create the BlockDriverState tree alone.

However, if "id" is missing, "node-name" must be specified; otherwise,
the BDS tree would no longer be accessible.

Many BDS options which are not parsed by bdrv_open() (like caching)
cannot be specified for these BB-less BDS trees yet. A future patch will
remove this limitation.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
d44f928a54 block: Set BDRV_O_INCOMING in bdrv_fill_options()
This flag should not be set for the root BDS only, but for any BDS that
is being created while incoming migration is pending, so setting it is
moved from blockdev_init() to bdrv_fill_options().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Max Reitz
f709623b3d block: Remove host floppy support
It has been deprecated as of 2.3, so we can now remove it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-10-23 18:18:22 +02:00
Peter Maydell
bc79082e4c Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 queue, 2015-10-23

# gpg: Signature made Fri 23 Oct 2015 16:30:58 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  vl: trivial: minor tweaks to a max-cpu error msg
  target-i386: Use 1UL for bit shift
  target-i386: Add DE to TCG_FEATURES
  target-i386: Ensure always-1 bits on DR6 can't be cleared
  target-i386: Check CR4[DE] for processing DR4/DR5
  target-i386: Handle I/O breakpoints
  target-i386: Optimize setting dr[0-3]
  target-i386: Move hw_*breakpoint_* functions
  target-i386: Ensure bit 10 on DR7 is never cleared
  target-i386: Re-introduce optimal breakpoint removal
  target-i386: Introduce cpu_x86_update_dr7
  target-i386: Disable cache info passthrough by default
  target-i386: allow any alignment for SMBASE

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-23 16:35:43 +01:00
Andrew Jones
31bfa2a400 vl: trivial: minor tweaks to a max-cpu error msg
Signed-off-by: Andrew Jones <drjones@redhat.com>
2015-10-23 13:11:52 -02:00
Eduardo Habkost
72370dc114 target-i386: Use 1UL for bit shift
Fix undefined behavior detected by clang runtime check:

  qemu/target-i386/cpu.c:1494:15: runtime error:
    left shift of 1 by 31 places cannot be represented in type 'int'

While doing that, add extra parenthesis for clarity.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 13:07:27 -02:00
Eduardo Habkost
b6c5a6f021 target-i386: Add DE to TCG_FEATURES
Now DE is supported by TCG so it can be enabled in CPUID bits.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Eduardo Habkost
462f8ed1f1 target-i386: Ensure always-1 bits on DR6 can't be cleared
Bits 4-11 and 16-31 on DR6 are documented as always 1, so ensure they
can't be cleared by software.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Richard Henderson
d005233923 target-i386: Check CR4[DE] for processing DR4/DR5
Introduce helper_get_dr so that we don't have to put CR4[DE]
into the scarce HFLAGS resource.  At the same time, rename
helper_movl_drN_T0 to helper_set_dr and set the helper flags.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Eduardo Habkost
5223a9423c target-i386: Handle I/O breakpoints
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Richard Henderson
7525b55051 target-i386: Optimize setting dr[0-3]
If the debug register is not enabled, we need
do nothing besides update the register.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Richard Henderson
696ad9e4b2 target-i386: Move hw_*breakpoint_* functions
They're only used from bpt_helper.c now.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Eduardo Habkost
9055330ffb target-i386: Ensure bit 10 on DR7 is never cleared
Bit 10 of DR7 is documented as always set to 1, so ensure that's
always the case.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Richard Henderson
36eb6e0967 target-i386: Re-introduce optimal breakpoint removal
Before the last patch, we had an efficient loop that disabled
local breakpoints on task switch.  Re-add that, but in a more
general way that handles changes to the global enable bits too.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Richard Henderson
93d00d0fbe target-i386: Introduce cpu_x86_update_dr7
This moves the last of the iteration over breakpoints into
the bpt_helper.c file.  This also allows us to make several
breakpoint functions static.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Eduardo Habkost
e265e3e480 target-i386: Disable cache info passthrough by default
The host cache information may not make sense for the guest if the VM
CPU topology doesn't match the host CPU topology. To make sure we won't
expose broken cache information to the guest, disable cache info
passthrough by default, and add a new "host-cache-info" property that
can be used to enable the old behavior for users that really need it.

Cc: Benoît Canet <benoit@irqsave.net>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:27 -02:00
Paolo Bonzini
dd75d4fcb4 target-i386: allow any alignment for SMBASE
Processors up to the Pentium (says Bochs---I do not have old enough
manuals) require a 32KiB alignment for the SMBASE, but newer processors
do not need that, and Tiano Core will use non-aligned SMBASE values.

Reported-by: Michael D Kinney <michael.d.kinney@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23 12:59:26 -02:00
Peter Maydell
1e700f4c6c Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-10-23-tag' into staging
qemu-ga patch queue

* unbreak qga-test unit test on travis-ci systems by not assuming a
  disk-based filesystem must be present

# gpg: Signature made Fri 23 Oct 2015 15:01:47 BST using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2015-10-23-tag:
  tests: test-qga, loosen assumptions about host filesystems

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-23 15:55:50 +01:00
Michael Roth
b3e9e584fc tests: test-qga, loosen assumptions about host filesystems
QGA skips pseudo-filesystems when querying filesystems via
guest-get-fsinfo. On some hosts, such as travis-ci which uses
containers with simfs filesystems, QGA might not report *any*
filesystems. Our test case assumes there would be at least one,
leading to false error messages in these situations.

Instead, sanity-check values iff we get at least one filesystem.

Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-23 08:57:45 -05:00
Peter Maydell
147482ae35 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-next-20151023' into staging
ppc patch queue - 2015-10-23

sPAPR highlights:
  * Allow VFIO devices on the spapr-pci-host-bridge
  * Allow virtio VGA
  * Safer handling of HTAB allocation
  * ibm,pa-features device tree property

non-sPAPR highlights:
  * Categorization of many ppc specific devices in help output
  * Tweaks to MMU type constants

# gpg: Signature made Fri 23 Oct 2015 07:27:56 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-next-20151023: (21 commits)
  prep: do not use CPU_LOG_IOPORT, convert to tracepoints
  openpic: add to misc category
  macio-nvram: add to misc category
  macio: add to bridge category
  uninorth: add to bridge category
  macio-ide: add to storage category
  cuda: add to bridge category
  grackle: add to bridge category
  escc: add to input category
  cmd646: add to storage category
  adb: add to input category
  ppc/spapr: Add "ibm,pa-features" property to the device-tree
  ppc: Add mmu_model defines for arch 2.03 and 2.07
  hw/scsi/spapr_vscsi: Remove superfluous memset
  spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
  spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
  spapr_iommu: Rename vfio_accel parameter
  spapr_pci: Allow PCI host bridge DMA window to be configured
  spapr: Add "slb-size" property to CPU device tree nodes
  spapr: Abort when HTAB of requested size isn't allocated
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-23 13:09:09 +01:00
Peter Maydell
431429a5b8 Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-fixes-pull-20151022-2' into staging
Merge qcrypto-fixes 2015/10/22

# gpg: Signature made Thu 22 Oct 2015 19:03:45 BST using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/qcrypto-fixes-pull-20151022-2:
  configure: avoid polluting global CFLAGS with tasn1 flags
  crypto: add sanity checking of plaintext/ciphertext length
  crypto: don't let builtin aes crash if no IV is provided
  crypto: allow use of nettle/gcrypt to be selected explicitly

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-23 12:09:02 +01:00
Peter Maydell
dfbe0642ef Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
vhost: build fix

Fix build breakages when using older gcc.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 22 Oct 2015 20:36:07 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  vhost-user: fix up rhel6 build

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-23 10:24:08 +01:00
Paolo Bonzini
659f7f6556 prep: do not use CPU_LOG_IOPORT, convert to tracepoints
These messages are disabled by default; a perfect usecase for tracepoints.
Convert them over.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:38:28 +11:00
Laurent Vivier
29f8dd66e8 openpic: add to misc category
openpic is a programmable interrupt controller, so
add it to the misc category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
175fe9e7c8 macio-nvram: add to misc category
The macio nvram is a non volatile RAM, so add it
the misc category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
f9f2a9f26f macio: add to bridge category
macio is a bridge between the PCI bus and the Mac nvram,
IDE controller and PIC, so add it to the bridge category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
1d16f86a43 uninorth: add to bridge category
Uninorth is the mac99 PCI host controller, so add
it to the bridge category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
3469d9bce8 macio-ide: add to storage category
macio-ide is an IDE controller, so add it
to the storage category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
599d7326c3 cuda: add to bridge category
Cuda is a bridge between PowerMac system bus and the ADB controller,
real-time clock, pram and the power management unit.

So add it to the bridge category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
e16244355f grackle: add to bridge category
Grackle is the PCI host controller of oldworld powermac,
so add it to the bridge category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:18 +11:00
Laurent Vivier
f8d4c07c78 escc: add to input category
ESCC is a serial port controller, so add it
to the input category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:17 +11:00
Laurent Vivier
74623e7369 cmd646: add to storage category
cmd646 is an IDE controller, so add it to the
storage category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:17 +11:00
Laurent Vivier
32f3a8992e adb: add to input category
The Apple Desktop Bus is used to connect a keyboard and a mouse,
so add it to the input category.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:35:17 +11:00
Benjamin Herrenschmidt
90da0d5a70 ppc/spapr: Add "ibm,pa-features" property to the device-tree
LoPAPR defines a "ibm,pa-features" per-CPU device tree property which
describes extended features of the Processor Architecture.

This adds the property to the device tree. At the moment this is the
copy of what pHyp advertises except "I=1 (cache inhibited) Large Pages"
which is enabled for TCG and disabled when running under HV KVM host
with 4K system page size.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: rebased, changed commit log, moved ci_large_pages initialization,
renamed pa_features arrays]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:22:40 +11:00
Benjamin Herrenschmidt
aa4bb58752 ppc: Add mmu_model defines for arch 2.03 and 2.07
This removes unused POWERPC_MMU_2_06a/POWERPC_MMU_2_06d.

This replaces POWERPC_MMU_64B with POWERPC_MMU_2_03 for POWER5+ to be
more explicit about the version of the PowerISA supported.

This defines POWERPC_MMU_2_07 and uses it for the POWER8 CPU family.
This will not have an immediate effect now but it will in the following
patch.

This should cause no behavioural change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: rebased, changed commit log]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 12:22:40 +11:00
Thomas Huth
a23dec105c hw/scsi/spapr_vscsi: Remove superfluous memset
g_malloc0 already clears the memory, so no need for
the additional memset here.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 10:38:11 +11:00
David Gibson
185181f883 spapr_pci: Allow VFIO devices to work on the normal PCI host bridge
The core VFIO infrastructure more or less allows VFIO devices to work
on any normal guest PCI host bridge (PHB) without extra logic.
However, the "spapr-pci-host-bridge" device (as opposed to the special
"spapr-pci-vfio-host-bridge" device) breaks this by using a partially
KVM accelerated implementation of the guest kernel IOMMU which won't
work with VFIO devices, without additional kernel support.

This patch allows VFIO devices to work on the spapr-pci-host-bridge,
by having it switch off KVM TCE acceleration when a VFIO device is
added to the PHB (either on startup, or by hotplug).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-23 10:38:10 +11:00
David Gibson
c10325d6f9 spapr_iommu: Provide a function to switch a TCE table to allowing VFIO
Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
this is decided at creation time.

To support hotplug of VFIO devices, we need to allow a TCE table which
previously didn't allow VFIO devices to be switched so that it can.  This
patch adds an spapr_tce_set_need_vfio() function to do this, by
reallocating the table in userspace if necessary.

Currently this doesn't allow the KVM acceleration to be re-enabled if all
the VFIO devices are removed.  That's an optimization for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-23 10:38:10 +11:00
David Gibson
6a81dd172c spapr_iommu: Rename vfio_accel parameter
The vfio_accel parameter used when creating a new TCE table (guest IOMMU
context) has a confusing name.  What it really means is whether we need the
TCE table created to be able to support VFIO devices.

VFIO is relevant, because when available we use in-kernel acceleration of
the TCE table, but that may not work with VFIO devices because updates to
the table are handled in kernel, bypass qemu and so don't hit qemu's
infrastructure for keeping the VFIO host IOMMU state in sync with the guest
IOMMU state.

Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
with no impact on the logic.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-23 10:38:10 +11:00
David Gibson
f93caaac36 spapr_pci: Allow PCI host bridge DMA window to be configured
At present the PCI host bridge (PHB) for the pseries machine type has a
fixed DMA window from 0..1GB (in PCI address space) which is mapped to real
memory via the PAPR paravirtualized IOMMU.

For better support of VFIO devices, we're going to want to allow for
different configurations of the DMA window.

Eventually we'll want to allow the guest itself to reconfigure the window
via the PAPR dynamic DMA window interface, but as a preliminary this patch
allows the user to reconfigure the window with new properties on the PHB
device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-23 10:38:10 +11:00
Thomas Huth
fd5da5c472 spapr: Add "slb-size" property to CPU device tree nodes
According to a commit message in the Linux kernel (see here
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b60c31d85a2a
for example), the name of the property that carries the information
about the number of SLB entries should be called "slb-size", and
not "ibm,slb-size". The Linux kernel can deal with both names, but
to be on the safe side we should support the official name, too.

[Now that LoPAPR is public, the relevant requirement can be found in
section C.6.1.8 --dwg]

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 10:38:10 +11:00
Bharata B Rao
7735fedaf4 spapr: Abort when HTAB of requested size isn't allocated
Terminate the guest when HTAB of requested size isn't allocated by
the host.

When memory hotplug is attempted on a guest that has booted with
less than requested HTAB size, the guest kernel will not be able
to gracefully fail the hotplug request. This patch will ensure that
we never end up in a situation where memory hotplug fails due to
less than requested HTAB size.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 10:38:10 +11:00
Bharata B Rao
b817772a25 spapr: Allocate HTAB from machine init
Allocate HTAB from ppc_spapr_init() so that we can abort the guest
if requested HTAB size is't allocated by the host. However retain the
htab reset call in spapr_reset_htab() so that HTAB gets reset (and
not allocated) during machine reset.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23 10:38:10 +11:00
Michael S. Tsirkin
7f4a930e64 vhost-user: fix up rhel6 build
Build on RHEL6 fails:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42875

Apparently unnamed unions couldn't use C99  named field initializers.
Let's just name the payload union field.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-22 22:34:59 +03:00
Daniel P. Berrange
9024603776 configure: avoid polluting global CFLAGS with tasn1 flags
The previous commit

  commit 9a2fd4347c
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Apr 13 14:01:39 2015 +0100

    crypto: add sanity checking of TLS x509 credentials

defined new variables $TEST_LIBS and $TEST_CFLAGS and
used them in tests/Makefile to augment $LIBS and $CFLAGS.

Unfortunately this overlooks the fact that tests/Makefile
is not executed via recursive-make, it is just pulled into
the top level Makefile via an include statement. So rather
than just augmenting the compiler/linker flags for tests
it polluted the global flags.

This is thought to be behind a reported failure when
building the pixman module as a sub-module, since global
$CFLAGS are passed down to configure in pixman.

This change removes the $TEST_LIBS and $TEST_CFLAGS
replacing them with $TASN1_LIBS and $TASN1_CFLAGS,
setting only against specific objects/executables
that need them.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-22 19:03:08 +01:00
Daniel P. Berrange
3a661f1eab crypto: add sanity checking of plaintext/ciphertext length
When encrypting/decrypting data, the plaintext/ciphertext
buffers are required to be a multiple of the cipher block
size. If this is not done, nettle will abort and gcrypt
will report an error. To get consistent behaviour add
explicit checks upfront for the buffer sizes.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-22 19:03:08 +01:00
Daniel P. Berrange
eb2a770b17 crypto: don't let builtin aes crash if no IV is provided
If no IV is provided, then use a default IV of all-zeros
instead of crashing. This gives parity with gcrypt and
nettle backends.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-22 19:03:08 +01:00
Daniel P. Berrange
91bfcdb01d crypto: allow use of nettle/gcrypt to be selected explicitly
Currently the choice of whether to use nettle or gcrypt is
made based on what gnutls is linked to. There are times
when it is desirable to be able to force build against a
specific library. For example, if testing changes to QEMU's
crypto code all 3 possible backends need to be checked
regardless of what the local gnutls uses.

It is also desirable to be able to enable nettle/gcrypt
for cipher/hash algorithms, without enabling gnutls
for TLS support.

This gives two new configure flags, which allow the
following possibilities

Automatically determine nettle vs gcrypt from what
gnutls links to (recommended to minimize number of
crypto libraries linked to)

 ./configure

Automatically determine nettle vs gcrypt based on
which is installed

 ./configure --disable-gnutls

Force use of nettle

 ./configure --enable-nettle

Force use of gcrypt

 ./configure --enable-gcrypt

Force use of built-in AES & crippled-DES

 ./configure --disable-nettle --disable-gcrypt

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-22 19:03:07 +01:00
Chen Gang
2a080ce266 target-tilegx: Implement prefetch instructions in pipe y2
Originally, tilegx qemu only implement prefetch instructions in pipe x1,
did not implement them in pipe y2.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-10-22 07:51:49 -10:00
Peter Maydell
6a6739de51 Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20151021' into staging
Collected tcg backend patches

# gpg: Signature made Wed 21 Oct 2015 22:34:28 BST using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20151021:
  cpu-exec: Add "nochain" debug flag
  tcg/mips: Support r6 SEL{NE, EQ}Z instead of MOVN/MOVZ
  tcg/mips: Support r6 multiply/divide encodings
  tcg/mips: Support r6 JR encoding
  tcg/mips: Add use_mips32r6_instructions definition
  disas/mips: Add R6 jr/jr.hb to disassembler
  tcg-opc.h: Simplify insn_start def
  tcg/ppc: Prefer mask over andi.
  tcg/ppc: Revise goto_tb implementation
  tcg/ppc: Adjust exit_tb for change in prologue placement

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-22 18:01:53 +01:00
Peter Maydell
b803894e2c Merge remote-tracking branch 'remotes/afaerber/tags/qom-cpu-for-peter' into staging
QOM CPUState and X86CPU

* Adoption of CPUClass::disas_set_info() hook

# gpg: Signature made Thu 22 Oct 2015 17:11:24 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/qom-cpu-for-peter:
  disas: QOMify alpha specific disas setup
  disas: QOMify mips specific disas setup
  disas: QOMify sh4 specific disas setup
  disas: QOMify lm32 specific disas setup
  disas: QOMify sparc specific disas setup
  disas: QOMify m68k specific disas setup
  disas: QOMify moxie specific disas setup
  disas: QOMify s390x specific disas setup

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-22 17:33:54 +01:00
Peter Crosthwaite
0960be7cff disas: QOMify alpha specific disas setup
Move the target_disas() alpha specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

This also makes monitor_disas() consistent with target_disas(), as
monitor_disas() was missing a set of the BFD (This was an omission from
commit b9bec751c8).

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
63a946c7e3 disas: QOMify mips specific disas setup
Move the target_disas() mips specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Acked-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
d49dd523e4 disas: QOMify sh4 specific disas setup
Move the target_disas() sh4 specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
20984673e6 disas: QOMify lm32 specific disas setup
Move the target_disas() lm32 specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
df0900eb89 disas: QOMify sparc specific disas setup
Move the target_disas() sparc specifics to the QOM disas_set_info hook
and delete the #ifdef specific code in disas.c.

Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
4f669905d9 disas: QOMify m68k specific disas setup
Move the target_disas() m68k specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
9f87a4cacd disas: QOMify moxie specific disas setup
Move the target_disas() moxie specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

Cc: Anthony Green <green@moxielogic.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Crosthwaite
dbad6b74b3 disas: QOMify s390x specific disas setup
Move the target_disas() s390 specifics to the CPUClass::disas_set_info()
hook and delete the #ifdef specific code in disas.c.

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2015-10-22 15:49:40 +02:00
Peter Maydell
ca3e40e233 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
vhost, pc, virtio features, fixes, cleanups

New features:
    VT-d support for devices behind a bridge
    vhost-user migration support

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 22 Oct 2015 12:39:19 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream: (37 commits)
  hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT
  i386: keep cpu_model field in MachineState uptodate
  vhost: set the correct queue index in case of migration with multiqueue
  piix: fix resource leak reported by Coverity
  seccomp: add memfd_create to whitelist
  vhost-user-test: check ownership during migration
  vhost-user-test: add live-migration test
  vhost-user-test: learn to tweak various qemu arguments
  vhost-user-test: wrap server in TestServer struct
  vhost-user-test: remove useless static check
  vhost-user-test: move wait_for_fds() out
  vhost: add migration block if memfd failed
  vhost-user: use an enum helper for features mask
  vhost user: add rarp sending after live migration for legacy guest
  vhost user: add support of live migration
  net: add trace_vhost_user_event
  vhost-user: document migration log
  vhost: use a function for each call
  vhost-user: add a migration blocker
  vhost-user: send log shm fd along with log_base
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-22 12:41:44 +01:00
Laszlo Ersek
3c23402d40 hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT
Commit 4d00636e97 ("ich9: Add the lpc chip", Nov 14 2012) added the
ich9_apm_ctrl_changed() ioport write callback function such that it would
inject the SMI, in response to a write to the APM_CNT register, on the
first CPU, invariably.

Since this register is used by guest code to trigger an SMI synchronously,
the interrupt should be injected on the VCPU that is performing the write.

apm_ioport_writeb() is the .write callback of the "apm_ops"
MemoryRegionOps [hw/isa/apm.c]; it is parametrized to call
ich9_apm_ctrl_changed() by ich9_lpc_init() [hw/isa/lpc_ich9.c], via
apm_init(). Therefore this change affects no other board.

ich9_generate_smi() is an unrelated function that is called by the TCO
watchdog; a watchdog is likely in its right to (asynchronously) inject
interrupts on the first CPU only.

This patch allows the combined edk2/OVMF SMM driver stack to work with
multiple VCPUs on TCG, using both qemu-system-i386 and qemu-system-x86_64.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Michael Kinney <michael.d.kinney@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-22 14:39:09 +03:00
Zhu Guihua
4884b7bfe9 i386: keep cpu_model field in MachineState uptodate
Update cpu_model in MachineState for i386, so that the field can be used
for cpu hotplug, instead of using a static variable.

This patch is rebased on the latest master.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Andreas Färber <afaerber@suse.de>
2015-10-22 14:34:50 +03:00
Thibaut Collet
25a2a920dd vhost: set the correct queue index in case of migration with multiqueue
When a live migration is started the log address to mark dirty pages is provided
to the vhost backend through the vhost_dev_set_log function.
This function is called for each queue pairs but the queue index is wrongly set:
always set to the first queue pair. Then vhost backend lost descriptor addresses
of the queue pairs greater than 1 and behaviour of the vhost backend is
unpredictable.

The queue index is computed by taking account of the vq_index (to retrieve the
queue pair index) and calling the vhost_get_vq_index method of the backend.

Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Cc: qemu-stable@nongnu.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-22 14:34:50 +03:00
zhanghailiang
e3fce97cf5 piix: fix resource leak reported by Coverity
config_fd should be closed before return, or there will
be a resource leak error.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-10-22 14:34:50 +03:00
Eduardo Otubo
f8d82b8eb8 seccomp: add memfd_create to whitelist
This is used by memfd code.

Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:50 +03:00
Marc-André Lureau
1d9edff78f vhost-user-test: check ownership during migration
Check that backend source and destination do not have simultaneous
ownership during migration.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:50 +03:00
Marc-André Lureau
b181974724 vhost-user-test: add live-migration test
This test checks that the log fd is given to the migration source, and
mark dirty pages during migration.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:50 +03:00
Marc-André Lureau
704b216887 vhost-user-test: learn to tweak various qemu arguments
Add a new macro to make the qemu command line with other
values of memory size, and specific chardev id.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:50 +03:00
Marc-André Lureau
ae31fb5491 vhost-user-test: wrap server in TestServer struct
In the coming patches, a test will use several servers
simultaneously. Wrap the server in a struct, out of the global scope.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
82755ff202 vhost-user-test: remove useless static check
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
cf72b57f89 vhost-user-test: move wait_for_fds() out
This function is a precondition for most vhost-user tests.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
31190ed781 vhost: add migration block if memfd failed
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Thibaut Collet
de1372d466 vhost-user: use an enum helper for features mask
The VHOST_USER_PROTOCOL_FEATURE_MASK will be automatically updated when
adding new features to the enum.

Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
[Adapted from mailing list discussion - Marc-André]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Thibaut Collet
3e866365e1 vhost user: add rarp sending after live migration for legacy guest
A new vhost user message is added to allow QEMU to ask to vhost user backend to
broadcast a fake RARP after live migration for guest without GUEST_ANNOUNCE
capability.

This new message is sent only if the backend supports the new
VHOST_USER_PROTOCOL_F_RARP protocol feature.
The payload of this new message is the MAC address of the guest (not known by
the backend). The MAC address is copied in the first 6 bytes of a u64 to avoid
to create a new payload message type.

This new message has no equivalent ioctl so a new callback is added in the
userOps structure to send the request.

Upon reception of this new message the vhost user backend must generate and
broadcast a fake RARP request to notify the migration is terminated.

Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
[Rebased and fixed checkpatch errors - Marc-André]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Thibaut Collet
f6f56291de vhost user: add support of live migration
Some vhost user backends are able to support live migration.
To provide this service the following features must be added:
1. Add the VIRTIO_NET_F_GUEST_ANNOUNCE capability to vhost-net when netdev
   backend is vhost-user.
2. Provide a nop receive callback to vhost-user.
   This callback is called by:
    *  qemu_announce_self after a migration to send fake RARP to avoid network
       outage for peers talking to the migrated guest.
         - For guest with GUEST_ANNOUNCE capabilities, guest already sends GARP
           when the bit VIRTIO_NET_S_ANNOUNCE is set.
           => These packets must be discarded.
         - For guest without GUEST_ANNOUNCE capabilities, migration termination
           is notified when the guest sends packets.
           => These packets can be discarded.
    * virtio_net_tx_bh with a dummy boot to send fake bootp/dhcp request.
      BIOS guest manages virtio driver to send 4 bootp/dhcp request in case of
      dummy boot.
      => These packets must be discarded.

Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
69b32a6ce4 net: add trace_vhost_user_event
Replace error_report() and use tracing instead. It's not an error to get
a connection or a disconnection, so silence this and trace it instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
c62b91e580 vhost-user: document migration log
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
21e704256d vhost: use a function for each call
Replace the generic vhost_call() by specific functions for each
function call to help with type safety and changing arguments.

While doing this, I found that "unsigned long long" and "uint64_t" were
used interchangeably and causing compilation warnings, using uint64_t
instead, as the vhost & protocol specifies.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[Fix enum usage and MQ - Thibaut Collet]
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
d2fc4402cb vhost-user: add a migration blocker
If VHOST_USER_PROTOCOL_F_LOG_SHMFD is not announced, block vhost-user
migration. The blocker is removed in vhost_dev_cleanup().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
9a78a5dd27 vhost-user: send log shm fd along with log_base
Send the shm for the dirty pages logging if the backend supports
VHOST_USER_PROTOCOL_F_LOG_SHMFD. Wait for a reply to make sure
the old log is no longer used.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
15324404f6 vhost: alloc shareable log
If the backend is requires it, allocate shareable memory.

vhost_log_get() now uses 2 globals "vhost_log" and "vhost_log_shm", that
way there is a common non-shareable log and a common shareable one.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
1be0ac2109 vhost-user: add vhost_user_requires_shm_log()
Check if the backend has VHOST_USER_PROTOCOL_F_LOG_SHMFD feature and
require a shared log.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
c2bea314f6 vhost: add vhost_set_log_base op
Split VHOST_SET_LOG_BASE call in a seperate function callback, so that
type safety works and more arguments can be added in the next patches.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
636f4dddfe vhost: document log resizing
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
35f9b6ef3a util: add fallback for qemu_memfd_alloc()
Add an open/unlink/mmap fallback for system that do not support
memfd (only available since 3.17, ~1y ago).

This patch may require additional SELinux policies to work for enforced
systems, but should fail gracefully in this case.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:49 +03:00
Marc-André Lureau
d3592199ba util: add memfd helpers
Add qemu_memfd_alloc/free() helpers.

The function helps to allocate and seal shared memory.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:48 +03:00
Marc-André Lureau
f04cf9239a util: add linux-only memfd fallback
Implement memfd_create() fallback if not available in system libc.
memfd_create() is still not included in glibc today, atlhough it's been
available since Linux 3.17 in Oct 2014.

memfd has numerous advantages over traditional shm/mmap for ipc memory
sharing with fd handler, which we are going to make use of for
vhost-user logging memory in following patches.

The next patches are going to introduce helpers to use best practices of
memfd usage and provide some compatibility fallback. memfd.c is thus
temporarily useless and eventually empty if memfd_create() is provided
by the system.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:48 +03:00
Marc-André Lureau
e279200458 build-sys: split util-obj- on multi-lines
Make it easier to add new unrelated units with shorter lines.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:48 +03:00
Marc-André Lureau
1842bdfdba linux-headers: add unistd.h
New syscalls are not yet widely distributed. Add them to qemu
linux-headers include directory. Update based on v4.3-rc3 kernel headers.

Exclude mips for now, which is more problematic due to extra header
inclusion and probably unnecessary here.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:48 +03:00
Marc-André Lureau
751bcc3981 configure: probe for memfd
Check if memfd_create() is part of system libc.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-22 14:34:48 +03:00
Cornelia Huck
22de58fe15 virtio: add some migration doc
Try to cover the basics of virtio migration.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
2015-10-22 14:34:48 +03:00
Igor Mammedov
aebf81680b vhost: fail backend intialization early
Don't initialize vhost backend if memslots number exceeds the supported
limit. This prevents failures down the road when backend
is actually started.

[MST: rewrite commit log]

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-22 14:34:48 +03:00
Igor Mammedov
3fad87881e pc-dimm: add vhost slots limit check before commiting to hotplug
it allows safely cancel memory hotplug if vhost backend
doesn't support necessary amount of memory slots and prevents
QEMU crashing in vhost due to hitting vhost limit on amount
of supported memory ranges.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-22 14:34:48 +03:00
Igor Mammedov
2ce68e4cf5 vhost: add vhost_has_free_slot() interface
it will allow for other parts of QEMU check if it's safe
to map memory region during hotplug/runtime.
That way hotplug path will have a chance to cancel
hotplug operation instead of crashing in vhost_commit().

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-22 14:34:48 +03:00
Peter Maydell
c1bd899743 Merge remote-tracking branch 'remotes/xtensa/tags/20151021-xtensa' into staging
Xtensa updates:

- fix register window overflow with l32e/s32e instructions;
- make MMU events logging dependent on CPU_LOG_MMU;
- attach FLASH to system I/O region on XTFPGA boards;
- implement depbits and l32nb instructions.

# gpg: Signature made Wed 21 Oct 2015 19:34:02 BST using RSA key ID F83FA044
# gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"

* remotes/xtensa/tags/20151021-xtensa:
  target-xtensa: implement S32NB
  target-xtensa: implement depbits instruction
  target-xtensa: xtfpga: attach FLASH to system IO
  target-xtensa: use CPU_LOG_MMU for MMU event logging
  target-xtensa: add window overflow check to L32E/S32E

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-21 21:21:29 +01:00
Max Filippov
19b7bec4a3 target-xtensa: implement S32NB
S32NB provides the same functionality as S32I with two exceptions.
First, when its operation leaves the processor, the external transaction
is marked Non-Bufferable. Second, it may not be used to write to
Instruction RAM.
In QEMU S32NB is equivalent to S32I.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2015-10-21 21:29:25 +03:00
Max Filippov
5eeb40c5b1 target-xtensa: implement depbits instruction
This option provides an instruction for depositing a bit field from the
least significant position of one register to an arbitrary position in
another register.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2015-10-21 21:29:25 +03:00
Max Filippov
68931a4082 target-xtensa: xtfpga: attach FLASH to system IO
XTFPGA FLASH is tied to XTFPGA system IO block. It's not very important
for systems with MMU where system IO block is visible at single
location, but it's important for noMMU systems, where system IO block is
accessible through two separate physical address ranges.

Map XTFPGA FLASH to system IO block and fix offsets used for mapping.
Create and initialize FLASH device with series of qdev_prop_set_* as
that's the preferred interface now. Keep initialization in a separate
function.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
2015-10-21 21:28:33 +03:00
Max Filippov
5577e57b07 target-xtensa: use CPU_LOG_MMU for MMU event logging
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2015-10-21 21:27:31 +03:00
Max Filippov
f822b7e497 target-xtensa: add window overflow check to L32E/S32E
Despite L32E and S32E primary use is for window underflow and overflow
exception handlers they are just normal instructions, and thus need to
check for window overflow.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2015-10-21 21:27:31 +03:00
Peter Maydell
8bfaa25fce Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20151021-v2' into staging
More s390x patches. The first ones are fixes: A regression, missed
compat and a missed part of the SIMD support. The others contain
optimizations and cleanup.

# gpg: Signature made Wed 21 Oct 2015 11:24:48 BST using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20151021-v2:
  s390x/cmma: clean up cmma reset
  s390x: reset crypto only on clear reset and QEMU reset
  s390x: machine reset function with new ipl cpu handling
  s390x/ipl: we always have an ipl device
  s390x: unify device reset during subsystem_reset()
  s390x: flagify mcic values
  s390x/kvm: Fix vector validity bit in device machine checks
  s390x/virtio-ccw: fix 2.4 virtio compat
  util/qemu-config: fix missing machine command line options

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-21 15:07:42 +01:00
David Hildenbrand
1cd4e0f6f0 s390x/cmma: clean up cmma reset
The cmma reset is per VM, so we don't need a cpu object. We can
directly make use of kvm_state, as it is already available when
the reset is called. By moving the cmma reset in our machine reset
function, we can avoid a manual reset handler.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-21 12:21:30 +02:00
David Hildenbrand
4ab729207f s390x: reset crypto only on clear reset and QEMU reset
Initializing VM crypto in initial cpu reset has multiple problems

1. We call the exact same function #VCPU times, although one time is enough
2. On SIGP initial cpu reset, we exchange the wrapping key while
   other VCPUs are running. Bad!
3. It is simply wrong. According to the Pop, a reset happens only during a
   clear reset.

So, we have to reset the keys
- on modified clear reset
- on load clear (QEMU reset - via machine reset)
- on qemu start (via machine reset)

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-21 12:21:30 +02:00
David Hildenbrand
db3b2566e0 s390x: machine reset function with new ipl cpu handling
Current implementation depends on the order of resets getting triggered.

If a cpu reset is triggered after the ipl device reset, the CPU is stopped and
the VM will not run. In fact, that hinders us from converting the ipl device
into a TYPE_DEVICE. Let's change that by manually configuring the ipl cpu
during a system reset, so we have full control and can demangle that code.

Also remove the superflous cpu parameter from s390_update_iplstate on the way.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-21 12:21:30 +02:00
David Hildenbrand
feacc6c2c8 s390x/ipl: we always have an ipl device
Both s390 machines unconditionally create an ipl device, so no need to
handle the missing case.

Now we can also change s390_ipl_update_diag308() to return void.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-21 12:21:30 +02:00
David Hildenbrand
09c7f58ca9 s390x: unify device reset during subsystem_reset()
We have to manually reset several devices that are not on a bus: Let's
collect them in an array.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-21 12:21:29 +02:00
Michael S. Tsirkin
052bd52fa9 net: don't set native endianness
commit 5be7d9f1b1
    vhost-net: tell tap backend about the vnet endianness
makes vhost net always try to set LE - even if that matches the
native endian-ness.

This makes it fail on older kernels on x86 without TUNSETVNETLE support.

To fix, make qemu_set_vnet_le/qemu_set_vnet_be skip the
ioctl if it matches the host endian-ness.

Reported-by: Marcel Apfelbaum <marcel@redhat.com>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2015-10-21 09:24:44 +03:00
Michael S. Tsirkin
794e8f301a exec: factor out duplicate mmap code
Anonymous and file-backed RAM allocation are now almost exactly the same.

Reduce code duplication by moving RAM mmap code out of oslib-posix.c and
exec.c.

Reported-by: Marc-André Lureau <mlureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-21 09:24:44 +03:00
Peter Maydell
426c0df9e3 Merge remote-tracking branch 'remotes/berrange/tags/io-channel-3-for-upstream' into staging
Merge io-channels-3 partial branch

# gpg: Signature made Tue 20 Oct 2015 16:36:10 BST using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/io-channel-3-for-upstream:
  util: pull Buffer code out of VNC module
  coroutine: move into libqemuutil.a library
  osdep: add qemu_fork() wrapper for safely handling signals
  ui: convert VNC startup code to use SocketAddress
  sockets: allow port to be NULL when listening on IP address
  sockets: move qapi_copy_SocketAddress into qemu-sockets.c
  sockets: add helpers for creating SocketAddress from a socket

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-20 16:51:43 +01:00
Cornelia Huck
b080364aed s390x: flagify mcic values
Instead of using magic values when building the machine check
interruption code, add some defines as by chapter 11-14 in the PoP.

This should make it easier to catch problems like the missing vector
register validity bit ("s390x/kvm: Fix vector validity bit in device
machine checks"), and less hassle should we want to generate machine
checks beyond the channel reports we currently support.

Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-20 16:21:00 +02:00
Christian Borntraeger
2ab75df38e s390x/kvm: Fix vector validity bit in device machine checks
Device hotplugs trigger a crw machine check. All machine checks
have validity bits for certain register types. With vector support
we also have to claim that vector registers are valid.
This is a band-aid suitable for stable. Long term we should
create the full  mcic value dynamically depending on the active
features in the kernel interrupt handler.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-20 16:21:00 +02:00
Cornelia Huck
085b0b055b s390x/virtio-ccw: fix 2.4 virtio compat
Commit 542571d5 ("virtio-ccw: enable virtio-1") missed some virtio
devices for the 2.4 compat handling. Add them.

Fixes: 542571d5 ("virtio-ccw: enable virtio-1")
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-10-20 16:21:00 +02:00
Tony Krowiak
5bcfa0c543 util/qemu-config: fix missing machine command line options
Commit 0a7cf217 ("util/qemu-config: fix regression of
qmp_query_command_line_options") aimed to restore parsing of global
machine options, but missed two: "aes-key-wrap" and
"dea-key-wrap" (which were present in the initial version of that
patch). Let's add them to the machine_opts again.

Fixes: 0a7cf217 ("util/qemu-config: fix regression of
                  qmp_query_command_line_options")
CC: Marcel Apfelbaum <marcel@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1444664181-28023-1-git-send-email-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-10-20 16:21:00 +02:00
Daniel P. Berrange
88c5f205fa util: pull Buffer code out of VNC module
The Buffer code in the VNC server is useful for the IO channel
code, so pull it out into a shared module, QIOBuffer.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:59:09 +01:00
Daniel P. Berrange
10817bf09d coroutine: move into libqemuutil.a library
The coroutine files are currently referenced by the block-obj-y
variable. The coroutine functionality though is already used by
more than just the block code. eg migration code uses coroutine
yield. In the future the I/O channel code will also use the
coroutine yield functionality. Since the coroutine code is nicely
self-contained it can be easily built as part of the libqemuutil.a
library, making it widely available.

The headers are also moved into include/qemu, instead of the
include/block directory, since they are now part of the util
codebase, and the impl was never in the block/ directory
either.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:59:04 +01:00
Daniel P. Berrange
57cb38b383 osdep: add qemu_fork() wrapper for safely handling signals
When using regular fork() the child process of course inherits
all the parents' signal handlers. If the child then proceeds
to close() any open file descriptors, it may break some of those
registered signal handlers. The child generally does not want to
ever run any of the signal handlers that the parent may have
installed in the short time before it exec's. The parent may also
have blocked various signals which the child process will want
enabled.

This introduces a wrapper qemu_fork() that takes care to sanitize
signal handling across fork. Before forking it blocks all signals
in the parent thread. After fork returns, the parent unblocks the
signals and carries on as usual. The child, however, resets all the
signal handlers back to their defaults before it unblocks signals.
The child process can now exec the binary in a "clean" signal
environment.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:40:49 +01:00
Daniel P. Berrange
e0d03b8ceb ui: convert VNC startup code to use SocketAddress
The VNC code is currently using QemuOpts to configure the
sockets connections / listeners it needs. Convert it to
use SocketAddress to bring it in line with modern QAPI
based code elsewhere in QEMU.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:40:23 +01:00
Daniel P. Berrange
0983f5e6af sockets: allow port to be NULL when listening on IP address
If the port in the SocketAddress struct is NULL, it can allow
the kernel to automatically select a free port. This is useful
in particular in unit tests to avoid a race trying to find a
free port to run a test case on.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:21:45 +01:00
Daniel P. Berrange
2a8e21c7c8 sockets: move qapi_copy_SocketAddress into qemu-sockets.c
The qapi_copy_SocketAddress method is going to be useful
in more places than just qemu-char.c, so move it into
the qemu-sockets.c file to allow its reuse.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:15:48 +01:00
Daniel P. Berrange
17c55decec sockets: add helpers for creating SocketAddress from a socket
Add two helper methods that, given a socket file descriptor,
can return a populated SocketAddress struct containing either
the local or remote address information.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-20 14:15:42 +01:00
Peter Maydell
ee9dfed242 Merge remote-tracking branch 'remotes/kraxel/tags/pull-input-20151020-1' into staging
virtio-input: ignore events until the guest driver is ready

# gpg: Signature made Tue 20 Oct 2015 08:10:00 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-input-20151020-1:
  virtio-input: ignore events until the guest driver is ready

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-20 12:56:45 +01:00
Peter Maydell
b38c0494c1 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20151020-1' into staging
vga: enable virtio-vga for pseries, vmsvga cursor checks.

# gpg: Signature made Tue 20 Oct 2015 08:27:44 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vga-20151020-1:
  vmsvga: more cursor checks
  ppc/spapr: Allow VIRTIO_VGA

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-20 12:17:53 +01:00
Peter Maydell
df81978368 Merge remote-tracking branch 'remotes/kraxel/tags/pull-fw_cfg-20151020-1' into staging
fw_cfg: add dma interface, add strings via cmdline.

# gpg: Signature made Tue 20 Oct 2015 07:07:34 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-fw_cfg-20151020-1:
  fw_cfg: Define a static signature to be returned on DMA port reads
  Enable fw_cfg DMA interface for x86
  Enable fw_cfg DMA interface for ARM
  Implement fw_cfg DMA interface
  fw_cfg DMA interface documentation
  fw_cfg: document fw_cfg_modify_iXX() update functions
  fw_cfg: insert string blobs via qemu cmdline

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-20 11:45:23 +01:00
Peter Maydell
c14e42d7a4 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20151020-1' into staging
usb: misc small tweaks.

# gpg: Signature made Tue 20 Oct 2015 08:24:09 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20151020-1:
  usb-audio: increate default buffer size
  usb: print device id in "info usb" monitor command
  usb-host: add wakeup call for iso xfers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-20 10:52:56 +01:00
Peter Maydell
f52dd72dc1 Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2015-10-14-v4-tag' into staging
qemu-ga patch queue

* add unit tests for qemu-ga
* add guest-exec support for posix/w32 guests
* added 'qemu-ga' target for w32. this allows us to do full MSI build,
  without overloading 'qemu-ga.exe' target with uneeded dependencies.
* number of s/g_new/g_malloc/ conversions for qga

v2:
* commit message and qapi documentation spelling fixes
* rename 'inp-data' guest-exec param to 'input-data'

v3:
* fix OSX build errors for test-qga by using PRId64
  format in place of glib's, and dropping use of G_SPAWN_DEFAULT
  macro for glib 2.22 compat
* fix win32 build warnings for 32-bit builds by avoid int casts
  of process HANDLEs

v4:
* assert connect_qga() doesn't fail
* only enable test-qga for linux hosts
* allow get-memory-block-info* to fail if memory blocks aren't exposed in
  sysfs

# gpg: Signature made Tue 20 Oct 2015 00:33:43 BST using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2015-10-14-v4-tag:
  qga: fix uninitialized value warning for win32
  qga: guest-exec simple stdin/stdout/stderr redirection
  qga: handle G_IO_STATUS_AGAIN in ga_channel_write_all()
  qga: handle possible SIGPIPE in guest-file-write
  qga: guest exec functionality
  qga: drop guest_file_init helper and replace it with static initializers
  tests: add a local test for guest agent
  qga: guest-get-memory-blocks shouldn't fail for unexposed memory blocks
  glib-compat: add 2.38/2.40/2.46 asserts
  qtest: add a few fd-level qmp helpers
  qga: do not override configuration verbosity
  qga: add QGA_CONF environment variable
  qga: Use g_new() & friends where that makes obvious sense
  build: qemu-ga: add 'qemu-ga' build target for w32

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-20 09:04:20 +01:00
Gerd Hoffmann
5829b09720 vmsvga: more cursor checks
Check the cursor size more carefully.  Also switch to unsigned while
being at it, so they can't be negative.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-20 09:26:36 +02:00
Benjamin Herrenschmidt
b798c19057 ppc/spapr: Allow VIRTIO_VGA
It works fine with the Linux driver out of the box

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-20 09:26:36 +02:00
Gerd Hoffmann
37bc43f7fb usb-audio: increate default buffer size
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-20 09:15:23 +02:00
Gerd Hoffmann
974826f0ab usb: print device id in "info usb" monitor command
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-20 09:15:23 +02:00
Gerd Hoffmann
e206ddfb57 usb-host: add wakeup call for iso xfers
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-20 09:15:23 +02:00
Michael Roth
e853ea1cc6 qga: fix uninitialized value warning for win32
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:31:54 -05:00
Yuri Pudgorodskiy
a1853dca74 qga: guest-exec simple stdin/stdout/stderr redirection
Implemented with base64-encoded strings in qga json protocol.
Glib portable GIOChannel is used for data I/O.

Optinal stdin parameter of guest-exec command is now used as
stdin content for spawned subprocess.

If capture-output bool flag is specified, guest-exec redirects out/err
file descriptiors internally to pipes and collects subprocess
output.

Guest-exe-status is modified to return this collected data to requestor
in base64 encoding.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
* switch from 'struct GuestIOExecData' to 'GuestIOExecData'
* s/TRUE/true/g, s/FALSE/false/g for gboolean return values
* s/inp_data/input_data/
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:31:54 -05:00
Yuri Pudgorodskiy
f74df9bfce qga: handle G_IO_STATUS_AGAIN in ga_channel_write_all()
glib may return G_IO_STATUS_AGAIN which is actually not an error.
Also fixed a bug when on incomplete write buf pointer was not adjusted.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:31:54 -05:00
Denis V. Lunev
4005b4732e qga: handle possible SIGPIPE in guest-file-write
qemu-ga should not exit on guest-file-write to pipe without read end
but proper error code should be returned. The behavior of the
spawned process should be default thus SIGPIPE processing should be
reset to default after fork() but before exec().

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:31:54 -05:00
Yuri Pudgorodskiy
d697e30cff qga: guest exec functionality
Guest-exec rewritten in platform-independent style with glib spawn.

Child process is spawn asynchronously and exit status can later
be picked up by guest-exec-status command.

stdin/stdout/stderr of the child now is redirected to /dev/null
Later we will add ability to specify stdin in guest-exec command
and to get collected stdout/stderr with guest-exec-status.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
* use g_new0 in place of g_malloc for GuestExec struct
* commit msg spelling fixes
* s/inp-data/input-data
* document capture-input mode as false by default
* use GetProcessId() for pids on w32 instead of casting HANDLE
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:31:53 -05:00
Denis V. Lunev
b4fe97c823 qga: drop guest_file_init helper and replace it with static initializers
This just makes code shorter and better.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:31:53 -05:00
Marc-André Lureau
62c39b307b tests: add a local test for guest agent
Add some local guest agent tests, as it is better than nothing, only
when CONFIG_POSIX (using unix sockets).

With the QGA_TEST_SIDE_EFFECTING environment variable, it will include
tests with side effects, such as freezing/thawing the FS or changing the
time.

(a better test would involve a managed VM (or container), but it might
be better to leave that off to autotest/avocado)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
* use mkdtemp() in placeof g_mkdtemp() for glib 2.22 compat
* drop redundant/conflicting compat defines for
  g_assert_{true,false}, since glib-compat has them now.
* build fixes for OSX: use PRId64 instead of glib formats, drop
  g_spawn_default usage for glib compat
* assert connect_qga() doesn't fail
* only enable test-qga for linux hosts
* allow get-memory-block-info* to fail
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:30:50 -05:00
Michael Roth
f693fe6ef4 qga: guest-get-memory-blocks shouldn't fail for unexposed memory blocks
Some guests don't expose memory blocks via sysfs at all. This
shouldn't be a failure, instead just return an empty list. For
other access failures we still report an error.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:07 -05:00
Marc-André Lureau
8a0b5421a0 glib-compat: add 2.38/2.40/2.46 asserts
Those are mostly useful for writing tests.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:06 -05:00
Marc-André Lureau
dc47995e52 qtest: add a few fd-level qmp helpers
Add a few functions to interact with qmp via a simple fd.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:06 -05:00
Marc-André Lureau
6eaeae37a5 qga: do not override configuration verbosity
Move the default verbosity settings before loading the configuration
file, or it will overwrite it. Found thanks to writing qga tests :)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:06 -05:00
Marc-André Lureau
8e34bf364a qga: add QGA_CONF environment variable
Having a environment variable allows to override default configuration
path, useful for testing. Note that this can't easily be an argument,
since loading config is done before parsing the arguments.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:06 -05:00
Markus Armbruster
f3a06403b8 qga: Use g_new() & friends where that makes obvious sense
g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
for two reasons.  One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.

This commit only touches allocations with size arguments of the form
sizeof(T).  Same Coccinelle semantic patch as in commit b45c03f.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:06 -05:00
Michael Roth
fafcaf1d74 build: qemu-ga: add 'qemu-ga' build target for w32
Currently POSIX builds rely on 'qemu-ga' target to do qga-only
distributable build. On w32, as with most standalone binary targets,
we rely on 'qemu-ga.exe' target.

Unlike with POSIX, qemu-ga for w32 has a number of related targets
such as VSS DLL and MSI package. We can do the full distributable
qga-only build on w32 with:

  make qemu-ga.exe

or:

  make msi

To make that work, we tie VSS dependencies onto qemu-ga.exe.
However, in reality the DLL isn't part of the binary, so we use a
filter to pull them out of the LINK recipe, which attempts to link
against prereqs for binary targets. Additionally, it could be argued
that VSS is a separate distributable, and shouldn't be implied by
qemu-ga.exe binary target.

To avoid this, we can tie the VSS dependencies only to the 'msi'
target, but that would make it impossible to do a qga-only build of
the w32 distributable without building the 'msi' package, which was
supported in the past.

An alternative approach is to add a new target to build the whole
distributable. w32 allows us to use the same build target we use
on POSIX, 'qemu-ga', since the current binary-only target on w32
is 'qemu-ga.exe'.

To further simplify the build, we also make 'qemu-ga' build the MSI
package if the appropriate ./configure options are set, making the
full qga-only build the same on both POSIX and w32: `make qemu-ga`

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2015-10-19 18:28:06 -05:00
Richard Henderson
89a82cd4b6 cpu-exec: Add "nochain" debug flag
Respect it to avoid linking TBs together.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-10-19 11:04:39 -10:00
James Hogan
137d63902f tcg/mips: Support r6 SEL{NE, EQ}Z instead of MOVN/MOVZ
Extend MIPS movcond implementation to support the SELNEZ/SELEQZ
instructions introduced in MIPS r6 (where MOVN/MOVZ have been removed).

Whereas the "MOVN/MOVZ rd, rs, rt" instructions have the following
semantics:
 rd = [!]rt ? rs : rd

The "SELNEZ/SELEQZ rd, rs, rt" instructions are slightly different:
 rd = [!]rt ? rs : 0

First we ensure that if one of the movcond input values is zero that it
comes last (we can swap the input arguments if we invert the condition).
This is so that it can exactly match one of the SELNEZ/SELEQZ
instructions and avoid the need to emit the other one.

Otherwise we emit the opposite instruction first into a temporary
register, and OR that into the result:
 SELNEZ/SELEQZ  TMP1, v2, c1
 SELEQZ/SELNEZ  ret, v1, c1
 OR             ret, ret, TMP1

Which does the following:
 ret = cond ? v1 : v2

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-7-git-send-email-james.hogan@imgtec.com>
2015-10-19 11:04:39 -10:00
James Hogan
bc6d0c22b0 tcg/mips: Support r6 multiply/divide encodings
MIPSr6 adds several new integer multiply, divide, and modulo
instructions, and removes several pre-r6 encodings, along with the HI/LO
registers which were the implicit operands of some of those
instructions. Update TCG to use the new instructions when built for r6.

The new instructions actually map much more directly to the TCG ops, as
they only provide a single 32-bit half of the result and in a normal
general purpose register instead of HI or LO.

The mulu2_i32 and muls2_i32 operations are no longer appropriate for r6,
so they are removed from the TCG opcode table. This is because they
would need to emit two separate host instructions anyway (for the high
and low half of the result), which TCG can arrange automatically for us
in the absense of mulu2_i32/muls2_i32 by splitting it into mul_i32 and
mul*h_i32 TCG ops.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-6-git-send-email-james.hogan@imgtec.com>
2015-10-19 11:04:39 -10:00
James Hogan
6e0d096989 tcg/mips: Support r6 JR encoding
MIPSr6 encodes JR as JALR with zero as the link register, and the pre-r6
JR encoding is removed. Update TCG to use the new encoding when built
for r6.

We still use the old encoding for pre-r6, so as not to confuse return
prediction stack hardware which may detect only particular encodings of
the return instruction.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-5-git-send-email-james.hogan@imgtec.com>
2015-10-19 11:04:38 -10:00
James Hogan
ce14bd4d46 tcg/mips: Add use_mips32r6_instructions definition
Add definition use_mips32r6_instructions to the MIPS TCG backend which
is constant 1 when built for MIPS release 6. This will be used to decide
between pre-R6 and R6 instruction encodings.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-4-git-send-email-james.hogan@imgtec.com>
2015-10-19 11:04:38 -10:00
James Hogan
d76f365350 disas/mips: Add R6 jr/jr.hb to disassembler
MIPS r6 encodes jr as jalr zero, and jr.hb as jalr.hb zero, so add these
encodings to the MIPS disassembly table.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-3-git-send-email-james.hogan@imgtec.com>
2015-10-19 11:04:38 -10:00
James Hogan
c0e40dbdcc tcg-opc.h: Simplify insn_start def
We already have a TLADDR_ARGS definition, so rearrange the order
slightly and use it in the definition of insn_start, instead of
having an #ifdef.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1443788657-14537-2-git-send-email-james.hogan@imgtec.com>
2015-10-19 11:04:38 -10:00
Richard Henderson
1e1df962e3 tcg/ppc: Prefer mask over andi.
Prefer the instruction that isn't required to modify cr0.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-10-19 11:04:38 -10:00
Richard Henderson
5bfd75a35c tcg/ppc: Revise goto_tb implementation
Restrict the size of code_gen_buffer to 2GB on ppc64, which
lets us assert that everything is reachable with addis+addi
from tb_ret_addr.  This lets us use a max of 4 insns for goto_tb
instead of 7.

Emit the indirect branch portion of goto_tb up front, which
means we only have to update two insns to update any link.
With a 64-bit store, we can update the link atomically, which
may be required in future.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-10-19 11:04:37 -10:00
Richard Henderson
70f897bdc4 tcg/ppc: Adjust exit_tb for change in prologue placement
Changing the prologue to the beginning of the code_gen_buffer
changes the direction of the "return" branch.  Need to change
the logic to match.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-10-19 11:04:37 -10:00
Kevin O'Connor
2cc06a8843 fw_cfg: Define a static signature to be returned on DMA port reads
Return a static signature ("QEMU CFG") if the guest does a read to the
DMA address io register.

Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:54 +02:00
Marc Marí
c886fc4c20 Enable fw_cfg DMA interface for x86
Enable the fw_cfg DMA interface for all the x86 platforms.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:53 +02:00
Marc Marí
0b341a85ca Enable fw_cfg DMA interface for ARM
Enable the fw_cfg DMA interface for the ARM virt machine.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:53 +02:00
Marc Marí
a4c0d1deb7 Implement fw_cfg DMA interface
Based on the specifications on docs/specs/fw_cfg.txt

This interface is an addon. The old interface can still be used as usual.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:53 +02:00
Marc Marí
c9eae1d4b9 fw_cfg DMA interface documentation
Add fw_cfg DMA interface specification in the documentation.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:53 +02:00
Gabriel L. Somlo
57c3d238a5 fw_cfg: document fw_cfg_modify_iXX() update functions
Document the behavior of fw_cfg_modify_iXX() for leak-less updating
of integer-type blobs.

Currently only fw_cfg_modify_i16() is coded, but 32- and 64-bit versions
may be added later if necessary..

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:53 +02:00
Gabriel L. Somlo
6407d76eb4 fw_cfg: insert string blobs via qemu cmdline
Allow users to provide custom fw_cfg blobs with ascii string
payloads specified directly on the qemu command line.

Suggested-by: Jordan Justen <jordan.l.justen@intel.com>
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Message-id: 1443544141-26568-1-git-send-email-somlo@cmu.edu
Reviewd-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2015-10-19 15:26:53 +02:00
Knut Omang
7df953bd45 intel_iommu: Add support for translation for devices behind bridges
- Use a hash table indexed on bus pointers to store information about buses
  instead of using the bus numbers.
  Bus pointers are stored in a new VTDBus struct together with the vector
  of device address space pointers indexed by devfn.
- The bus number is still used for lookup for selective SID based invalidate,
  in which case the bus number is lazily resolved from the bus hash table and
  cached in a separate index.

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-18 10:05:43 +03:00
684 changed files with 35013 additions and 7731 deletions

2
.gitignore vendored
View File

@@ -24,6 +24,8 @@
/*-darwin-user
/*-linux-user
/*-bsd-user
/ivshmem-client
/ivshmem-server
/libdis*
/libuser
/linux-headers/asm

View File

@@ -62,14 +62,22 @@ Guest CPU cores (TCG):
----------------------
Overall
L: qemu-devel@nongnu.org
S: Odd fixes
M: Paolo Bonzini <pbonzini@redhat.com>
M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
M: Richard Henderson <rth@twiddle.net>
S: Maintained
F: cpu-exec.c
F: cpu-exec-common.c
F: cpus.c
F: cputlb.c
F: exec.c
F: softmmu_template.h
F: translate-all.c
F: include/exec/cpu_ldst.h
F: include/exec/cpu_ldst_template.h
F: translate-all.*
F: translate-common.c
F: include/exec/cpu*.h
F: include/exec/exec-all.h
F: include/exec/helper*.h
F: include/exec/tb-hash.h
Alpha
M: Richard Henderson <rth@twiddle.net>
@@ -81,6 +89,7 @@ F: disas/alpha.c
ARM
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: target-arm/
F: hw/arm/
@@ -216,6 +225,7 @@ F: */kvm.*
ARM
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: target-arm/kvm.c
@@ -235,9 +245,14 @@ M: Cornelia Huck <cornelia.huck@de.ibm.com>
M: Alexander Graf <agraf@suse.de>
S: Maintained
F: target-s390x/kvm.c
F: target-s390x/ioinst.[ch]
F: target-s390x/machine.c
F: hw/intc/s390_flic.c
F: hw/intc/s390_flic_kvm.c
F: include/hw/s390x/s390_flic.h
F: gdb-xml/s390*.xml
T: git git://github.com/cohuck/qemu.git s390-next
T: git git://github.com/borntraeger/qemu.git s390-next
X86
M: Paolo Bonzini <pbonzini@redhat.com>
@@ -287,6 +302,7 @@ ARM Machines
------------
Allwinner-a10
M: Beniamino Galvani <b.galvani@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/*/allwinner*
F: include/hw/*/allwinner*
@@ -294,6 +310,7 @@ F: hw/arm/cubieboard.c
ARM PrimeCell
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/char/pl011.c
F: hw/display/pl110*
@@ -308,6 +325,7 @@ F: include/hw/arm/primecell.h
ARM cores
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/intc/arm*
F: hw/intc/gic_internal.h
@@ -327,54 +345,64 @@ M: Evgeny Voevodin <e.voevodin@samsung.com>
M: Maksim Kozlov <m.kozlov@samsung.com>
M: Igor Mitsyanko <i.mitsyanko@gmail.com>
M: Dmitry Solodkiy <d.solodkiy@samsung.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/*/exynos*
Calxeda Highbank
M: Rob Herring <robh@kernel.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/highbank.c
F: hw/net/xgmac.c
Canon DIGIC
M: Antony Pavlov <antonynpavlov@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: include/hw/arm/digic.h
F: hw/*/digic*
Gumstix
L: qemu-devel@nongnu.org
L: qemu-arm@nongnu.org
S: Orphan
F: hw/arm/gumstix.c
i.MX31
M: Peter Chubb <peter.chubb@nicta.com.au>
L: qemu-arm@nongnu.org
S: Odd fixes
F: hw/*/imx*
F: hw/arm/kzm.c
Integrator CP
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/integratorcp.c
Musicpal
M: Jan Kiszka <jan.kiszka@web.de>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/musicpal.c
nSeries
M: Andrzej Zaborowski <balrogg@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/nseries.c
Palm
M: Andrzej Zaborowski <balrogg@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/palm.c
Real View
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/realview*
F: hw/intc/realview_gic.c
@@ -382,6 +410,7 @@ F: include/hw/intc/realview_gic.h
PXA2XX
M: Andrzej Zaborowski <balrogg@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/mainstone.c
F: hw/arm/spitz.c
@@ -391,17 +420,20 @@ F: hw/*/pxa2xx*
Stellaris
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/*/stellaris*
Versatile PB
M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/*/versatile*
Xilinx Zynq
M: Alistair Francis <alistair.francis@xilinx.com>
M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/xilinx_zynq.c
F: hw/misc/zynq_slcr.c
@@ -411,6 +443,7 @@ F: hw/ssi/xilinx_spips.c
Xilinx ZynqMP
M: Alistair Francis <alistair.francis@xilinx.com>
M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/xlnx-zynqmp.c
F: hw/arm/xlnx-ep108.c
@@ -419,6 +452,7 @@ F: include/hw/arm/xlnx-zynqmp.h
ARM ACPI Subsystem
M: Shannon Zhao <zhaoshenglong@huawei.com>
M: Shannon Zhao <shannon.zhao@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/arm/virt-acpi-build.c
F: include/hw/arm/virt-acpi-build.h
@@ -616,15 +650,13 @@ M: Christian Borntraeger <borntraeger@de.ibm.com>
M: Alexander Graf <agraf@suse.de>
S: Supported
F: hw/char/sclp*.[hc]
F: hw/s390x/s390-virtio-ccw.c
F: hw/s390x/css.[hc]
F: hw/s390x/sclp*.[hc]
F: hw/s390x/ipl*.[hc]
F: hw/s390x/*pci*.[hc]
F: hw/s390x/s390-skeys*.c
F: hw/s390x/
X: hw/s390x/s390-virtio-bus.[ch]
F: include/hw/s390x/
F: pc-bios/s390-ccw/
T: git git://github.com/cohuck/qemu virtio-ccw-upstr
F: hw/watchdog/wdt_diag288.c
T: git git://github.com/cohuck/qemu.git s390-next
T: git git://github.com/borntraeger/qemu.git s390-next
UniCore32 Machines
-------------
@@ -831,6 +863,7 @@ F: net/vhost-user.c
virtio-9p
M: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
M: Greg Kurz <gkurz@linux.vnet.ibm.com>
S: Supported
F: hw/9pfs/
F: fsdev/
@@ -851,7 +884,8 @@ M: Cornelia Huck <cornelia.huck@de.ibm.com>
M: Christian Borntraeger <borntraeger@de.ibm.com>
S: Supported
F: hw/s390x/virtio-ccw.[hc]
T: git git://github.com/cohuck/qemu virtio-ccw-upstr
T: git git://github.com/cohuck/qemu.git s390-next
T: git git://github.com/borntraeger/qemu.git s390-next
virtio-input
M: Gerd Hoffmann <kraxel@redhat.com>
@@ -1017,6 +1051,7 @@ S: Supported
F: include/exec/ioport.h
F: ioport.c
F: include/exec/memory.h
F: include/exec/ram_addr.h
F: memory.c
F: include/exec/memory-internal.h
F: exec.c
@@ -1139,6 +1174,8 @@ F: include/qom/
X: include/qom/cpu.h
F: qom/
X: qom/cpu.c
F: tests/check-qom-interface.c
F: tests/check-qom-proplist.c
F: tests/qom-test.c
QMP
@@ -1193,6 +1230,19 @@ F: crypto/
F: include/crypto/
F: tests/test-crypto-*
Coroutines
M: Stefan Hajnoczi <stefanha@redhat.com>
M: Kevin Wolf <kwolf@redhat.com>
F: util/*coroutine*
F: include/qemu/coroutine*
F: tests/test-coroutine.c
Buffers
M: Daniel P. Berrange <berrange@redhat.com>
S: Odd fixes
F: util/buffer.c
F: include/qemu/buffer.h
Usermode Emulation
------------------
Overall
@@ -1222,6 +1272,7 @@ AArch64 target
M: Claudio Fontana <claudio.fontana@huawei.com>
M: Claudio Fontana <claudio.fontana@gmail.com>
S: Maintained
L: qemu-arm@nongnu.org
F: tcg/aarch64/
F: disas/arm-a64.cc
F: disas/libvixl/
@@ -1229,6 +1280,7 @@ F: disas/libvixl/
ARM target
M: Andrzej Zaborowski <balrogg@gmail.com>
S: Maintained
L: qemu-arm@nongnu.org
F: tcg/arm/
F: disas/arm.c

View File

@@ -151,6 +151,8 @@ dummy := $(call unnest-vars,, \
stub-obj-y \
util-obj-y \
qga-obj-y \
ivshmem-client-obj-y \
ivshmem-server-obj-y \
qga-vss-dll-obj-y \
block-obj-y \
block-obj-m \
@@ -298,18 +300,15 @@ $(qapi-modules) $(SRC_PATH)/scripts/qapi-introspect.py $(qapi-py)
QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h)
$(qga-obj-y) qemu-ga.o: $(QGALIB_GEN)
# we require QGA_VSS_PROVIDER files to be built alongside qemu-ga
# executable since they are shipped together, but we don't want to actually
# link against them
qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a $(QGA_VSS_PROVIDER)
$(call LINK, $(filter-out $(QGA_VSS_PROVIDER), $^))
qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
ifdef QEMU_GA_MSI_ENABLED
QEMU_GA_MSI=qemu-ga-$(ARCH).msi
msi: $(QEMU_GA_MSI)
$(QEMU_GA_MSI): qemu-ga.exe
$(QEMU_GA_MSI): qemu-ga.exe $(QGA_VSS_PROVIDER)
$(QEMU_GA_MSI): config-host.mak
@@ -321,6 +320,16 @@ msi:
@echo "MSI build not configured or dependency resolution failed (reconfigure with --enable-guest-agent-msi option)"
endif
ifneq ($(EXESUF),)
.PHONY: qemu-ga
qemu-ga: qemu-ga$(EXESUF) $(QGA_VSS_PROVIDER) $(QEMU_GA_MSI)
endif
ivshmem-client$(EXESUF): $(ivshmem-client-obj-y)
$(call LINK, $^)
ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
clean:
# avoid old build problems by removing potentially incorrect old files
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h
@@ -431,7 +440,7 @@ endif
install: all $(if $(BUILD_DOCS),install-doc) \
install-datadir install-localstatedir
ifneq ($(TOOLS),)
$(call install-prog,$(TOOLS),$(DESTDIR)$(bindir))
$(call install-prog,$(subst qemu-ga,qemu-ga$(EXESUF),$(TOOLS)),$(DESTDIR)$(bindir))
endif
ifneq ($(CONFIG_MODULES),)
$(INSTALL_DIR) "$(DESTDIR)$(qemu_moddir)"

View File

@@ -15,10 +15,6 @@ block-obj-$(CONFIG_WIN32) += aio-win32.o
block-obj-y += block/
block-obj-y += qemu-io-cmds.o
block-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
block-obj-y += qemu-coroutine-sleep.o
block-obj-y += coroutine-$(CONFIG_COROUTINE_BACKEND).o
block-obj-m = block/
#######################################################################
@@ -58,6 +54,8 @@ common-obj-y += audio/
common-obj-y += hw/
common-obj-y += accel.o
common-obj-y += replay/
common-obj-y += ui/
common-obj-y += bt-host.o bt-vhci.o
bt-host.o-cflags := $(BLUEZ_CFLAGS)
@@ -108,3 +106,8 @@ target-obj-y += trace/
# by libqemuutil.a. These should be moved to a separate .json schema.
qga-obj-y = qga/
qga-vss-dll-obj-y = qga/
######################################################################
# contrib
ivshmem-client-obj-y = contrib/ivshmem-client/
ivshmem-server-obj-y = contrib/ivshmem-server/

View File

@@ -1 +1 @@
2.4.50
2.4.94

View File

@@ -17,6 +17,9 @@
#include "block/block.h"
#include "qemu/queue.h"
#include "qemu/sockets.h"
#ifdef CONFIG_EPOLL
#include <sys/epoll.h>
#endif
struct AioHandler
{
@@ -25,9 +28,166 @@ struct AioHandler
IOHandler *io_write;
int deleted;
void *opaque;
bool is_external;
QLIST_ENTRY(AioHandler) node;
};
#ifdef CONFIG_EPOLL
/* The fd number threashold to switch to epoll */
#define EPOLL_ENABLE_THRESHOLD 64
static void aio_epoll_disable(AioContext *ctx)
{
ctx->epoll_available = false;
if (!ctx->epoll_enabled) {
return;
}
ctx->epoll_enabled = false;
close(ctx->epollfd);
}
static inline int epoll_events_from_pfd(int pfd_events)
{
return (pfd_events & G_IO_IN ? EPOLLIN : 0) |
(pfd_events & G_IO_OUT ? EPOLLOUT : 0) |
(pfd_events & G_IO_HUP ? EPOLLHUP : 0) |
(pfd_events & G_IO_ERR ? EPOLLERR : 0);
}
static bool aio_epoll_try_enable(AioContext *ctx)
{
AioHandler *node;
struct epoll_event event;
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
int r;
if (node->deleted || !node->pfd.events) {
continue;
}
event.events = epoll_events_from_pfd(node->pfd.events);
event.data.ptr = node;
r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
if (r) {
return false;
}
}
ctx->epoll_enabled = true;
return true;
}
static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
{
struct epoll_event event;
int r;
if (!ctx->epoll_enabled) {
return;
}
if (!node->pfd.events) {
r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
if (r) {
aio_epoll_disable(ctx);
}
} else {
event.data.ptr = node;
event.events = epoll_events_from_pfd(node->pfd.events);
if (is_new) {
r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
if (r) {
aio_epoll_disable(ctx);
}
} else {
r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
if (r) {
aio_epoll_disable(ctx);
}
}
}
}
static int aio_epoll(AioContext *ctx, GPollFD *pfds,
unsigned npfd, int64_t timeout)
{
AioHandler *node;
int i, ret = 0;
struct epoll_event events[128];
assert(npfd == 1);
assert(pfds[0].fd == ctx->epollfd);
if (timeout > 0) {
ret = qemu_poll_ns(pfds, npfd, timeout);
}
if (timeout <= 0 || ret > 0) {
ret = epoll_wait(ctx->epollfd, events,
sizeof(events) / sizeof(events[0]),
timeout);
if (ret <= 0) {
goto out;
}
for (i = 0; i < ret; i++) {
int ev = events[i].events;
node = events[i].data.ptr;
node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
(ev & EPOLLOUT ? G_IO_OUT : 0) |
(ev & EPOLLHUP ? G_IO_HUP : 0) |
(ev & EPOLLERR ? G_IO_ERR : 0);
}
}
out:
return ret;
}
static bool aio_epoll_enabled(AioContext *ctx)
{
/* Fall back to ppoll when external clients are disabled. */
return !aio_external_disabled(ctx) && ctx->epoll_enabled;
}
static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
unsigned npfd, int64_t timeout)
{
if (!ctx->epoll_available) {
return false;
}
if (aio_epoll_enabled(ctx)) {
return true;
}
if (npfd >= EPOLL_ENABLE_THRESHOLD) {
if (aio_epoll_try_enable(ctx)) {
return true;
} else {
aio_epoll_disable(ctx);
}
}
return false;
}
#else
static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
{
}
static int aio_epoll(AioContext *ctx, GPollFD *pfds,
unsigned npfd, int64_t timeout)
{
assert(false);
}
static bool aio_epoll_enabled(AioContext *ctx)
{
return false;
}
static bool aio_epoll_check_poll(AioContext *ctx, GPollFD *pfds,
unsigned npfd, int64_t timeout)
{
return false;
}
#endif
static AioHandler *find_aio_handler(AioContext *ctx, int fd)
{
AioHandler *node;
@@ -43,11 +203,14 @@ static AioHandler *find_aio_handler(AioContext *ctx, int fd)
void aio_set_fd_handler(AioContext *ctx,
int fd,
bool is_external,
IOHandler *io_read,
IOHandler *io_write,
void *opaque)
{
AioHandler *node;
bool is_new = false;
bool deleted = false;
node = find_aio_handler(ctx, fd);
@@ -66,7 +229,7 @@ void aio_set_fd_handler(AioContext *ctx,
* releasing the walking_handlers lock.
*/
QLIST_REMOVE(node, node);
g_free(node);
deleted = true;
}
}
} else {
@@ -77,25 +240,32 @@ void aio_set_fd_handler(AioContext *ctx,
QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
g_source_add_poll(&ctx->source, &node->pfd);
is_new = true;
}
/* Update handler with latest information */
node->io_read = io_read;
node->io_write = io_write;
node->opaque = opaque;
node->is_external = is_external;
node->pfd.events = (io_read ? G_IO_IN | G_IO_HUP | G_IO_ERR : 0);
node->pfd.events |= (io_write ? G_IO_OUT | G_IO_ERR : 0);
}
aio_epoll_update(ctx, node, is_new);
aio_notify(ctx);
if (deleted) {
g_free(node);
}
}
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *notifier,
bool is_external,
EventNotifierHandler *io_read)
{
aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
(IOHandler *)io_read, NULL, notifier);
is_external, (IOHandler *)io_read, NULL, notifier);
}
bool aio_prepare(AioContext *ctx)
@@ -257,7 +427,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
/* fill pollfds */
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
if (!node->deleted && node->pfd.events) {
if (!node->deleted && node->pfd.events
&& !aio_epoll_enabled(ctx)
&& aio_node_check(ctx, node->is_external)) {
add_pollfd(node);
}
}
@@ -268,7 +440,17 @@ bool aio_poll(AioContext *ctx, bool blocking)
if (timeout) {
aio_context_release(ctx);
}
ret = qemu_poll_ns((GPollFD *)pollfds, npfd, timeout);
if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
AioHandler epoll_handler;
epoll_handler.pfd.fd = ctx->epollfd;
epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR;
npfd = 0;
add_pollfd(&epoll_handler);
ret = aio_epoll(ctx, pollfds, npfd, timeout);
} else {
ret = qemu_poll_ns(pollfds, npfd, timeout);
}
if (blocking) {
atomic_sub(&ctx->notify_me, 2);
}
@@ -297,3 +479,16 @@ bool aio_poll(AioContext *ctx, bool blocking)
return progress;
}
void aio_context_setup(AioContext *ctx, Error **errp)
{
#ifdef CONFIG_EPOLL
assert(!ctx->epollfd);
ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
if (ctx->epollfd == -1) {
ctx->epoll_available = false;
} else {
ctx->epoll_available = true;
}
#endif
}

View File

@@ -28,11 +28,13 @@ struct AioHandler {
GPollFD pfd;
int deleted;
void *opaque;
bool is_external;
QLIST_ENTRY(AioHandler) node;
};
void aio_set_fd_handler(AioContext *ctx,
int fd,
bool is_external,
IOHandler *io_read,
IOHandler *io_write,
void *opaque)
@@ -86,6 +88,7 @@ void aio_set_fd_handler(AioContext *ctx,
node->opaque = opaque;
node->io_read = io_read;
node->io_write = io_write;
node->is_external = is_external;
event = event_notifier_get_handle(&ctx->notifier);
WSAEventSelect(node->pfd.fd, event,
@@ -98,6 +101,7 @@ void aio_set_fd_handler(AioContext *ctx,
void aio_set_event_notifier(AioContext *ctx,
EventNotifier *e,
bool is_external,
EventNotifierHandler *io_notify)
{
AioHandler *node;
@@ -133,6 +137,7 @@ void aio_set_event_notifier(AioContext *ctx,
node->e = e;
node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
node->pfd.events = G_IO_IN;
node->is_external = is_external;
QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
g_source_add_poll(&ctx->source, &node->pfd);
@@ -304,7 +309,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
/* fill fd sets */
count = 0;
QLIST_FOREACH(node, &ctx->aio_handlers, node) {
if (!node->deleted && node->io_notify) {
if (!node->deleted && node->io_notify
&& aio_node_check(ctx, node->is_external)) {
events[count++] = event_notifier_get_handle(node->e);
}
}
@@ -363,3 +369,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
aio_context_release(ctx);
return progress;
}
void aio_context_setup(AioContext *ctx, Error **errp)
{
}

23
async.c
View File

@@ -59,6 +59,11 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
return bh;
}
void aio_bh_call(QEMUBH *bh)
{
bh->cb(bh->opaque);
}
/* Multiple occurrences of aio_bh_poll cannot be called concurrently */
int aio_bh_poll(AioContext *ctx)
{
@@ -84,7 +89,7 @@ int aio_bh_poll(AioContext *ctx)
ret = 1;
}
bh->idle = 0;
bh->cb(bh->opaque);
aio_bh_call(bh);
}
}
@@ -247,7 +252,7 @@ aio_ctx_finalize(GSource *source)
}
qemu_mutex_unlock(&ctx->bh_lock);
aio_set_event_notifier(ctx, &ctx->notifier, NULL);
aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
event_notifier_cleanup(&ctx->notifier);
rfifolock_destroy(&ctx->lock);
qemu_mutex_destroy(&ctx->bh_lock);
@@ -320,15 +325,22 @@ AioContext *aio_context_new(Error **errp)
{
int ret;
AioContext *ctx;
Error *local_err = NULL;
ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
aio_context_setup(ctx, &local_err);
if (local_err) {
error_propagate(errp, local_err);
goto fail;
}
ret = event_notifier_init(&ctx->notifier, false);
if (ret < 0) {
g_source_destroy(&ctx->source);
error_setg_errno(errp, -ret, "Failed to initialize event notifier");
return NULL;
goto fail;
}
g_source_set_can_recurse(&ctx->source, true);
aio_set_event_notifier(ctx, &ctx->notifier,
false,
(EventNotifierHandler *)
event_notifier_dummy_cb);
ctx->thread_pool = NULL;
@@ -339,6 +351,9 @@ AioContext *aio_context_new(Error **errp)
ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);
return ctx;
fail:
g_source_destroy(&ctx->source);
return NULL;
}
void aio_context_ref(AioContext *ctx)

View File

@@ -313,9 +313,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
assert(maxnode <= MAX_NODES);
if (mbind(ptr, sz, backend->policy,
maxnode ? backend->host_nodes : NULL, maxnode + 1, flags)) {
error_setg_errno(errp, errno,
"cannot bind memory to host NUMA nodes");
return;
if (backend->policy != MPOL_DEFAULT || errno != ENOSYS) {
error_setg_errno(errp, errno,
"cannot bind memory to host NUMA nodes");
return;
}
}
#endif
/* Preallocate memory after the NUMA policy has been instantiated.

View File

@@ -36,6 +36,17 @@
static QEMUBalloonEvent *balloon_event_fn;
static QEMUBalloonStatus *balloon_stat_fn;
static void *balloon_opaque;
static bool balloon_inhibited;
bool qemu_balloon_is_inhibited(void)
{
return balloon_inhibited;
}
void qemu_balloon_inhibit(bool state)
{
balloon_inhibited = state;
}
static bool have_balloon(Error **errp)
{

222
block.c
View File

@@ -33,7 +33,7 @@
#include "sysemu/block-backend.h"
#include "sysemu/sysemu.h"
#include "qemu/notify.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "block/qapi.h"
#include "qmp-commands.h"
#include "qemu/timer.h"
@@ -73,8 +73,7 @@ struct BdrvDirtyBitmap {
#define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
QTAILQ_HEAD_INITIALIZER(bdrv_states);
struct BdrvStates bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states);
static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
@@ -257,7 +256,6 @@ BlockDriverState *bdrv_new(void)
for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
QLIST_INIT(&bs->op_blockers[i]);
}
bdrv_iostatus_disable(bs);
notifier_list_init(&bs->close_notifiers);
notifier_with_return_list_init(&bs->before_write_notifiers);
qemu_co_queue_init(&bs->throttled_reqs[0]);
@@ -857,7 +855,6 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
goto fail_opts;
}
bs->guest_block_size = 512;
bs->request_alignment = 512;
bs->zero_beyond_eof = true;
open_flags = bdrv_open_flags(bs, flags);
@@ -1081,6 +1078,10 @@ static int bdrv_fill_options(QDict **options, const char **pfilename,
}
}
if (runstate_check(RUN_STATE_INMIGRATE)) {
*flags |= BDRV_O_INCOMING;
}
return 0;
}
@@ -1392,6 +1393,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
BlockDriverState *bs;
BlockDriver *drv = NULL;
const char *drvname;
const char *backing;
Error *local_err = NULL;
int snapshot_flags = 0;
@@ -1459,6 +1461,12 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
assert(drvname || !(flags & BDRV_O_PROTOCOL));
backing = qdict_get_try_str(options, "backing");
if (backing && *backing == '\0') {
flags |= BDRV_O_NO_BACKING;
qdict_del(options, "backing");
}
bs->open_flags = flags;
bs->options = options;
options = qdict_clone_shallow(options);
@@ -1793,8 +1801,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
ret = bdrv_flush(reopen_state->bs);
if (ret) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "Error (%s) flushing drive",
strerror(-ret));
error_setg_errno(errp, -ret, "Error flushing drive");
goto error;
}
@@ -1899,7 +1906,7 @@ void bdrv_close(BlockDriverState *bs)
}
/* Disable I/O limits and drain all pending throttled requests */
if (bs->io_limits_enabled) {
if (bs->throttle_state) {
bdrv_io_limits_disable(bs);
}
@@ -1908,6 +1915,10 @@ void bdrv_close(BlockDriverState *bs)
bdrv_drain(bs); /* in case flush left pending I/O */
notifier_list_notify(&bs->close_notifiers, bs);
if (bs->blk) {
blk_dev_change_media_cb(bs->blk, false);
}
if (bs->drv) {
BdrvChild *child, *next;
@@ -1946,10 +1957,6 @@ void bdrv_close(BlockDriverState *bs)
bs->full_open_options = NULL;
}
if (bs->blk) {
blk_dev_change_media_cb(bs->blk, false);
}
QLIST_FOREACH_SAFE(ban, &bs->aio_notifiers, list, ban_next) {
g_free(ban);
}
@@ -1998,19 +2005,10 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
/* move some fields that need to stay attached to the device */
/* dev info */
bs_dest->guest_block_size = bs_src->guest_block_size;
bs_dest->copy_on_read = bs_src->copy_on_read;
bs_dest->enable_write_cache = bs_src->enable_write_cache;
/* r/w error */
bs_dest->on_read_error = bs_src->on_read_error;
bs_dest->on_write_error = bs_src->on_write_error;
/* i/o status */
bs_dest->iostatus_enabled = bs_src->iostatus_enabled;
bs_dest->iostatus = bs_src->iostatus;
/* dirty bitmap */
bs_dest->dirty_bitmaps = bs_src->dirty_bitmaps;
}
@@ -2497,82 +2495,6 @@ void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr)
*nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors;
}
void bdrv_set_on_error(BlockDriverState *bs, BlockdevOnError on_read_error,
BlockdevOnError on_write_error)
{
bs->on_read_error = on_read_error;
bs->on_write_error = on_write_error;
}
BlockdevOnError bdrv_get_on_error(BlockDriverState *bs, bool is_read)
{
return is_read ? bs->on_read_error : bs->on_write_error;
}
BlockErrorAction bdrv_get_error_action(BlockDriverState *bs, bool is_read, int error)
{
BlockdevOnError on_err = is_read ? bs->on_read_error : bs->on_write_error;
switch (on_err) {
case BLOCKDEV_ON_ERROR_ENOSPC:
return (error == ENOSPC) ?
BLOCK_ERROR_ACTION_STOP : BLOCK_ERROR_ACTION_REPORT;
case BLOCKDEV_ON_ERROR_STOP:
return BLOCK_ERROR_ACTION_STOP;
case BLOCKDEV_ON_ERROR_REPORT:
return BLOCK_ERROR_ACTION_REPORT;
case BLOCKDEV_ON_ERROR_IGNORE:
return BLOCK_ERROR_ACTION_IGNORE;
default:
abort();
}
}
static void send_qmp_error_event(BlockDriverState *bs,
BlockErrorAction action,
bool is_read, int error)
{
IoOperationType optype;
optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
qapi_event_send_block_io_error(bdrv_get_device_name(bs), optype, action,
bdrv_iostatus_is_enabled(bs),
error == ENOSPC, strerror(error),
&error_abort);
}
/* This is done by device models because, while the block layer knows
* about the error, it does not know whether an operation comes from
* the device or the block layer (from a job, for example).
*/
void bdrv_error_action(BlockDriverState *bs, BlockErrorAction action,
bool is_read, int error)
{
assert(error >= 0);
if (action == BLOCK_ERROR_ACTION_STOP) {
/* First set the iostatus, so that "info block" returns an iostatus
* that matches the events raised so far (an additional error iostatus
* is fine, but not a lost one).
*/
bdrv_iostatus_set_err(bs, error);
/* Then raise the request to stop the VM and the event.
* qemu_system_vmstop_request_prepare has two effects. First,
* it ensures that the STOP event always comes after the
* BLOCK_IO_ERROR event. Second, it ensures that even if management
* can observe the STOP event and do a "cont" before the STOP
* event is issued, the VM will not stop. In this case, vm_start()
* also ensures that the STOP/RESUME pair of events is emitted.
*/
qemu_system_vmstop_request_prepare();
send_qmp_error_event(bs, action, is_read, error);
qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
} else {
send_qmp_error_event(bs, action, is_read, error);
}
}
int bdrv_is_read_only(BlockDriverState *bs)
{
return bs->read_only;
@@ -2766,7 +2688,12 @@ BlockDriverState *bdrv_lookup_bs(const char *device,
blk = blk_by_name(device);
if (blk) {
return blk_bs(blk);
bs = blk_bs(blk);
if (!bs) {
error_setg(errp, "Device '%s' has no medium", device);
}
return bs;
}
}
@@ -3136,15 +3063,23 @@ void bdrv_invalidate_cache_all(Error **errp)
/**
* Return TRUE if the media is present
*/
int bdrv_is_inserted(BlockDriverState *bs)
bool bdrv_is_inserted(BlockDriverState *bs)
{
BlockDriver *drv = bs->drv;
BdrvChild *child;
if (!drv)
return 0;
if (!drv->bdrv_is_inserted)
return 1;
return drv->bdrv_is_inserted(bs);
if (!drv) {
return false;
}
if (drv->bdrv_is_inserted) {
return drv->bdrv_is_inserted(bs);
}
QLIST_FOREACH(child, &bs->children, next) {
if (!bdrv_is_inserted(child->bs)) {
return false;
}
}
return true;
}
/**
@@ -3195,11 +3130,6 @@ void bdrv_lock_medium(BlockDriverState *bs, bool locked)
}
}
void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
{
bs->guest_block_size = align;
}
BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
{
BdrvDirtyBitmap *bm;
@@ -3474,10 +3404,25 @@ void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
}
void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap)
void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
{
assert(bdrv_dirty_bitmap_enabled(bitmap));
hbitmap_reset_all(bitmap->bitmap);
if (!out) {
hbitmap_reset_all(bitmap->bitmap);
} else {
HBitmap *backup = bitmap->bitmap;
bitmap->bitmap = hbitmap_alloc(bitmap->size,
hbitmap_granularity(backup));
*out = backup;
}
}
void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
{
HBitmap *tmp = bitmap->bitmap;
assert(bdrv_dirty_bitmap_enabled(bitmap));
bitmap->bitmap = in;
hbitmap_free(tmp);
}
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
@@ -3597,46 +3542,6 @@ bool bdrv_op_blocker_is_empty(BlockDriverState *bs)
return true;
}
void bdrv_iostatus_enable(BlockDriverState *bs)
{
bs->iostatus_enabled = true;
bs->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
}
/* The I/O status is only enabled if the drive explicitly
* enables it _and_ the VM is configured to stop on errors */
bool bdrv_iostatus_is_enabled(const BlockDriverState *bs)
{
return (bs->iostatus_enabled &&
(bs->on_write_error == BLOCKDEV_ON_ERROR_ENOSPC ||
bs->on_write_error == BLOCKDEV_ON_ERROR_STOP ||
bs->on_read_error == BLOCKDEV_ON_ERROR_STOP));
}
void bdrv_iostatus_disable(BlockDriverState *bs)
{
bs->iostatus_enabled = false;
}
void bdrv_iostatus_reset(BlockDriverState *bs)
{
if (bdrv_iostatus_is_enabled(bs)) {
bs->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
if (bs->job) {
block_job_iostatus_reset(bs->job);
}
}
}
void bdrv_iostatus_set_err(BlockDriverState *bs, int error)
{
assert(bdrv_iostatus_is_enabled(bs));
if (bs->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
bs->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
BLOCK_DEVICE_IO_STATUS_FAILED;
}
}
void bdrv_img_create(const char *filename, const char *fmt,
const char *base_filename, const char *base_fmt,
char *options, uint64_t img_size, int flags,
@@ -3821,7 +3726,7 @@ void bdrv_detach_aio_context(BlockDriverState *bs)
baf->detach_aio_context(baf->opaque);
}
if (bs->io_limits_enabled) {
if (bs->throttle_state) {
throttle_timers_detach_aio_context(&bs->throttle_timers);
}
if (bs->drv->bdrv_detach_aio_context) {
@@ -3857,7 +3762,7 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
if (bs->drv->bdrv_attach_aio_context) {
bs->drv->bdrv_attach_aio_context(bs, new_context);
}
if (bs->io_limits_enabled) {
if (bs->throttle_state) {
throttle_timers_attach_aio_context(&bs->throttle_timers, new_context);
}
@@ -4148,14 +4053,3 @@ void bdrv_refresh_filename(BlockDriverState *bs)
QDECREF(json);
}
}
/* This accessor function purpose is to allow the device models to access the
* BlockAcctStats structure embedded inside a BlockDriverState without being
* aware of the BlockDriverState structure layout.
* It will go away when the BlockAcctStats structure will be moved inside
* the device models.
*/
BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
{
return &bs->stats;
}

View File

@@ -2,6 +2,7 @@
* QEMU System Emulator block accounting
*
* Copyright (c) 2011 Christoph Hellwig
* Copyright (c) 2015 Igalia, S.L.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -25,6 +26,54 @@
#include "block/accounting.h"
#include "block/block_int.h"
#include "qemu/timer.h"
#include "sysemu/qtest.h"
static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000;
void block_acct_init(BlockAcctStats *stats, bool account_invalid,
bool account_failed)
{
stats->account_invalid = account_invalid;
stats->account_failed = account_failed;
if (qtest_enabled()) {
clock_type = QEMU_CLOCK_VIRTUAL;
}
}
void block_acct_cleanup(BlockAcctStats *stats)
{
BlockAcctTimedStats *s, *next;
QSLIST_FOREACH_SAFE(s, &stats->intervals, entries, next) {
g_free(s);
}
}
void block_acct_add_interval(BlockAcctStats *stats, unsigned interval_length)
{
BlockAcctTimedStats *s;
unsigned i;
s = g_new0(BlockAcctTimedStats, 1);
s->interval_length = interval_length;
QSLIST_INSERT_HEAD(&stats->intervals, s, entries);
for (i = 0; i < BLOCK_MAX_IOTYPE; i++) {
timed_average_init(&s->latency[i], clock_type,
(uint64_t) interval_length * NANOSECONDS_PER_SECOND);
}
}
BlockAcctTimedStats *block_acct_interval_next(BlockAcctStats *stats,
BlockAcctTimedStats *s)
{
if (s == NULL) {
return QSLIST_FIRST(&stats->intervals);
} else {
return QSLIST_NEXT(s, entries);
}
}
void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
int64_t bytes, enum BlockAcctType type)
@@ -32,26 +81,69 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
assert(type < BLOCK_MAX_IOTYPE);
cookie->bytes = bytes;
cookie->start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
cookie->start_time_ns = qemu_clock_get_ns(clock_type);
cookie->type = type;
}
void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
{
BlockAcctTimedStats *s;
int64_t time_ns = qemu_clock_get_ns(clock_type);
int64_t latency_ns = time_ns - cookie->start_time_ns;
if (qtest_enabled()) {
latency_ns = qtest_latency_ns;
}
assert(cookie->type < BLOCK_MAX_IOTYPE);
stats->nr_bytes[cookie->type] += cookie->bytes;
stats->nr_ops[cookie->type]++;
stats->total_time_ns[cookie->type] +=
qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - cookie->start_time_ns;
stats->total_time_ns[cookie->type] += latency_ns;
stats->last_access_time_ns = time_ns;
QSLIST_FOREACH(s, &stats->intervals, entries) {
timed_average_account(&s->latency[cookie->type], latency_ns);
}
}
void block_acct_highest_sector(BlockAcctStats *stats, int64_t sector_num,
unsigned int nb_sectors)
void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
{
if (stats->wr_highest_sector < sector_num + nb_sectors - 1) {
stats->wr_highest_sector = sector_num + nb_sectors - 1;
assert(cookie->type < BLOCK_MAX_IOTYPE);
stats->failed_ops[cookie->type]++;
if (stats->account_failed) {
BlockAcctTimedStats *s;
int64_t time_ns = qemu_clock_get_ns(clock_type);
int64_t latency_ns = time_ns - cookie->start_time_ns;
if (qtest_enabled()) {
latency_ns = qtest_latency_ns;
}
stats->total_time_ns[cookie->type] += latency_ns;
stats->last_access_time_ns = time_ns;
QSLIST_FOREACH(s, &stats->intervals, entries) {
timed_average_account(&s->latency[cookie->type], latency_ns);
}
}
}
void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
{
assert(type < BLOCK_MAX_IOTYPE);
/* block_acct_done() and block_acct_failed() update
* total_time_ns[], but this one does not. The reason is that
* invalid requests are accounted during their submission,
* therefore there's no actual I/O involved. */
stats->invalid_ops[type]++;
if (stats->account_invalid) {
stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
}
}
@@ -61,3 +153,20 @@ void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
assert(type < BLOCK_MAX_IOTYPE);
stats->merged[type] += num_requests;
}
int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
{
return qemu_clock_get_ns(clock_type) - stats->last_access_time_ns;
}
double block_acct_queue_depth(BlockAcctTimedStats *stats,
enum BlockAcctType type)
{
uint64_t sum, elapsed;
assert(type < BLOCK_MAX_IOTYPE);
sum = timed_average_sum(&stats->latency[type], &elapsed);
return (double) sum / elapsed;
}

View File

@@ -21,6 +21,7 @@
#include "block/blockjob.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "sysemu/block-backend.h"
#define BACKUP_CLUSTER_BITS 16
#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
@@ -131,7 +132,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
qemu_iovec_init_external(&bounce_qiov, &iov, 1);
if (is_write_notifier) {
ret = bdrv_co_no_copy_on_readv(bs,
ret = bdrv_co_readv_no_serialising(bs,
start * BACKUP_SECTORS_PER_CLUSTER,
n, &bounce_qiov);
} else {
@@ -215,7 +216,41 @@ static void backup_iostatus_reset(BlockJob *job)
{
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
bdrv_iostatus_reset(s->target);
if (s->target->blk) {
blk_iostatus_reset(s->target->blk);
}
}
static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
{
BdrvDirtyBitmap *bm;
BlockDriverState *bs = job->common.bs;
if (ret < 0 || block_job_is_cancelled(&job->common)) {
/* Merge the successor back into the parent, delete nothing. */
bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
assert(bm);
} else {
/* Everything is fine, delete this bitmap and install the backup. */
bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
assert(bm);
}
}
static void backup_commit(BlockJob *job)
{
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
if (s->sync_bitmap) {
backup_cleanup_sync_bitmap(s, 0);
}
}
static void backup_abort(BlockJob *job)
{
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
if (s->sync_bitmap) {
backup_cleanup_sync_bitmap(s, -1);
}
}
static const BlockJobDriver backup_job_driver = {
@@ -223,6 +258,8 @@ static const BlockJobDriver backup_job_driver = {
.job_type = BLOCK_JOB_TYPE_BACKUP,
.set_speed = backup_set_speed,
.iostatus_reset = backup_iostatus_reset,
.commit = backup_commit,
.abort = backup_abort,
};
static BlockErrorAction backup_error_action(BackupBlockJob *job,
@@ -360,8 +397,10 @@ static void coroutine_fn backup_run(void *opaque)
job->bitmap = hbitmap_alloc(end, 0);
bdrv_set_enable_write_cache(target, true);
bdrv_set_on_error(target, on_target_error, on_target_error);
bdrv_iostatus_enable(target);
if (target->blk) {
blk_set_on_error(target->blk, on_target_error, on_target_error);
blk_iostatus_enable(target->blk);
}
bdrv_add_before_write_notifier(bs, &before_write);
@@ -436,22 +475,11 @@ static void coroutine_fn backup_run(void *opaque)
/* wait until pending backup_do_cow() calls have completed */
qemu_co_rwlock_wrlock(&job->flush_rwlock);
qemu_co_rwlock_unlock(&job->flush_rwlock);
if (job->sync_bitmap) {
BdrvDirtyBitmap *bm;
if (ret < 0 || block_job_is_cancelled(&job->common)) {
/* Merge the successor back into the parent, delete nothing. */
bm = bdrv_reclaim_dirty_bitmap(bs, job->sync_bitmap, NULL);
assert(bm);
} else {
/* Everything is fine, delete this bitmap and install the backup. */
bm = bdrv_dirty_bitmap_abdicate(bs, job->sync_bitmap, NULL);
assert(bm);
}
}
hbitmap_free(job->bitmap);
bdrv_iostatus_disable(target);
if (target->blk) {
blk_iostatus_disable(target->blk);
}
bdrv_op_unblock_all(target, job->common.blocker);
data = g_malloc(sizeof(*data));
@@ -465,7 +493,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
BlockCompletionFunc *cb, void *opaque,
Error **errp)
BlockJobTxn *txn, Error **errp)
{
int64_t len;
@@ -480,7 +508,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
!bdrv_iostatus_is_enabled(bs)) {
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
return;
}
@@ -547,6 +575,7 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
sync_bitmap : NULL;
job->common.len = len;
job->common.co = qemu_coroutine_create(backup_run);
block_job_txn_add_job(txn, &job->common);
qemu_coroutine_enter(job->common.co, job);
return;

View File

@@ -30,6 +30,7 @@
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qstring.h"
#include "sysemu/qtest.h"
typedef struct BDRVBlkdebugState {
int state;
@@ -583,9 +584,13 @@ static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
remove_rule(rule);
QLIST_INSERT_HEAD(&s->suspended_reqs, &r, next);
printf("blkdebug: Suspended request '%s'\n", r.tag);
if (!qtest_enabled()) {
printf("blkdebug: Suspended request '%s'\n", r.tag);
}
qemu_coroutine_yield();
printf("blkdebug: Resuming request '%s'\n", r.tag);
if (!qtest_enabled()) {
printf("blkdebug: Resuming request '%s'\n", r.tag);
}
QLIST_REMOVE(&r, next);
g_free(r.tag);

View File

@@ -12,12 +12,17 @@
#include "sysemu/block-backend.h"
#include "block/block_int.h"
#include "block/blockjob.h"
#include "block/throttle-groups.h"
#include "sysemu/blockdev.h"
#include "sysemu/sysemu.h"
#include "qapi-event.h"
/* Number of coroutines to reserve per attached device model */
#define COROUTINE_POOL_RESERVATION 64
static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
struct BlockBackend {
char *name;
int refcnt;
@@ -29,15 +34,31 @@ struct BlockBackend {
/* TODO change to DeviceState when all users are qdevified */
const BlockDevOps *dev_ops;
void *dev_opaque;
/* the block size for which the guest device expects atomicity */
int guest_block_size;
/* If the BDS tree is removed, some of its options are stored here (which
* can be used to restore those options in the new BDS on insert) */
BlockBackendRootState root_state;
/* I/O stats (display with "info blockstats"). */
BlockAcctStats stats;
BlockdevOnError on_read_error, on_write_error;
bool iostatus_enabled;
BlockDeviceIoStatus iostatus;
};
typedef struct BlockBackendAIOCB {
BlockAIOCB common;
QEMUBH *bh;
BlockBackend *blk;
int ret;
} BlockBackendAIOCB;
static const AIOCBInfo block_backend_aiocb_info = {
.get_aio_context = blk_aiocb_get_aio_context,
.aiocb_size = sizeof(BlockBackendAIOCB),
};
@@ -145,12 +166,17 @@ static void blk_delete(BlockBackend *blk)
bdrv_unref(blk->bs);
blk->bs = NULL;
}
if (blk->root_state.throttle_state) {
g_free(blk->root_state.throttle_group);
throttle_group_unref(blk->root_state.throttle_state);
}
/* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */
if (blk->name[0]) {
QTAILQ_REMOVE(&blk_backends, blk, link);
}
g_free(blk->name);
drive_info_del(blk->legacy_dinfo);
block_acct_cleanup(&blk->stats);
g_free(blk);
}
@@ -164,6 +190,11 @@ static void drive_info_del(DriveInfo *dinfo)
g_free(dinfo);
}
int blk_get_refcnt(BlockBackend *blk)
{
return blk ? blk->refcnt : 0;
}
/*
* Increment @blk's reference count.
* @blk must not be null.
@@ -308,6 +339,29 @@ void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk)
}
}
/*
* Disassociates the currently associated BlockDriverState from @blk.
*/
void blk_remove_bs(BlockBackend *blk)
{
blk_update_root_state(blk);
blk->bs->blk = NULL;
bdrv_unref(blk->bs);
blk->bs = NULL;
}
/*
* Associates a new BlockDriverState with @blk.
*/
void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
{
assert(!blk->bs && !bs->blk);
bdrv_ref(bs);
blk->bs = bs;
bs->blk = blk;
}
/*
* Attach device model @dev to @blk.
* Return 0 on success, -EBUSY when a device model is attached already.
@@ -320,7 +374,7 @@ int blk_attach_dev(BlockBackend *blk, void *dev)
}
blk_ref(blk);
blk->dev = dev;
bdrv_iostatus_reset(blk->bs);
blk_iostatus_reset(blk);
return 0;
}
@@ -347,7 +401,7 @@ void blk_detach_dev(BlockBackend *blk, void *dev)
blk->dev = NULL;
blk->dev_ops = NULL;
blk->dev_opaque = NULL;
bdrv_set_guest_block_size(blk->bs, 512);
blk->guest_block_size = 512;
blk_unref(blk);
}
@@ -381,18 +435,15 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
void blk_dev_change_media_cb(BlockBackend *blk, bool load)
{
if (blk->dev_ops && blk->dev_ops->change_media_cb) {
bool tray_was_closed = !blk_dev_is_tray_open(blk);
bool tray_was_open, tray_is_open;
tray_was_open = blk_dev_is_tray_open(blk);
blk->dev_ops->change_media_cb(blk->dev_opaque, load);
if (tray_was_closed) {
/* tray open */
qapi_event_send_device_tray_moved(blk_name(blk),
true, &error_abort);
}
if (load) {
/* tray close */
qapi_event_send_device_tray_moved(blk_name(blk),
false, &error_abort);
tray_is_open = blk_dev_is_tray_open(blk);
if (tray_was_open != tray_is_open) {
qapi_event_send_device_tray_moved(blk_name(blk), tray_is_open,
&error_abort);
}
}
}
@@ -452,7 +503,47 @@ void blk_dev_resize_cb(BlockBackend *blk)
void blk_iostatus_enable(BlockBackend *blk)
{
bdrv_iostatus_enable(blk->bs);
blk->iostatus_enabled = true;
blk->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
}
/* The I/O status is only enabled if the drive explicitly
* enables it _and_ the VM is configured to stop on errors */
bool blk_iostatus_is_enabled(const BlockBackend *blk)
{
return (blk->iostatus_enabled &&
(blk->on_write_error == BLOCKDEV_ON_ERROR_ENOSPC ||
blk->on_write_error == BLOCKDEV_ON_ERROR_STOP ||
blk->on_read_error == BLOCKDEV_ON_ERROR_STOP));
}
BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk)
{
return blk->iostatus;
}
void blk_iostatus_disable(BlockBackend *blk)
{
blk->iostatus_enabled = false;
}
void blk_iostatus_reset(BlockBackend *blk)
{
if (blk_iostatus_is_enabled(blk)) {
blk->iostatus = BLOCK_DEVICE_IO_STATUS_OK;
if (blk->bs && blk->bs->job) {
block_job_iostatus_reset(blk->bs->job);
}
}
}
void blk_iostatus_set_err(BlockBackend *blk, int error)
{
assert(blk_iostatus_is_enabled(blk));
if (blk->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
blk->iostatus = error == ENOSPC ? BLOCK_DEVICE_IO_STATUS_NOSPACE :
BLOCK_DEVICE_IO_STATUS_FAILED;
}
}
static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
@@ -464,7 +555,7 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
return -EIO;
}
if (!blk_is_inserted(blk)) {
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
@@ -551,13 +642,15 @@ static void error_callback_bh(void *opaque)
qemu_aio_unref(acb);
}
static BlockAIOCB *abort_aio_request(BlockBackend *blk, BlockCompletionFunc *cb,
void *opaque, int ret)
BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
BlockCompletionFunc *cb,
void *opaque, int ret)
{
struct BlockBackendAIOCB *acb;
QEMUBH *bh;
acb = blk_aio_get(&block_backend_aiocb_info, blk, cb, opaque);
acb->blk = blk;
acb->ret = ret;
bh = aio_bh_new(blk_get_aio_context(blk), error_callback_bh, acb);
@@ -573,7 +666,7 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
{
int ret = blk_check_request(blk, sector_num, nb_sectors);
if (ret < 0) {
return abort_aio_request(blk, cb, opaque, ret);
return blk_abort_aio_request(blk, cb, opaque, ret);
}
return bdrv_aio_write_zeroes(blk->bs, sector_num, nb_sectors, flags,
@@ -602,16 +695,28 @@ int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
int64_t blk_getlength(BlockBackend *blk)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_getlength(blk->bs);
}
void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr)
{
bdrv_get_geometry(blk->bs, nb_sectors_ptr);
if (!blk->bs) {
*nb_sectors_ptr = 0;
} else {
bdrv_get_geometry(blk->bs, nb_sectors_ptr);
}
}
int64_t blk_nb_sectors(BlockBackend *blk)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_nb_sectors(blk->bs);
}
@@ -621,7 +726,7 @@ BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
{
int ret = blk_check_request(blk, sector_num, nb_sectors);
if (ret < 0) {
return abort_aio_request(blk, cb, opaque, ret);
return blk_abort_aio_request(blk, cb, opaque, ret);
}
return bdrv_aio_readv(blk->bs, sector_num, iov, nb_sectors, cb, opaque);
@@ -633,7 +738,7 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
{
int ret = blk_check_request(blk, sector_num, nb_sectors);
if (ret < 0) {
return abort_aio_request(blk, cb, opaque, ret);
return blk_abort_aio_request(blk, cb, opaque, ret);
}
return bdrv_aio_writev(blk->bs, sector_num, iov, nb_sectors, cb, opaque);
@@ -642,6 +747,10 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
BlockAIOCB *blk_aio_flush(BlockBackend *blk,
BlockCompletionFunc *cb, void *opaque)
{
if (!blk_is_available(blk)) {
return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
}
return bdrv_aio_flush(blk->bs, cb, opaque);
}
@@ -651,7 +760,7 @@ BlockAIOCB *blk_aio_discard(BlockBackend *blk,
{
int ret = blk_check_request(blk, sector_num, nb_sectors);
if (ret < 0) {
return abort_aio_request(blk, cb, opaque, ret);
return blk_abort_aio_request(blk, cb, opaque, ret);
}
return bdrv_aio_discard(blk->bs, sector_num, nb_sectors, cb, opaque);
@@ -683,12 +792,20 @@ int blk_aio_multiwrite(BlockBackend *blk, BlockRequest *reqs, int num_reqs)
int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_ioctl(blk->bs, req, buf);
}
BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
BlockCompletionFunc *cb, void *opaque)
{
if (!blk_is_available(blk)) {
return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
}
return bdrv_aio_ioctl(blk->bs, req, buf, cb, opaque);
}
@@ -704,11 +821,19 @@ int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors)
int blk_co_flush(BlockBackend *blk)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_co_flush(blk->bs);
}
int blk_flush(BlockBackend *blk)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_flush(blk->bs);
}
@@ -719,7 +844,9 @@ int blk_flush_all(void)
void blk_drain(BlockBackend *blk)
{
bdrv_drain(blk->bs);
if (blk->bs) {
bdrv_drain(blk->bs);
}
}
void blk_drain_all(void)
@@ -727,76 +854,178 @@ void blk_drain_all(void)
bdrv_drain_all();
}
void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
BlockdevOnError on_write_error)
{
blk->on_read_error = on_read_error;
blk->on_write_error = on_write_error;
}
BlockdevOnError blk_get_on_error(BlockBackend *blk, bool is_read)
{
return bdrv_get_on_error(blk->bs, is_read);
return is_read ? blk->on_read_error : blk->on_write_error;
}
BlockErrorAction blk_get_error_action(BlockBackend *blk, bool is_read,
int error)
{
return bdrv_get_error_action(blk->bs, is_read, error);
BlockdevOnError on_err = blk_get_on_error(blk, is_read);
switch (on_err) {
case BLOCKDEV_ON_ERROR_ENOSPC:
return (error == ENOSPC) ?
BLOCK_ERROR_ACTION_STOP : BLOCK_ERROR_ACTION_REPORT;
case BLOCKDEV_ON_ERROR_STOP:
return BLOCK_ERROR_ACTION_STOP;
case BLOCKDEV_ON_ERROR_REPORT:
return BLOCK_ERROR_ACTION_REPORT;
case BLOCKDEV_ON_ERROR_IGNORE:
return BLOCK_ERROR_ACTION_IGNORE;
default:
abort();
}
}
static void send_qmp_error_event(BlockBackend *blk,
BlockErrorAction action,
bool is_read, int error)
{
IoOperationType optype;
optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
qapi_event_send_block_io_error(blk_name(blk), optype, action,
blk_iostatus_is_enabled(blk),
error == ENOSPC, strerror(error),
&error_abort);
}
/* This is done by device models because, while the block layer knows
* about the error, it does not know whether an operation comes from
* the device or the block layer (from a job, for example).
*/
void blk_error_action(BlockBackend *blk, BlockErrorAction action,
bool is_read, int error)
{
bdrv_error_action(blk->bs, action, is_read, error);
assert(error >= 0);
if (action == BLOCK_ERROR_ACTION_STOP) {
/* First set the iostatus, so that "info block" returns an iostatus
* that matches the events raised so far (an additional error iostatus
* is fine, but not a lost one).
*/
blk_iostatus_set_err(blk, error);
/* Then raise the request to stop the VM and the event.
* qemu_system_vmstop_request_prepare has two effects. First,
* it ensures that the STOP event always comes after the
* BLOCK_IO_ERROR event. Second, it ensures that even if management
* can observe the STOP event and do a "cont" before the STOP
* event is issued, the VM will not stop. In this case, vm_start()
* also ensures that the STOP/RESUME pair of events is emitted.
*/
qemu_system_vmstop_request_prepare();
send_qmp_error_event(blk, action, is_read, error);
qemu_system_vmstop_request(RUN_STATE_IO_ERROR);
} else {
send_qmp_error_event(blk, action, is_read, error);
}
}
int blk_is_read_only(BlockBackend *blk)
{
return bdrv_is_read_only(blk->bs);
if (blk->bs) {
return bdrv_is_read_only(blk->bs);
} else {
return blk->root_state.read_only;
}
}
int blk_is_sg(BlockBackend *blk)
{
if (!blk->bs) {
return 0;
}
return bdrv_is_sg(blk->bs);
}
int blk_enable_write_cache(BlockBackend *blk)
{
return bdrv_enable_write_cache(blk->bs);
if (blk->bs) {
return bdrv_enable_write_cache(blk->bs);
} else {
return !!(blk->root_state.open_flags & BDRV_O_CACHE_WB);
}
}
void blk_set_enable_write_cache(BlockBackend *blk, bool wce)
{
bdrv_set_enable_write_cache(blk->bs, wce);
if (blk->bs) {
bdrv_set_enable_write_cache(blk->bs, wce);
} else {
if (wce) {
blk->root_state.open_flags |= BDRV_O_CACHE_WB;
} else {
blk->root_state.open_flags &= ~BDRV_O_CACHE_WB;
}
}
}
void blk_invalidate_cache(BlockBackend *blk, Error **errp)
{
if (!blk->bs) {
error_setg(errp, "Device '%s' has no medium", blk->name);
return;
}
bdrv_invalidate_cache(blk->bs, errp);
}
int blk_is_inserted(BlockBackend *blk)
bool blk_is_inserted(BlockBackend *blk)
{
return bdrv_is_inserted(blk->bs);
return blk->bs && bdrv_is_inserted(blk->bs);
}
bool blk_is_available(BlockBackend *blk)
{
return blk_is_inserted(blk) && !blk_dev_is_tray_open(blk);
}
void blk_lock_medium(BlockBackend *blk, bool locked)
{
bdrv_lock_medium(blk->bs, locked);
if (blk->bs) {
bdrv_lock_medium(blk->bs, locked);
}
}
void blk_eject(BlockBackend *blk, bool eject_flag)
{
bdrv_eject(blk->bs, eject_flag);
if (blk->bs) {
bdrv_eject(blk->bs, eject_flag);
}
}
int blk_get_flags(BlockBackend *blk)
{
return bdrv_get_flags(blk->bs);
if (blk->bs) {
return bdrv_get_flags(blk->bs);
} else {
return blk->root_state.open_flags;
}
}
int blk_get_max_transfer_length(BlockBackend *blk)
{
return blk->bs->bl.max_transfer_length;
if (blk->bs) {
return blk->bs->bl.max_transfer_length;
} else {
return 0;
}
}
void blk_set_guest_block_size(BlockBackend *blk, int align)
{
bdrv_set_guest_block_size(blk->bs, align);
blk->guest_block_size = align;
}
void *blk_blockalign(BlockBackend *blk, size_t size)
@@ -806,40 +1035,64 @@ void *blk_blockalign(BlockBackend *blk, size_t size)
bool blk_op_is_blocked(BlockBackend *blk, BlockOpType op, Error **errp)
{
if (!blk->bs) {
return false;
}
return bdrv_op_is_blocked(blk->bs, op, errp);
}
void blk_op_unblock(BlockBackend *blk, BlockOpType op, Error *reason)
{
bdrv_op_unblock(blk->bs, op, reason);
if (blk->bs) {
bdrv_op_unblock(blk->bs, op, reason);
}
}
void blk_op_block_all(BlockBackend *blk, Error *reason)
{
bdrv_op_block_all(blk->bs, reason);
if (blk->bs) {
bdrv_op_block_all(blk->bs, reason);
}
}
void blk_op_unblock_all(BlockBackend *blk, Error *reason)
{
bdrv_op_unblock_all(blk->bs, reason);
if (blk->bs) {
bdrv_op_unblock_all(blk->bs, reason);
}
}
AioContext *blk_get_aio_context(BlockBackend *blk)
{
return bdrv_get_aio_context(blk->bs);
if (blk->bs) {
return bdrv_get_aio_context(blk->bs);
} else {
return qemu_get_aio_context();
}
}
static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
{
BlockBackendAIOCB *blk_acb = DO_UPCAST(BlockBackendAIOCB, common, acb);
return blk_get_aio_context(blk_acb->blk);
}
void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
{
bdrv_set_aio_context(blk->bs, new_context);
if (blk->bs) {
bdrv_set_aio_context(blk->bs, new_context);
}
}
void blk_add_aio_context_notifier(BlockBackend *blk,
void (*attached_aio_context)(AioContext *new_context, void *opaque),
void (*detach_aio_context)(void *opaque), void *opaque)
{
bdrv_add_aio_context_notifier(blk->bs, attached_aio_context,
detach_aio_context, opaque);
if (blk->bs) {
bdrv_add_aio_context_notifier(blk->bs, attached_aio_context,
detach_aio_context, opaque);
}
}
void blk_remove_aio_context_notifier(BlockBackend *blk,
@@ -848,28 +1101,36 @@ void blk_remove_aio_context_notifier(BlockBackend *blk,
void (*detach_aio_context)(void *),
void *opaque)
{
bdrv_remove_aio_context_notifier(blk->bs, attached_aio_context,
detach_aio_context, opaque);
if (blk->bs) {
bdrv_remove_aio_context_notifier(blk->bs, attached_aio_context,
detach_aio_context, opaque);
}
}
void blk_add_close_notifier(BlockBackend *blk, Notifier *notify)
{
bdrv_add_close_notifier(blk->bs, notify);
if (blk->bs) {
bdrv_add_close_notifier(blk->bs, notify);
}
}
void blk_io_plug(BlockBackend *blk)
{
bdrv_io_plug(blk->bs);
if (blk->bs) {
bdrv_io_plug(blk->bs);
}
}
void blk_io_unplug(BlockBackend *blk)
{
bdrv_io_unplug(blk->bs);
if (blk->bs) {
bdrv_io_unplug(blk->bs);
}
}
BlockAcctStats *blk_get_stats(BlockBackend *blk)
{
return bdrv_get_stats(blk->bs);
return &blk->stats;
}
void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
@@ -902,6 +1163,10 @@ int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
int blk_truncate(BlockBackend *blk, int64_t offset)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_truncate(blk->bs, offset);
}
@@ -918,20 +1183,94 @@ int blk_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors)
int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
int64_t pos, int size)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_save_vmstate(blk->bs, buf, pos, size);
}
int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos, int size)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_load_vmstate(blk->bs, buf, pos, size);
}
int blk_probe_blocksizes(BlockBackend *blk, BlockSizes *bsz)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_probe_blocksizes(blk->bs, bsz);
}
int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
{
if (!blk_is_available(blk)) {
return -ENOMEDIUM;
}
return bdrv_probe_geometry(blk->bs, geo);
}
/*
* Updates the BlockBackendRootState object with data from the currently
* attached BlockDriverState.
*/
void blk_update_root_state(BlockBackend *blk)
{
assert(blk->bs);
blk->root_state.open_flags = blk->bs->open_flags;
blk->root_state.read_only = blk->bs->read_only;
blk->root_state.detect_zeroes = blk->bs->detect_zeroes;
if (blk->root_state.throttle_group) {
g_free(blk->root_state.throttle_group);
throttle_group_unref(blk->root_state.throttle_state);
}
if (blk->bs->throttle_state) {
const char *name = throttle_group_get_name(blk->bs);
blk->root_state.throttle_group = g_strdup(name);
blk->root_state.throttle_state = throttle_group_incref(name);
} else {
blk->root_state.throttle_group = NULL;
blk->root_state.throttle_state = NULL;
}
}
/*
* Applies the information in the root state to the given BlockDriverState. This
* does not include the flags which have to be specified for bdrv_open(), use
* blk_get_open_flags_from_root_state() to inquire them.
*/
void blk_apply_root_state(BlockBackend *blk, BlockDriverState *bs)
{
bs->detect_zeroes = blk->root_state.detect_zeroes;
if (blk->root_state.throttle_group) {
bdrv_io_limits_enable(bs, blk->root_state.throttle_group);
}
}
/*
* Returns the flags to be used for bdrv_open() of a BlockDriverState which is
* supposed to inherit the root state.
*/
int blk_get_open_flags_from_root_state(BlockBackend *blk)
{
int bs_flags;
bs_flags = blk->root_state.read_only ? 0 : BDRV_O_RDWR;
bs_flags |= blk->root_state.open_flags & ~BDRV_O_RDWR;
return bs_flags;
}
BlockBackendRootState *blk_get_root_state(BlockBackend *blk)
{
return &blk->root_state;
}

View File

@@ -17,6 +17,7 @@
#include "block/blockjob.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "sysemu/block-backend.h"
enum {
/*
@@ -213,7 +214,7 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
!bdrv_iostatus_is_enabled(bs)) {
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, "Invalid parameter combination");
return;
}
@@ -235,14 +236,14 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
orig_overlay_flags = bdrv_get_flags(overlay_bs);
/* convert base & overlay_bs to r/w, if necessary */
if (!(orig_base_flags & BDRV_O_RDWR)) {
reopen_queue = bdrv_reopen_queue(reopen_queue, base, NULL,
orig_base_flags | BDRV_O_RDWR);
}
if (!(orig_overlay_flags & BDRV_O_RDWR)) {
reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs, NULL,
orig_overlay_flags | BDRV_O_RDWR);
}
if (!(orig_base_flags & BDRV_O_RDWR)) {
reopen_queue = bdrv_reopen_queue(reopen_queue, base, NULL,
orig_base_flags | BDRV_O_RDWR);
}
if (reopen_queue) {
bdrv_reopen_multiple(reopen_queue, &local_err);
if (local_err != NULL) {

View File

@@ -154,18 +154,20 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
switch (action) {
case CURL_POLL_IN:
aio_set_fd_handler(s->aio_context, fd, curl_multi_read,
NULL, state);
aio_set_fd_handler(s->aio_context, fd, false,
curl_multi_read, NULL, state);
break;
case CURL_POLL_OUT:
aio_set_fd_handler(s->aio_context, fd, NULL, curl_multi_do, state);
aio_set_fd_handler(s->aio_context, fd, false,
NULL, curl_multi_do, state);
break;
case CURL_POLL_INOUT:
aio_set_fd_handler(s->aio_context, fd, curl_multi_read,
curl_multi_do, state);
aio_set_fd_handler(s->aio_context, fd, false,
curl_multi_read, curl_multi_do, state);
break;
case CURL_POLL_REMOVE:
aio_set_fd_handler(s->aio_context, fd, NULL, NULL, NULL);
aio_set_fd_handler(s->aio_context, fd, false,
NULL, NULL, NULL);
break;
}

View File

@@ -429,28 +429,23 @@ static coroutine_fn int qemu_gluster_co_write_zeroes(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
{
int ret;
GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
GlusterAIOCB acb;
BDRVGlusterState *s = bs->opaque;
off_t size = nb_sectors * BDRV_SECTOR_SIZE;
off_t offset = sector_num * BDRV_SECTOR_SIZE;
acb->size = size;
acb->ret = 0;
acb->coroutine = qemu_coroutine_self();
acb->aio_context = bdrv_get_aio_context(bs);
acb.size = size;
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
ret = glfs_zerofill_async(s->fd, offset, size, &gluster_finish_aiocb, acb);
ret = glfs_zerofill_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
if (ret < 0) {
ret = -errno;
goto out;
return -errno;
}
qemu_coroutine_yield();
ret = acb->ret;
out:
g_slice_free(GlusterAIOCB, acb);
return ret;
return acb.ret;
}
static inline bool gluster_supports_zerofill(void)
@@ -541,35 +536,30 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int write)
{
int ret;
GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
GlusterAIOCB acb;
BDRVGlusterState *s = bs->opaque;
size_t size = nb_sectors * BDRV_SECTOR_SIZE;
off_t offset = sector_num * BDRV_SECTOR_SIZE;
acb->size = size;
acb->ret = 0;
acb->coroutine = qemu_coroutine_self();
acb->aio_context = bdrv_get_aio_context(bs);
acb.size = size;
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
if (write) {
ret = glfs_pwritev_async(s->fd, qiov->iov, qiov->niov, offset, 0,
&gluster_finish_aiocb, acb);
gluster_finish_aiocb, &acb);
} else {
ret = glfs_preadv_async(s->fd, qiov->iov, qiov->niov, offset, 0,
&gluster_finish_aiocb, acb);
gluster_finish_aiocb, &acb);
}
if (ret < 0) {
ret = -errno;
goto out;
return -errno;
}
qemu_coroutine_yield();
ret = acb->ret;
out:
g_slice_free(GlusterAIOCB, acb);
return ret;
return acb.ret;
}
static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset)
@@ -600,26 +590,21 @@ static coroutine_fn int qemu_gluster_co_writev(BlockDriverState *bs,
static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
{
int ret;
GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
GlusterAIOCB acb;
BDRVGlusterState *s = bs->opaque;
acb->size = 0;
acb->ret = 0;
acb->coroutine = qemu_coroutine_self();
acb->aio_context = bdrv_get_aio_context(bs);
acb.size = 0;
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
ret = glfs_fsync_async(s->fd, &gluster_finish_aiocb, acb);
ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
if (ret < 0) {
ret = -errno;
goto out;
return -errno;
}
qemu_coroutine_yield();
ret = acb->ret;
out:
g_slice_free(GlusterAIOCB, acb);
return ret;
return acb.ret;
}
#ifdef CONFIG_GLUSTERFS_DISCARD
@@ -627,28 +612,23 @@ static coroutine_fn int qemu_gluster_co_discard(BlockDriverState *bs,
int64_t sector_num, int nb_sectors)
{
int ret;
GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
GlusterAIOCB acb;
BDRVGlusterState *s = bs->opaque;
size_t size = nb_sectors * BDRV_SECTOR_SIZE;
off_t offset = sector_num * BDRV_SECTOR_SIZE;
acb->size = 0;
acb->ret = 0;
acb->coroutine = qemu_coroutine_self();
acb->aio_context = bdrv_get_aio_context(bs);
acb.size = 0;
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
ret = glfs_discard_async(s->fd, offset, size, &gluster_finish_aiocb, acb);
ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
if (ret < 0) {
ret = -errno;
goto out;
return -errno;
}
qemu_coroutine_yield();
ret = acb->ret;
out:
g_slice_free(GlusterAIOCB, acb);
return ret;
return acb.ret;
}
#endif

View File

@@ -23,6 +23,7 @@
*/
#include "trace.h"
#include "sysemu/block-backend.h"
#include "block/blockjob.h"
#include "block/block_int.h"
#include "block/throttle-groups.h"
@@ -215,6 +216,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
/* Check if any requests are in-flight (including throttled requests) */
bool bdrv_requests_pending(BlockDriverState *bs)
{
BdrvChild *child;
if (!QLIST_EMPTY(&bs->tracked_requests)) {
return true;
}
@@ -224,17 +227,31 @@ bool bdrv_requests_pending(BlockDriverState *bs)
if (!qemu_co_queue_empty(&bs->throttled_reqs[1])) {
return true;
}
if (bs->file && bdrv_requests_pending(bs->file->bs)) {
return true;
}
if (bs->backing && bdrv_requests_pending(bs->backing->bs)) {
return true;
QLIST_FOREACH(child, &bs->children, next) {
if (bdrv_requests_pending(child->bs)) {
return true;
}
}
return false;
}
static void bdrv_drain_recurse(BlockDriverState *bs)
{
BdrvChild *child;
if (bs->drv && bs->drv->bdrv_drain) {
bs->drv->bdrv_drain(bs);
}
QLIST_FOREACH(child, &bs->children, next) {
bdrv_drain_recurse(child->bs);
}
}
/*
* Wait for pending requests to complete on a single BlockDriverState subtree
* Wait for pending requests to complete on a single BlockDriverState subtree,
* and suspend block driver's internal I/O until next request arrives.
*
* Note that unlike bdrv_drain_all(), the caller must hold the BlockDriverState
* AioContext.
@@ -247,6 +264,7 @@ void bdrv_drain(BlockDriverState *bs)
{
bool busy = true;
bdrv_drain_recurse(bs);
while (busy) {
/* Keep iterating */
bdrv_flush_io_queue(bs);
@@ -344,13 +362,14 @@ static void tracked_request_end(BdrvTrackedRequest *req)
static void tracked_request_begin(BdrvTrackedRequest *req,
BlockDriverState *bs,
int64_t offset,
unsigned int bytes, bool is_write)
unsigned int bytes,
enum BdrvTrackedRequestType type)
{
*req = (BdrvTrackedRequest){
.bs = bs,
.offset = offset,
.bytes = bytes,
.is_write = is_write,
.type = type,
.co = qemu_coroutine_self(),
.serialising = false,
.overlap_offset = offset,
@@ -844,7 +863,9 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
mark_request_serialising(req, bdrv_get_cluster_size(bs));
}
wait_serialising_requests(req);
if (!(flags & BDRV_REQ_NO_SERIALISING)) {
wait_serialising_requests(req);
}
if (flags & BDRV_REQ_COPY_ON_READ) {
int pnum;
@@ -933,7 +954,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
}
/* Don't do copy-on-read if we read data before write operation */
if (bs->copy_on_read && !(flags & BDRV_REQ_NO_COPY_ON_READ)) {
if (bs->copy_on_read && !(flags & BDRV_REQ_NO_SERIALISING)) {
flags |= BDRV_REQ_COPY_ON_READ;
}
@@ -967,7 +988,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
bytes = ROUND_UP(bytes, align);
}
tracked_request_begin(&req, bs, offset, bytes, false);
tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_READ);
ret = bdrv_aligned_preadv(bs, &req, offset, bytes, align,
use_local_qiov ? &local_qiov : qiov,
flags);
@@ -1002,13 +1023,13 @@ int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
return bdrv_co_do_readv(bs, sector_num, nb_sectors, qiov, 0);
}
int coroutine_fn bdrv_co_no_copy_on_readv(BlockDriverState *bs,
int coroutine_fn bdrv_co_readv_no_serialising(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
trace_bdrv_co_no_copy_on_readv(bs, sector_num, nb_sectors);
trace_bdrv_co_readv_no_serialising(bs, sector_num, nb_sectors);
return bdrv_co_do_readv(bs, sector_num, nb_sectors, qiov,
BDRV_REQ_NO_COPY_ON_READ);
BDRV_REQ_NO_SERIALISING);
}
int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
@@ -1151,7 +1172,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
bdrv_set_dirty(bs, sector_num, nb_sectors);
block_acct_highest_sector(&bs->stats, sector_num, nb_sectors);
if (bs->wr_highest_offset < offset + bytes) {
bs->wr_highest_offset = offset + bytes;
}
if (ret >= 0) {
bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
@@ -1286,7 +1309,7 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
* Pad qiov with the read parts and be sure to have a tracked request not
* only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
*/
tracked_request_begin(&req, bs, offset, bytes, true);
tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_WRITE);
if (!qiov) {
ret = bdrv_co_do_zero_pwritev(bs, offset, bytes, flags, &req);
@@ -1903,7 +1926,10 @@ static int multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
}
}
block_acct_merge_done(&bs->stats, BLOCK_ACCT_WRITE, num_reqs - outidx - 1);
if (bs->blk) {
block_acct_merge_done(blk_get_stats(bs->blk), BLOCK_ACCT_WRITE,
num_reqs - outidx - 1);
}
return outidx + 1;
}
@@ -2308,18 +2334,20 @@ static void coroutine_fn bdrv_flush_co_entry(void *opaque)
int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
{
int ret;
BdrvTrackedRequest req;
if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
bdrv_is_sg(bs)) {
return 0;
}
tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
/* Write back cached data to the OS even with cache=unsafe */
BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
if (bs->drv->bdrv_co_flush_to_os) {
ret = bs->drv->bdrv_co_flush_to_os(bs);
if (ret < 0) {
return ret;
goto out;
}
}
@@ -2359,14 +2387,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
ret = 0;
}
if (ret < 0) {
return ret;
goto out;
}
/* Now flush the underlying protocol. It will also have BDRV_O_NO_FLUSH
* in the case of cache=unsafe, so there are no useless flushes.
*/
flush_parent:
return bs->file ? bdrv_co_flush(bs->file->bs) : 0;
ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
out:
tracked_request_end(&req);
return ret;
}
int bdrv_flush(BlockDriverState *bs)
@@ -2409,6 +2440,7 @@ static void coroutine_fn bdrv_discard_co_entry(void *opaque)
int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
int nb_sectors)
{
BdrvTrackedRequest req;
int max_discard, ret;
if (!bs->drv) {
@@ -2431,6 +2463,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
return 0;
}
tracked_request_begin(&req, bs, sector_num, nb_sectors,
BDRV_TRACKED_DISCARD);
bdrv_set_dirty(bs, sector_num, nb_sectors);
max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);
@@ -2464,20 +2498,24 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
acb = bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors,
bdrv_co_io_em_complete, &co);
if (acb == NULL) {
return -EIO;
ret = -EIO;
goto out;
} else {
qemu_coroutine_yield();
ret = co.ret;
}
}
if (ret && ret != -ENOTSUP) {
return ret;
goto out;
}
sector_num += num;
nb_sectors -= num;
}
return 0;
ret = 0;
out:
tracked_request_end(&req);
return ret;
}
int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
@@ -2506,26 +2544,109 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
return rwco.ret;
}
/* needed for generic scsi interface */
typedef struct {
CoroutineIOCompletion *co;
QEMUBH *bh;
} BdrvIoctlCompletionData;
int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
static void bdrv_ioctl_bh_cb(void *opaque)
{
BdrvIoctlCompletionData *data = opaque;
bdrv_co_io_em_complete(data->co, -ENOTSUP);
qemu_bh_delete(data->bh);
}
static int bdrv_co_do_ioctl(BlockDriverState *bs, int req, void *buf)
{
BlockDriver *drv = bs->drv;
BdrvTrackedRequest tracked_req;
CoroutineIOCompletion co = {
.coroutine = qemu_coroutine_self(),
};
BlockAIOCB *acb;
if (drv && drv->bdrv_ioctl)
return drv->bdrv_ioctl(bs, req, buf);
return -ENOTSUP;
tracked_request_begin(&tracked_req, bs, 0, 0, BDRV_TRACKED_IOCTL);
if (!drv || !drv->bdrv_aio_ioctl) {
co.ret = -ENOTSUP;
goto out;
}
acb = drv->bdrv_aio_ioctl(bs, req, buf, bdrv_co_io_em_complete, &co);
if (!acb) {
BdrvIoctlCompletionData *data = g_new(BdrvIoctlCompletionData, 1);
data->bh = aio_bh_new(bdrv_get_aio_context(bs),
bdrv_ioctl_bh_cb, data);
data->co = &co;
qemu_bh_schedule(data->bh);
}
qemu_coroutine_yield();
out:
tracked_request_end(&tracked_req);
return co.ret;
}
typedef struct {
BlockDriverState *bs;
int req;
void *buf;
int ret;
} BdrvIoctlCoData;
static void coroutine_fn bdrv_co_ioctl_entry(void *opaque)
{
BdrvIoctlCoData *data = opaque;
data->ret = bdrv_co_do_ioctl(data->bs, data->req, data->buf);
}
/* needed for generic scsi interface */
int bdrv_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
{
BdrvIoctlCoData data = {
.bs = bs,
.req = req,
.buf = buf,
.ret = -EINPROGRESS,
};
if (qemu_in_coroutine()) {
/* Fast-path if already in coroutine context */
bdrv_co_ioctl_entry(&data);
} else {
Coroutine *co = qemu_coroutine_create(bdrv_co_ioctl_entry);
qemu_coroutine_enter(co, &data);
}
while (data.ret == -EINPROGRESS) {
aio_poll(bdrv_get_aio_context(bs), true);
}
return data.ret;
}
static void coroutine_fn bdrv_co_aio_ioctl_entry(void *opaque)
{
BlockAIOCBCoroutine *acb = opaque;
acb->req.error = bdrv_co_do_ioctl(acb->common.bs,
acb->req.req, acb->req.buf);
bdrv_co_complete(acb);
}
BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
unsigned long int req, void *buf,
BlockCompletionFunc *cb, void *opaque)
{
BlockDriver *drv = bs->drv;
BlockAIOCBCoroutine *acb = qemu_aio_get(&bdrv_em_co_aiocb_info,
bs, cb, opaque);
Coroutine *co;
if (drv && drv->bdrv_aio_ioctl)
return drv->bdrv_aio_ioctl(bs, req, buf, cb, opaque);
return NULL;
acb->need_bh = true;
acb->req.error = -EINPROGRESS;
acb->req.req = req;
acb->req.buf = buf;
co = qemu_coroutine_create(bdrv_co_aio_ioctl_entry);
qemu_coroutine_enter(co, acb);
bdrv_co_maybe_schedule_bh(acb);
return &acb->common;
}
void *qemu_blockalign(BlockDriverState *bs, size_t size)
@@ -2618,3 +2739,20 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
}
bdrv_start_throttled_reqs(bs);
}
void bdrv_drained_begin(BlockDriverState *bs)
{
if (!bs->quiesce_counter++) {
aio_disable_external(bdrv_get_aio_context(bs));
}
bdrv_drain(bs);
}
void bdrv_drained_end(BlockDriverState *bs)
{
assert(bs->quiesce_counter > 0);
if (--bs->quiesce_counter > 0) {
return;
}
aio_enable_external(bdrv_get_aio_context(bs));
}

View File

@@ -84,6 +84,7 @@ typedef struct IscsiTask {
IscsiLun *iscsilun;
QEMUTimer retry_timer;
bool force_next_flush;
int err_code;
} IscsiTask;
typedef struct IscsiAIOCB {
@@ -96,6 +97,7 @@ typedef struct IscsiAIOCB {
int status;
int64_t sector_num;
int nb_sectors;
int ret;
#ifdef __linux__
sg_io_hdr_t *ioh;
#endif
@@ -169,19 +171,70 @@ static inline unsigned exp_random(double mean)
return -mean * log((double)rand() / RAND_MAX);
}
/* SCSI_STATUS_TASK_SET_FULL and SCSI_STATUS_TIMEOUT were introduced
* in libiscsi 1.10.0 as part of an enum. The LIBISCSI_API_VERSION
* macro was introduced in 1.11.0. So use the API_VERSION macro as
* a hint that the macros are defined and define them ourselves
* otherwise to keep the required libiscsi version at 1.9.0 */
#if !defined(LIBISCSI_API_VERSION)
#define QEMU_SCSI_STATUS_TASK_SET_FULL 0x28
#define QEMU_SCSI_STATUS_TIMEOUT 0x0f000002
#else
#define QEMU_SCSI_STATUS_TASK_SET_FULL SCSI_STATUS_TASK_SET_FULL
#define QEMU_SCSI_STATUS_TIMEOUT SCSI_STATUS_TIMEOUT
/* SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST was introduced in
* libiscsi 1.10.0, together with other constants we need. Use it as
* a hint that we have to define them ourselves if needed, to keep the
* minimum required libiscsi version at 1.9.0. We use an ASCQ macro for
* the test because SCSI_STATUS_* is an enum.
*
* To guard against future changes where SCSI_SENSE_ASCQ_* also becomes
* an enum, check against the LIBISCSI_API_VERSION macro, which was
* introduced in 1.11.0. If it is present, there is no need to define
* anything.
*/
#if !defined(SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST) && \
!defined(LIBISCSI_API_VERSION)
#define SCSI_STATUS_TASK_SET_FULL 0x28
#define SCSI_STATUS_TIMEOUT 0x0f000002
#define SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST 0x2600
#define SCSI_SENSE_ASCQ_PARAMETER_LIST_LENGTH_ERROR 0x1a00
#endif
static int iscsi_translate_sense(struct scsi_sense *sense)
{
int ret;
switch (sense->key) {
case SCSI_SENSE_NOT_READY:
return -EBUSY;
case SCSI_SENSE_DATA_PROTECTION:
return -EACCES;
case SCSI_SENSE_COMMAND_ABORTED:
return -ECANCELED;
case SCSI_SENSE_ILLEGAL_REQUEST:
/* Parse ASCQ */
break;
default:
return -EIO;
}
switch (sense->ascq) {
case SCSI_SENSE_ASCQ_PARAMETER_LIST_LENGTH_ERROR:
case SCSI_SENSE_ASCQ_INVALID_OPERATION_CODE:
case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_CDB:
case SCSI_SENSE_ASCQ_INVALID_FIELD_IN_PARAMETER_LIST:
ret = -EINVAL;
break;
case SCSI_SENSE_ASCQ_LBA_OUT_OF_RANGE:
ret = -ENOSPC;
break;
case SCSI_SENSE_ASCQ_LOGICAL_UNIT_NOT_SUPPORTED:
ret = -ENOTSUP;
break;
case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT:
case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_CLOSED:
case SCSI_SENSE_ASCQ_MEDIUM_NOT_PRESENT_TRAY_OPEN:
ret = -ENOMEDIUM;
break;
case SCSI_SENSE_ASCQ_WRITE_PROTECTED:
ret = -EACCES;
break;
default:
ret = -EIO;
break;
}
return ret;
}
static void
iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
void *command_data, void *opaque)
@@ -203,11 +256,11 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
goto out;
}
if (status == SCSI_STATUS_BUSY ||
status == QEMU_SCSI_STATUS_TIMEOUT ||
status == QEMU_SCSI_STATUS_TASK_SET_FULL) {
status == SCSI_STATUS_TIMEOUT ||
status == SCSI_STATUS_TASK_SET_FULL) {
unsigned retry_time =
exp_random(iscsi_retry_times[iTask->retries - 1]);
if (status == QEMU_SCSI_STATUS_TIMEOUT) {
if (status == SCSI_STATUS_TIMEOUT) {
/* make sure the request is rescheduled AFTER the
* reconnect is initiated */
retry_time = EVENT_INTERVAL * 2;
@@ -226,6 +279,7 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
return;
}
}
iTask->err_code = iscsi_translate_sense(&task->sense);
error_report("iSCSI Failure: %s", iscsi_get_error(iscsi));
} else {
iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
@@ -291,8 +345,8 @@ iscsi_set_events(IscsiLun *iscsilun)
int ev = iscsi_which_events(iscsi);
if (ev != iscsilun->events) {
aio_set_fd_handler(iscsilun->aio_context,
iscsi_get_fd(iscsi),
aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsi),
false,
(ev & POLLIN) ? iscsi_process_read : NULL,
(ev & POLLOUT) ? iscsi_process_write : NULL,
iscsilun);
@@ -455,7 +509,7 @@ retry:
}
if (iTask.status != SCSI_STATUS_GOOD) {
return -EIO;
return iTask.err_code;
}
iscsi_allocationmap_set(iscsilun, sector_num, nb_sectors);
@@ -644,7 +698,7 @@ retry:
}
if (iTask.status != SCSI_STATUS_GOOD) {
return -EIO;
return iTask.err_code;
}
return 0;
@@ -683,7 +737,7 @@ retry:
}
if (iTask.status != SCSI_STATUS_GOOD) {
return -EIO;
return iTask.err_code;
}
return 0;
@@ -703,7 +757,7 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
if (status < 0) {
error_report("Failed to ioctl(SG_IO) to iSCSI lun. %s",
iscsi_get_error(iscsi));
acb->status = -EIO;
acb->status = iscsi_translate_sense(&acb->task->sense);
}
acb->ioh->driver_status = 0;
@@ -726,6 +780,38 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
iscsi_schedule_bh(acb);
}
static void iscsi_ioctl_bh_completion(void *opaque)
{
IscsiAIOCB *acb = opaque;
qemu_bh_delete(acb->bh);
acb->common.cb(acb->common.opaque, acb->ret);
qemu_aio_unref(acb);
}
static void iscsi_ioctl_handle_emulated(IscsiAIOCB *acb, int req, void *buf)
{
BlockDriverState *bs = acb->common.bs;
IscsiLun *iscsilun = bs->opaque;
int ret = 0;
switch (req) {
case SG_GET_VERSION_NUM:
*(int *)buf = 30000;
break;
case SG_GET_SCSI_ID:
((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type;
break;
default:
ret = -EINVAL;
}
assert(!acb->bh);
acb->bh = aio_bh_new(bdrv_get_aio_context(bs),
iscsi_ioctl_bh_completion, acb);
acb->ret = ret;
qemu_bh_schedule(acb->bh);
}
static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
unsigned long int req, void *buf,
BlockCompletionFunc *cb, void *opaque)
@@ -735,8 +821,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
struct iscsi_data data;
IscsiAIOCB *acb;
assert(req == SG_IO);
acb = qemu_aio_get(&iscsi_aiocb_info, bs, cb, opaque);
acb->iscsilun = iscsilun;
@@ -745,6 +829,11 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
acb->buf = NULL;
acb->ioh = buf;
if (req != SG_IO) {
iscsi_ioctl_handle_emulated(acb, req, buf);
return &acb->common;
}
acb->task = malloc(sizeof(struct scsi_task));
if (acb->task == NULL) {
error_report("iSCSI: Failed to allocate task for scsi command. %s",
@@ -809,38 +898,6 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
return &acb->common;
}
static void ioctl_cb(void *opaque, int status)
{
int *p_status = opaque;
*p_status = status;
}
static int iscsi_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
{
IscsiLun *iscsilun = bs->opaque;
int status;
switch (req) {
case SG_GET_VERSION_NUM:
*(int *)buf = 30000;
break;
case SG_GET_SCSI_ID:
((struct sg_scsi_id *)buf)->scsi_type = iscsilun->type;
break;
case SG_IO:
status = -EINPROGRESS;
iscsi_aio_ioctl(bs, req, buf, ioctl_cb, &status);
while (status == -EINPROGRESS) {
aio_poll(iscsilun->aio_context, true);
}
return 0;
default:
return -1;
}
return 0;
}
#endif
static int64_t
@@ -905,7 +962,7 @@ retry:
}
if (iTask.status != SCSI_STATUS_GOOD) {
return -EIO;
return iTask.err_code;
}
iscsi_allocationmap_clear(iscsilun, sector_num, nb_sectors);
@@ -999,7 +1056,7 @@ retry:
}
if (iTask.status != SCSI_STATUS_GOOD) {
return -EIO;
return iTask.err_code;
}
if (flags & BDRV_REQ_MAY_UNMAP) {
@@ -1280,9 +1337,8 @@ static void iscsi_detach_aio_context(BlockDriverState *bs)
{
IscsiLun *iscsilun = bs->opaque;
aio_set_fd_handler(iscsilun->aio_context,
iscsi_get_fd(iscsilun->iscsi),
NULL, NULL, NULL);
aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi),
false, NULL, NULL, NULL);
iscsilun->events = 0;
if (iscsilun->nop_timer) {
@@ -1772,7 +1828,6 @@ static BlockDriver bdrv_iscsi = {
.bdrv_co_flush_to_disk = iscsi_co_flush,
#ifdef __linux__
.bdrv_ioctl = iscsi_ioctl,
.bdrv_aio_ioctl = iscsi_aio_ioctl,
#endif

View File

@@ -287,7 +287,7 @@ void laio_detach_aio_context(void *s_, AioContext *old_context)
{
struct qemu_laio_state *s = s_;
aio_set_event_notifier(old_context, &s->e, NULL);
aio_set_event_notifier(old_context, &s->e, false, NULL);
qemu_bh_delete(s->completion_bh);
}
@@ -296,7 +296,8 @@ void laio_attach_aio_context(void *s_, AioContext *new_context)
struct qemu_laio_state *s = s_;
s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
aio_set_event_notifier(new_context, &s->e, false,
qemu_laio_completion_cb);
}
void *laio_init(void)

View File

@@ -14,6 +14,7 @@
#include "trace.h"
#include "block/blockjob.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "qemu/bitmap.h"
@@ -383,9 +384,11 @@ static void mirror_exit(BlockJob *job, void *opaque)
aio_context_release(replace_aio_context);
}
g_free(s->replaces);
bdrv_op_unblock_all(s->target, s->common.blocker);
bdrv_unref(s->target);
block_job_completed(&s->common, data->ret);
g_free(data);
bdrv_drained_end(src);
bdrv_unref(src);
}
@@ -599,10 +602,15 @@ immediate_exit:
g_free(s->cow_bitmap);
g_free(s->in_flight_bitmap);
bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
bdrv_iostatus_disable(s->target);
if (s->target->blk) {
blk_iostatus_disable(s->target->blk);
}
data = g_malloc(sizeof(*data));
data->ret = ret;
/* Before we switch to target in mirror_exit, make sure data doesn't
* change. */
bdrv_drained_begin(s->common.bs);
block_job_defer_to_main_loop(&s->common, mirror_exit, data);
}
@@ -621,7 +629,9 @@ static void mirror_iostatus_reset(BlockJob *job)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
bdrv_iostatus_reset(s->target);
if (s->target->blk) {
blk_iostatus_reset(s->target->blk);
}
}
static void mirror_complete(BlockJob *job, Error **errp)
@@ -704,7 +714,7 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
!bdrv_iostatus_is_enabled(bs)) {
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
return;
}
@@ -736,12 +746,17 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
if (!s->dirty_bitmap) {
g_free(s->replaces);
block_job_release(bs);
block_job_unref(&s->common);
return;
}
bdrv_op_block_all(s->target, s->common.blocker);
bdrv_set_enable_write_cache(s->target, true);
bdrv_set_on_error(s->target, on_target_error, on_target_error);
bdrv_iostatus_enable(s->target);
if (s->target->blk) {
blk_set_on_error(s->target->blk, on_target_error, on_target_error);
blk_iostatus_enable(s->target->blk);
}
s->common.co = qemu_coroutine_create(mirror_run);
trace_mirror_start(bs, s, s->common.co, opaque);
qemu_coroutine_enter(s->common.co, s);

View File

@@ -124,7 +124,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
s->send_coroutine = qemu_coroutine_self();
aio_context = bdrv_get_aio_context(bs);
aio_set_fd_handler(aio_context, s->sock,
aio_set_fd_handler(aio_context, s->sock, false,
nbd_reply_ready, nbd_restart_write, bs);
if (qiov) {
if (!s->is_unix) {
@@ -144,7 +144,8 @@ static int nbd_co_send_request(BlockDriverState *bs,
} else {
rc = nbd_send_request(s->sock, request);
}
aio_set_fd_handler(aio_context, s->sock, nbd_reply_ready, NULL, bs);
aio_set_fd_handler(aio_context, s->sock, false,
nbd_reply_ready, NULL, bs);
s->send_coroutine = NULL;
qemu_co_mutex_unlock(&s->send_mutex);
return rc;
@@ -348,14 +349,15 @@ int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
void nbd_client_detach_aio_context(BlockDriverState *bs)
{
aio_set_fd_handler(bdrv_get_aio_context(bs),
nbd_get_client_session(bs)->sock, NULL, NULL, NULL);
nbd_get_client_session(bs)->sock,
false, NULL, NULL, NULL);
}
void nbd_client_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
{
aio_set_fd_handler(new_context, nbd_get_client_session(bs)->sock,
nbd_reply_ready, NULL, bs);
false, nbd_reply_ready, NULL, bs);
}
void nbd_client_close(BlockDriverState *bs)

View File

@@ -206,24 +206,24 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char **export,
saddr = g_new0(SocketAddress, 1);
if (qdict_haskey(options, "path")) {
saddr->kind = SOCKET_ADDRESS_KIND_UNIX;
saddr->q_unix = g_new0(UnixSocketAddress, 1);
saddr->q_unix->path = g_strdup(qdict_get_str(options, "path"));
saddr->type = SOCKET_ADDRESS_KIND_UNIX;
saddr->u.q_unix = g_new0(UnixSocketAddress, 1);
saddr->u.q_unix->path = g_strdup(qdict_get_str(options, "path"));
qdict_del(options, "path");
} else {
saddr->kind = SOCKET_ADDRESS_KIND_INET;
saddr->inet = g_new0(InetSocketAddress, 1);
saddr->inet->host = g_strdup(qdict_get_str(options, "host"));
saddr->type = SOCKET_ADDRESS_KIND_INET;
saddr->u.inet = g_new0(InetSocketAddress, 1);
saddr->u.inet->host = g_strdup(qdict_get_str(options, "host"));
if (!qdict_get_try_str(options, "port")) {
saddr->inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
saddr->u.inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
} else {
saddr->inet->port = g_strdup(qdict_get_str(options, "port"));
saddr->u.inet->port = g_strdup(qdict_get_str(options, "port"));
}
qdict_del(options, "host");
qdict_del(options, "port");
}
s->client.is_unix = saddr->kind == SOCKET_ADDRESS_KIND_UNIX;
s->client.is_unix = saddr->type == SOCKET_ADDRESS_KIND_UNIX;
*export = g_strdup(qdict_get_try_str(options, "export"));
if (*export) {

View File

@@ -63,11 +63,10 @@ static void nfs_set_events(NFSClient *client)
{
int ev = nfs_which_events(client->context);
if (ev != client->events) {
aio_set_fd_handler(client->aio_context,
nfs_get_fd(client->context),
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
false,
(ev & POLLIN) ? nfs_process_read : NULL,
(ev & POLLOUT) ? nfs_process_write : NULL,
client);
(ev & POLLOUT) ? nfs_process_write : NULL, client);
}
client->events = ev;
@@ -242,9 +241,8 @@ static void nfs_detach_aio_context(BlockDriverState *bs)
{
NFSClient *client = bs->opaque;
aio_set_fd_handler(client->aio_context,
nfs_get_fd(client->context),
NULL, NULL, NULL);
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
false, NULL, NULL, NULL);
client->events = 0;
}
@@ -263,9 +261,8 @@ static void nfs_client_close(NFSClient *client)
if (client->fh) {
nfs_close(client->context, client->fh);
}
aio_set_fd_handler(client->aio_context,
nfs_get_fd(client->context),
NULL, NULL, NULL);
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
false, NULL, NULL, NULL);
nfs_destroy_context(client->context);
}
memset(client, 0, sizeof(NFSClient));

View File

@@ -220,7 +220,7 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier);
s->data_end += s->tracks;
bitmap_set(s->bat_dirty_bmap,
bat_entry_off(idx) / s->bat_dirty_block, 1);
bat_entry_off(idx + i) / s->bat_dirty_block, 1);
}
return bat2sect(s, idx) + sector_num % s->tracks;

View File

@@ -64,7 +64,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
info->backing_file_depth = bdrv_get_backing_file_depth(bs);
info->detect_zeroes = bs->detect_zeroes;
if (bs->io_limits_enabled) {
if (bs->throttle_state) {
ThrottleConfig cfg;
throttle_group_get_config(bs, &cfg);
@@ -301,17 +301,17 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
info->tray_open = blk_dev_is_tray_open(blk);
}
if (bdrv_iostatus_is_enabled(bs)) {
if (blk_iostatus_is_enabled(blk)) {
info->has_io_status = true;
info->io_status = bs->iostatus;
info->io_status = blk_iostatus(blk);
}
if (!QLIST_EMPTY(&bs->dirty_bitmaps)) {
if (bs && !QLIST_EMPTY(&bs->dirty_bitmaps)) {
info->has_dirty_bitmaps = true;
info->dirty_bitmaps = bdrv_query_dirty_bitmaps(bs);
}
if (bs->drv) {
if (bs && bs->drv) {
info->has_inserted = true;
info->inserted = bdrv_block_device_info(bs, errp);
if (info->inserted == NULL) {
@@ -344,18 +344,73 @@ static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
}
s->stats = g_malloc0(sizeof(*s->stats));
s->stats->rd_bytes = bs->stats.nr_bytes[BLOCK_ACCT_READ];
s->stats->wr_bytes = bs->stats.nr_bytes[BLOCK_ACCT_WRITE];
s->stats->rd_operations = bs->stats.nr_ops[BLOCK_ACCT_READ];
s->stats->wr_operations = bs->stats.nr_ops[BLOCK_ACCT_WRITE];
s->stats->rd_merged = bs->stats.merged[BLOCK_ACCT_READ];
s->stats->wr_merged = bs->stats.merged[BLOCK_ACCT_WRITE];
s->stats->wr_highest_offset =
bs->stats.wr_highest_sector * BDRV_SECTOR_SIZE;
s->stats->flush_operations = bs->stats.nr_ops[BLOCK_ACCT_FLUSH];
s->stats->wr_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_WRITE];
s->stats->rd_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_READ];
s->stats->flush_total_time_ns = bs->stats.total_time_ns[BLOCK_ACCT_FLUSH];
if (bs->blk) {
BlockAcctStats *stats = blk_get_stats(bs->blk);
BlockAcctTimedStats *ts = NULL;
s->stats->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
s->stats->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
s->stats->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
s->stats->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
s->stats->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
s->stats->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
s->stats->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
s->stats->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
s->stats->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
s->stats->invalid_flush_operations =
stats->invalid_ops[BLOCK_ACCT_FLUSH];
s->stats->rd_merged = stats->merged[BLOCK_ACCT_READ];
s->stats->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
s->stats->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
s->stats->has_idle_time_ns = stats->last_access_time_ns > 0;
if (s->stats->has_idle_time_ns) {
s->stats->idle_time_ns = block_acct_idle_time_ns(stats);
}
s->stats->account_invalid = stats->account_invalid;
s->stats->account_failed = stats->account_failed;
while ((ts = block_acct_interval_next(stats, ts))) {
BlockDeviceTimedStatsList *timed_stats =
g_malloc0(sizeof(*timed_stats));
BlockDeviceTimedStats *dev_stats = g_malloc0(sizeof(*dev_stats));
timed_stats->next = s->stats->timed_stats;
timed_stats->value = dev_stats;
s->stats->timed_stats = timed_stats;
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
dev_stats->interval_length = ts->interval_length;
dev_stats->min_rd_latency_ns = timed_average_min(rd);
dev_stats->max_rd_latency_ns = timed_average_max(rd);
dev_stats->avg_rd_latency_ns = timed_average_avg(rd);
dev_stats->min_wr_latency_ns = timed_average_min(wr);
dev_stats->max_wr_latency_ns = timed_average_max(wr);
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
dev_stats->min_flush_latency_ns = timed_average_min(fl);
dev_stats->max_flush_latency_ns = timed_average_max(fl);
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
dev_stats->avg_rd_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
dev_stats->avg_wr_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
}
}
s->stats->wr_highest_offset = bs->wr_highest_offset;
if (bs->file) {
s->has_parent = true;
@@ -381,7 +436,9 @@ BlockInfoList *qmp_query_block(Error **errp)
bdrv_query_info(blk, &info->value, &local_err);
if (local_err) {
error_propagate(errp, local_err);
goto err;
g_free(info);
qapi_free_BlockInfoList(head);
return NULL;
}
*p_next = info;
@@ -389,10 +446,6 @@ BlockInfoList *qmp_query_block(Error **errp)
}
return head;
err:
qapi_free_BlockInfoList(head);
return NULL;
}
BlockStatsList *qmp_query_blockstats(bool has_query_nodes,

View File

@@ -312,7 +312,7 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
if (!offset)
return 0;
assert(qcow2_get_cluster_type(first_entry) != QCOW2_CLUSTER_COMPRESSED);
assert(qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL);
for (i = 0; i < nb_clusters; i++) {
uint64_t l2_entry = be64_to_cpu(l2_table[i]) & mask;
@@ -324,14 +324,16 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
return i;
}
static int count_contiguous_free_clusters(int nb_clusters, uint64_t *l2_table)
static int count_contiguous_clusters_by_type(int nb_clusters,
uint64_t *l2_table,
int wanted_type)
{
int i;
for (i = 0; i < nb_clusters; i++) {
int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
if (type != QCOW2_CLUSTER_UNALLOCATED) {
if (type != wanted_type) {
break;
}
}
@@ -554,13 +556,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
ret = -EIO;
goto fail;
}
c = count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], QCOW_OFLAG_ZERO);
c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
QCOW2_CLUSTER_ZERO);
*cluster_offset = 0;
break;
case QCOW2_CLUSTER_UNALLOCATED:
/* how many empty clusters ? */
c = count_contiguous_free_clusters(nb_clusters, &l2_table[l2_index]);
c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
QCOW2_CLUSTER_UNALLOCATED);
*cluster_offset = 0;
break;
case QCOW2_CLUSTER_NORMAL:

View File

@@ -560,13 +560,16 @@ static int alloc_refcount_block(BlockDriverState *bs,
}
/* Hook up the new refcount table in the qcow2 header */
uint8_t data[12];
cpu_to_be64w((uint64_t*)data, table_offset);
cpu_to_be32w((uint32_t*)(data + 8), table_clusters);
struct QEMU_PACKED {
uint64_t d64;
uint32_t d32;
} data;
cpu_to_be64w(&data.d64, table_offset);
cpu_to_be32w(&data.d32, table_clusters);
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_SWITCH_TABLE);
ret = bdrv_pwrite_sync(bs->file->bs,
offsetof(QCowHeader, refcount_table_offset),
data, sizeof(data));
&data, sizeof(data));
if (ret < 0) {
goto fail_table;
}
@@ -1241,7 +1244,7 @@ fail:
/* refcount checking functions */
static size_t refcount_array_byte_size(BDRVQcow2State *s, uint64_t entries)
static uint64_t refcount_array_byte_size(BDRVQcow2State *s, uint64_t entries)
{
/* This assertion holds because there is no way we can address more than
* 2^(64 - 9) clusters at once (with cluster size 512 = 2^9, and because

View File

@@ -2738,18 +2738,16 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
ImageInfoSpecific *spec_info = g_new(ImageInfoSpecific, 1);
*spec_info = (ImageInfoSpecific){
.kind = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
{
.qcow2 = g_new(ImageInfoSpecificQCow2, 1),
},
.type = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
.u.qcow2 = g_new(ImageInfoSpecificQCow2, 1),
};
if (s->qcow_version == 2) {
*spec_info->qcow2 = (ImageInfoSpecificQCow2){
*spec_info->u.qcow2 = (ImageInfoSpecificQCow2){
.compat = g_strdup("0.10"),
.refcount_bits = s->refcount_bits,
};
} else if (s->qcow_version == 3) {
*spec_info->qcow2 = (ImageInfoSpecificQCow2){
*spec_info->u.qcow2 = (ImageInfoSpecificQCow2){
.compat = g_strdup("1.1"),
.lazy_refcounts = s->compatible_features &
QCOW2_COMPAT_LAZY_REFCOUNTS,

View File

@@ -26,7 +26,7 @@
#define BLOCK_QCOW2_H
#include "crypto/cipher.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
//#define DEBUG_ALLOC
//#define DEBUG_ALLOC2

View File

@@ -375,6 +375,18 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
}
}
static void bdrv_qed_drain(BlockDriverState *bs)
{
BDRVQEDState *s = bs->opaque;
/* Cancel timer and start doing I/O that were meant to happen as if it
* fired, that way we get bdrv_drain() taking care of the ongoing requests
* correctly. */
qed_cancel_need_check_timer(s);
qed_plug_allocating_write_reqs(s);
bdrv_aio_flush(s->bs, qed_clear_need_check, s);
}
static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
@@ -1676,6 +1688,7 @@ static BlockDriver bdrv_qed = {
.bdrv_check = bdrv_qed_check,
.bdrv_detach_aio_context = bdrv_qed_detach_aio_context,
.bdrv_attach_aio_context = bdrv_qed_attach_aio_context,
.bdrv_drain = bdrv_qed_drain,
};
static void bdrv_qed_init(void)

View File

@@ -127,11 +127,6 @@ do { \
#define FTYPE_FILE 0
#define FTYPE_CD 1
#define FTYPE_FD 2
/* if the FD is not accessed during that time (in ns), we try to
reopen it to see if the disk has been changed */
#define FD_OPEN_TIMEOUT (1000000000)
#define MAX_BLOCKSIZE 4096
@@ -141,13 +136,6 @@ typedef struct BDRVRawState {
int open_flags;
size_t buf_align;
#if defined(__linux__)
/* linux floppy specific */
int64_t fd_open_time;
int64_t fd_error_time;
int fd_got_error;
int fd_media_changed;
#endif
#ifdef CONFIG_LINUX_AIO
int use_aio;
void *aio_ctx;
@@ -635,7 +623,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
}
#endif
if (s->type == FTYPE_FD || s->type == FTYPE_CD) {
if (s->type == FTYPE_CD) {
raw_s->open_flags |= O_NONBLOCK;
}
@@ -1988,8 +1976,8 @@ BlockDriver bdrv_file = {
#if defined(__APPLE__) && defined(__MACH__)
static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
static kern_return_t GetBSDPath( io_iterator_t mediaIterator, char *bsdPath, CFIndex maxPathSize );
static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
CFIndex maxPathSize, int flags);
kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
{
kern_return_t kernResult;
@@ -2016,7 +2004,8 @@ kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
return kernResult;
}
kern_return_t GetBSDPath( io_iterator_t mediaIterator, char *bsdPath, CFIndex maxPathSize )
kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
CFIndex maxPathSize, int flags)
{
io_object_t nextMedia;
kern_return_t kernResult = KERN_FAILURE;
@@ -2029,7 +2018,9 @@ kern_return_t GetBSDPath( io_iterator_t mediaIterator, char *bsdPath, CFIndex ma
if ( bsdPathAsCFString ) {
size_t devPathLength;
strcpy( bsdPath, _PATH_DEV );
strcat( bsdPath, "r" );
if (flags & BDRV_O_NOCACHE) {
strcat(bsdPath, "r");
}
devPathLength = strlen( bsdPath );
if ( CFStringGetCString( bsdPathAsCFString, bsdPath + devPathLength, maxPathSize - devPathLength, kCFStringEncodingASCII ) ) {
kernResult = KERN_SUCCESS;
@@ -2141,8 +2132,8 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
int fd;
kernResult = FindEjectableCDMedia( &mediaIterator );
kernResult = GetBSDPath( mediaIterator, bsdPath, sizeof( bsdPath ) );
kernResult = GetBSDPath(mediaIterator, bsdPath, sizeof(bsdPath),
flags);
if ( bsdPath[ 0 ] != '\0' ) {
strcat(bsdPath,"s0");
/* some CDs don't have a partition 0 */
@@ -2187,53 +2178,6 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
}
#if defined(__linux__)
/* Note: we do not have a reliable method to detect if the floppy is
present. The current method is to try to open the floppy at every
I/O and to keep it opened during a few hundreds of ms. */
static int fd_open(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
int last_media_present;
if (s->type != FTYPE_FD)
return 0;
last_media_present = (s->fd >= 0);
if (s->fd >= 0 &&
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->fd_open_time) >= FD_OPEN_TIMEOUT) {
qemu_close(s->fd);
s->fd = -1;
DPRINTF("Floppy closed\n");
}
if (s->fd < 0) {
if (s->fd_got_error &&
(qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - s->fd_error_time) < FD_OPEN_TIMEOUT) {
DPRINTF("No floppy (open delayed)\n");
return -EIO;
}
s->fd = qemu_open(bs->filename, s->open_flags & ~O_NONBLOCK);
if (s->fd < 0) {
s->fd_error_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
s->fd_got_error = 1;
if (last_media_present)
s->fd_media_changed = 1;
DPRINTF("No floppy\n");
return -EIO;
}
DPRINTF("Floppy opened\n");
}
if (!last_media_present)
s->fd_media_changed = 1;
s->fd_open_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
s->fd_got_error = 0;
return 0;
}
static int hdev_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
{
BDRVRawState *s = bs->opaque;
return ioctl(s->fd, req, buf);
}
static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
unsigned long int req, void *buf,
@@ -2256,8 +2200,8 @@ static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
}
#endif /* linux */
#elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
static int fd_open(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
@@ -2267,14 +2211,6 @@ static int fd_open(BlockDriverState *bs)
return 0;
return -EIO;
}
#else /* !linux && !FreeBSD */
static int fd_open(BlockDriverState *bs)
{
return 0;
}
#endif /* !linux && !FreeBSD */
static coroutine_fn BlockAIOCB *hdev_aio_discard(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
@@ -2318,14 +2254,13 @@ static int hdev_create(const char *filename, QemuOpts *opts,
int64_t total_size = 0;
bool has_prefix;
/* This function is used by all three protocol block drivers and therefore
* any of these three prefixes may be given.
/* This function is used by both protocol block drivers and therefore either
* of these prefixes may be given.
* The return value has to be stored somewhere, otherwise this is an error
* due to -Werror=unused-value. */
has_prefix =
strstart(filename, "host_device:", &filename) ||
strstart(filename, "host_cdrom:" , &filename) ||
strstart(filename, "host_floppy:", &filename);
strstart(filename, "host_cdrom:" , &filename);
(void)has_prefix;
@@ -2400,160 +2335,10 @@ static BlockDriver bdrv_host_device = {
/* generic scsi device */
#ifdef __linux__
.bdrv_ioctl = hdev_ioctl,
.bdrv_aio_ioctl = hdev_aio_ioctl,
#endif
};
#ifdef __linux__
static void floppy_parse_filename(const char *filename, QDict *options,
Error **errp)
{
/* The prefix is optional, just as for "file". */
strstart(filename, "host_floppy:", &filename);
qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
}
static int floppy_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
BDRVRawState *s = bs->opaque;
Error *local_err = NULL;
int ret;
s->type = FTYPE_FD;
/* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
ret = raw_open_common(bs, options, flags, O_NONBLOCK, &local_err);
if (ret) {
if (local_err) {
error_propagate(errp, local_err);
}
return ret;
}
/* close fd so that we can reopen it as needed */
qemu_close(s->fd);
s->fd = -1;
s->fd_media_changed = 1;
error_report("Host floppy pass-through is deprecated");
error_printf("Support for it will be removed in a future release.\n");
return 0;
}
static int floppy_probe_device(const char *filename)
{
int fd, ret;
int prio = 0;
struct floppy_struct fdparam;
struct stat st;
if (strstart(filename, "/dev/fd", NULL) &&
!strstart(filename, "/dev/fdset/", NULL) &&
!strstart(filename, "/dev/fd/", NULL)) {
prio = 50;
}
fd = qemu_open(filename, O_RDONLY | O_NONBLOCK);
if (fd < 0) {
goto out;
}
ret = fstat(fd, &st);
if (ret == -1 || !S_ISBLK(st.st_mode)) {
goto outc;
}
/* Attempt to detect via a floppy specific ioctl */
ret = ioctl(fd, FDGETPRM, &fdparam);
if (ret >= 0)
prio = 100;
outc:
qemu_close(fd);
out:
return prio;
}
static int floppy_is_inserted(BlockDriverState *bs)
{
return fd_open(bs) >= 0;
}
static int floppy_media_changed(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
int ret;
/*
* XXX: we do not have a true media changed indication.
* It does not work if the floppy is changed without trying to read it.
*/
fd_open(bs);
ret = s->fd_media_changed;
s->fd_media_changed = 0;
DPRINTF("Floppy changed=%d\n", ret);
return ret;
}
static void floppy_eject(BlockDriverState *bs, bool eject_flag)
{
BDRVRawState *s = bs->opaque;
int fd;
if (s->fd >= 0) {
qemu_close(s->fd);
s->fd = -1;
}
fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK);
if (fd >= 0) {
if (ioctl(fd, FDEJECT, 0) < 0)
perror("FDEJECT");
qemu_close(fd);
}
}
static BlockDriver bdrv_host_floppy = {
.format_name = "host_floppy",
.protocol_name = "host_floppy",
.instance_size = sizeof(BDRVRawState),
.bdrv_needs_filename = true,
.bdrv_probe_device = floppy_probe_device,
.bdrv_parse_filename = floppy_parse_filename,
.bdrv_file_open = floppy_open,
.bdrv_close = raw_close,
.bdrv_reopen_prepare = raw_reopen_prepare,
.bdrv_reopen_commit = raw_reopen_commit,
.bdrv_reopen_abort = raw_reopen_abort,
.bdrv_create = hdev_create,
.create_opts = &raw_create_opts,
.bdrv_aio_readv = raw_aio_readv,
.bdrv_aio_writev = raw_aio_writev,
.bdrv_aio_flush = raw_aio_flush,
.bdrv_refresh_limits = raw_refresh_limits,
.bdrv_io_plug = raw_aio_plug,
.bdrv_io_unplug = raw_aio_unplug,
.bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
.has_variable_length = true,
.bdrv_get_allocated_file_size
= raw_get_allocated_file_size,
.bdrv_detach_aio_context = raw_detach_aio_context,
.bdrv_attach_aio_context = raw_attach_aio_context,
/* removable device support */
.bdrv_is_inserted = floppy_is_inserted,
.bdrv_media_changed = floppy_media_changed,
.bdrv_eject = floppy_eject,
};
#endif
#if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
static void cdrom_parse_filename(const char *filename, QDict *options,
Error **errp)
@@ -2609,15 +2394,13 @@ out:
return prio;
}
static int cdrom_is_inserted(BlockDriverState *bs)
static bool cdrom_is_inserted(BlockDriverState *bs)
{
BDRVRawState *s = bs->opaque;
int ret;
ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
if (ret == CDS_DISC_OK)
return 1;
return 0;
return ret == CDS_DISC_OK;
}
static void cdrom_eject(BlockDriverState *bs, bool eject_flag)
@@ -2684,7 +2467,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_lock_medium = cdrom_lock_medium,
/* generic scsi device */
.bdrv_ioctl = hdev_ioctl,
.bdrv_aio_ioctl = hdev_aio_ioctl,
};
#endif /* __linux__ */
@@ -2743,7 +2525,7 @@ static int cdrom_reopen(BlockDriverState *bs)
return 0;
}
static int cdrom_is_inserted(BlockDriverState *bs)
static bool cdrom_is_inserted(BlockDriverState *bs)
{
return raw_getlength(bs) > 0;
}
@@ -2831,7 +2613,6 @@ static void bdrv_file_init(void)
bdrv_register(&bdrv_file);
bdrv_register(&bdrv_host_device);
#ifdef __linux__
bdrv_register(&bdrv_host_floppy);
bdrv_register(&bdrv_host_cdrom);
#endif
#if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)

View File

@@ -154,11 +154,6 @@ static int raw_truncate(BlockDriverState *bs, int64_t offset)
return bdrv_truncate(bs->file->bs, offset);
}
static int raw_is_inserted(BlockDriverState *bs)
{
return bdrv_is_inserted(bs->file->bs);
}
static int raw_media_changed(BlockDriverState *bs)
{
return bdrv_media_changed(bs->file->bs);
@@ -174,11 +169,6 @@ static void raw_lock_medium(BlockDriverState *bs, bool locked)
bdrv_lock_medium(bs->file->bs, locked);
}
static int raw_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
{
return bdrv_ioctl(bs->file->bs, req, buf);
}
static BlockAIOCB *raw_aio_ioctl(BlockDriverState *bs,
unsigned long int req, void *buf,
BlockCompletionFunc *cb,
@@ -264,11 +254,9 @@ BlockDriver bdrv_raw = {
.bdrv_refresh_limits = &raw_refresh_limits,
.bdrv_probe_blocksizes = &raw_probe_blocksizes,
.bdrv_probe_geometry = &raw_probe_geometry,
.bdrv_is_inserted = &raw_is_inserted,
.bdrv_media_changed = &raw_media_changed,
.bdrv_eject = &raw_eject,
.bdrv_lock_medium = &raw_lock_medium,
.bdrv_ioctl = &raw_ioctl,
.bdrv_aio_ioctl = &raw_aio_ioctl,
.create_opts = &raw_create_opts,
.bdrv_has_zero_init = &raw_has_zero_init

View File

@@ -651,14 +651,16 @@ static coroutine_fn void do_co_req(void *opaque)
unsigned int *rlen = srco->rlen;
co = qemu_coroutine_self();
aio_set_fd_handler(srco->aio_context, sockfd, NULL, restart_co_req, co);
aio_set_fd_handler(srco->aio_context, sockfd, false,
NULL, restart_co_req, co);
ret = send_co_req(sockfd, hdr, data, wlen);
if (ret < 0) {
goto out;
}
aio_set_fd_handler(srco->aio_context, sockfd, restart_co_req, NULL, co);
aio_set_fd_handler(srco->aio_context, sockfd, false,
restart_co_req, NULL, co);
ret = qemu_co_recv(sockfd, hdr, sizeof(*hdr));
if (ret != sizeof(*hdr)) {
@@ -683,7 +685,8 @@ static coroutine_fn void do_co_req(void *opaque)
out:
/* there is at most one request for this sockfd, so it is safe to
* set each handler to NULL. */
aio_set_fd_handler(srco->aio_context, sockfd, NULL, NULL, NULL);
aio_set_fd_handler(srco->aio_context, sockfd, false,
NULL, NULL, NULL);
srco->ret = ret;
srco->finished = true;
@@ -735,7 +738,8 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
BDRVSheepdogState *s = opaque;
AIOReq *aio_req, *next;
aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
aio_set_fd_handler(s->aio_context, s->fd, false, NULL,
NULL, NULL);
close(s->fd);
s->fd = -1;
@@ -938,7 +942,8 @@ static int get_sheep_fd(BDRVSheepdogState *s, Error **errp)
return fd;
}
aio_set_fd_handler(s->aio_context, fd, co_read_response, NULL, s);
aio_set_fd_handler(s->aio_context, fd, false,
co_read_response, NULL, s);
return fd;
}
@@ -1199,7 +1204,7 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
qemu_co_mutex_lock(&s->lock);
s->co_send = qemu_coroutine_self();
aio_set_fd_handler(s->aio_context, s->fd,
aio_set_fd_handler(s->aio_context, s->fd, false,
co_read_response, co_write_request, s);
socket_set_cork(s->fd, 1);
@@ -1218,7 +1223,8 @@ static void coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
}
out:
socket_set_cork(s->fd, 0);
aio_set_fd_handler(s->aio_context, s->fd, co_read_response, NULL, s);
aio_set_fd_handler(s->aio_context, s->fd, false,
co_read_response, NULL, s);
s->co_send = NULL;
qemu_co_mutex_unlock(&s->lock);
}
@@ -1368,7 +1374,8 @@ static void sd_detach_aio_context(BlockDriverState *bs)
{
BDRVSheepdogState *s = bs->opaque;
aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
aio_set_fd_handler(s->aio_context, s->fd, false, NULL,
NULL, NULL);
}
static void sd_attach_aio_context(BlockDriverState *bs,
@@ -1377,7 +1384,8 @@ static void sd_attach_aio_context(BlockDriverState *bs,
BDRVSheepdogState *s = bs->opaque;
s->aio_context = new_context;
aio_set_fd_handler(new_context, s->fd, co_read_response, NULL, s);
aio_set_fd_handler(new_context, s->fd, false,
co_read_response, NULL, s);
}
/* TODO Convert to fine grained options */
@@ -1490,7 +1498,8 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags,
g_free(buf);
return 0;
out:
aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd,
false, NULL, NULL, NULL);
if (s->fd >= 0) {
closesocket(s->fd);
}
@@ -1528,7 +1537,8 @@ static void sd_reopen_commit(BDRVReopenState *state)
BDRVSheepdogState *s = state->bs->opaque;
if (s->fd) {
aio_set_fd_handler(s->aio_context, s->fd, NULL, NULL, NULL);
aio_set_fd_handler(s->aio_context, s->fd, false,
NULL, NULL, NULL);
closesocket(s->fd);
}
@@ -1551,7 +1561,8 @@ static void sd_reopen_abort(BDRVReopenState *state)
}
if (re_s->fd) {
aio_set_fd_handler(s->aio_context, re_s->fd, NULL, NULL, NULL);
aio_set_fd_handler(s->aio_context, re_s->fd, false,
NULL, NULL, NULL);
closesocket(re_s->fd);
}
@@ -1935,7 +1946,8 @@ static void sd_close(BlockDriverState *bs)
error_report("%s, %s", sd_strerror(rsp->result), s->name);
}
aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd,
false, NULL, NULL, NULL);
closesocket(s->fd);
g_free(s->host_spec);
}

View File

@@ -253,9 +253,9 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
return -ENOTSUP;
}
void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs,
const char *id_or_name,
Error **errp)
int bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs,
const char *id_or_name,
Error **errp)
{
int ret;
Error *local_err = NULL;
@@ -270,6 +270,7 @@ void bdrv_snapshot_delete_by_id_or_name(BlockDriverState *bs,
if (ret < 0) {
error_propagate(errp, local_err);
}
return ret;
}
int bdrv_snapshot_list(BlockDriverState *bs,
@@ -356,3 +357,130 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
return ret;
}
/* Group operations. All block drivers are involved.
* These functions will properly handle dataplane (take aio_context_acquire
* when appropriate for appropriate block drivers) */
bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
{
bool ok = true;
BlockDriverState *bs = NULL;
while (ok && (bs = bdrv_next(bs))) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
if (bdrv_is_inserted(bs) && !bdrv_is_read_only(bs)) {
ok = bdrv_can_snapshot(bs);
}
aio_context_release(ctx);
}
*first_bad_bs = bs;
return ok;
}
int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
Error **err)
{
int ret = 0;
BlockDriverState *bs = NULL;
QEMUSnapshotInfo sn1, *snapshot = &sn1;
while (ret == 0 && (bs = bdrv_next(bs))) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
if (bdrv_can_snapshot(bs) &&
bdrv_snapshot_find(bs, snapshot, name) >= 0) {
ret = bdrv_snapshot_delete_by_id_or_name(bs, name, err);
}
aio_context_release(ctx);
}
*first_bad_bs = bs;
return ret;
}
int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
{
int err = 0;
BlockDriverState *bs = NULL;
while (err == 0 && (bs = bdrv_next(bs))) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
if (bdrv_can_snapshot(bs)) {
err = bdrv_snapshot_goto(bs, name);
}
aio_context_release(ctx);
}
*first_bad_bs = bs;
return err;
}
int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
{
QEMUSnapshotInfo sn;
int err = 0;
BlockDriverState *bs = NULL;
while (err == 0 && (bs = bdrv_next(bs))) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
if (bdrv_can_snapshot(bs)) {
err = bdrv_snapshot_find(bs, &sn, name);
}
aio_context_release(ctx);
}
*first_bad_bs = bs;
return err;
}
int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
BlockDriverState *vm_state_bs,
uint64_t vm_state_size,
BlockDriverState **first_bad_bs)
{
int err = 0;
BlockDriverState *bs = NULL;
while (err == 0 && (bs = bdrv_next(bs))) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
if (bs == vm_state_bs) {
sn->vm_state_size = vm_state_size;
err = bdrv_snapshot_create(bs, sn);
} else if (bdrv_can_snapshot(bs)) {
sn->vm_state_size = 0;
err = bdrv_snapshot_create(bs, sn);
}
aio_context_release(ctx);
}
*first_bad_bs = bs;
return err;
}
BlockDriverState *bdrv_all_find_vmstate_bs(void)
{
bool not_found = true;
BlockDriverState *bs = NULL;
while (not_found && (bs = bdrv_next(bs))) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
not_found = !bdrv_can_snapshot(bs);
aio_context_release(ctx);
}
return bs;
}

View File

@@ -800,14 +800,15 @@ static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
rd_handler, wr_handler);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
rd_handler, wr_handler, co);
false, rd_handler, wr_handler, co);
}
static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
BlockDriverState *bs)
{
DPRINTF("s->sock=%d", s->sock);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, NULL, NULL, NULL);
aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
false, NULL, NULL, NULL);
}
/* A non-blocking call returned EAGAIN, so yield, ensuring the

View File

@@ -16,6 +16,7 @@
#include "block/blockjob.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "sysemu/block-backend.h"
enum {
/*
@@ -222,7 +223,7 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
!bdrv_iostatus_is_enabled(bs)) {
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, QERR_INVALID_PARAMETER, "on-error");
return;
}

View File

@@ -33,8 +33,7 @@
* its own locking.
*
* This locking is however handled internally in this file, so it's
* mostly transparent to outside users (but see the documentation in
* throttle_groups_lock()).
* transparent to outside users.
*
* The whole ThrottleGroup structure is private and invisible to
* outside users, that only use it through its ThrottleState.
@@ -76,9 +75,9 @@ static QTAILQ_HEAD(, ThrottleGroup) throttle_groups =
* created.
*
* @name: the name of the ThrottleGroup
* @ret: the ThrottleGroup
* @ret: the ThrottleState member of the ThrottleGroup
*/
static ThrottleGroup *throttle_group_incref(const char *name)
ThrottleState *throttle_group_incref(const char *name)
{
ThrottleGroup *tg = NULL;
ThrottleGroup *iter;
@@ -108,7 +107,7 @@ static ThrottleGroup *throttle_group_incref(const char *name)
qemu_mutex_unlock(&throttle_groups_lock);
return tg;
return &tg->ts;
}
/* Decrease the reference count of a ThrottleGroup.
@@ -116,10 +115,12 @@ static ThrottleGroup *throttle_group_incref(const char *name)
* When the reference count reaches zero the ThrottleGroup is
* destroyed.
*
* @tg: The ThrottleGroup to unref
* @ts: The ThrottleGroup to unref, given by its ThrottleState member
*/
static void throttle_group_unref(ThrottleGroup *tg)
void throttle_group_unref(ThrottleState *ts)
{
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
qemu_mutex_lock(&throttle_groups_lock);
if (--tg->refcount == 0) {
QTAILQ_REMOVE(&throttle_groups, tg, list);
@@ -401,7 +402,8 @@ static void write_timer_cb(void *opaque)
void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
{
int i;
ThrottleGroup *tg = throttle_group_incref(groupname);
ThrottleState *ts = throttle_group_incref(groupname);
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
int clock_type = QEMU_CLOCK_REALTIME;
if (qtest_enabled()) {
@@ -409,7 +411,7 @@ void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
clock_type = QEMU_CLOCK_VIRTUAL;
}
bs->throttle_state = &tg->ts;
bs->throttle_state = ts;
qemu_mutex_lock(&tg->lock);
/* If the ThrottleGroup is new set this BlockDriverState as the token */
@@ -435,6 +437,9 @@ void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
* list, destroying the timers and setting the throttle_state pointer
* to NULL.
*
* The BlockDriverState must not have pending throttled requests, so
* the caller has to drain them first.
*
* The group will be destroyed if it's empty after this operation.
*
* @bs: the BlockDriverState to remove
@@ -444,6 +449,10 @@ void throttle_group_unregister_bs(BlockDriverState *bs)
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
int i;
assert(bs->pending_reqs[0] == 0 && bs->pending_reqs[1] == 0);
assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
qemu_mutex_lock(&tg->lock);
for (i = 0; i < 2; i++) {
if (tg->tokens[i] == bs) {
@@ -461,38 +470,10 @@ void throttle_group_unregister_bs(BlockDriverState *bs)
throttle_timers_destroy(&bs->throttle_timers);
qemu_mutex_unlock(&tg->lock);
throttle_group_unref(tg);
throttle_group_unref(&tg->ts);
bs->throttle_state = NULL;
}
/* Acquire the lock of this throttling group.
*
* You won't normally need to use this. None of the functions from the
* ThrottleGroup API require you to acquire the lock since all of them
* deal with it internally.
*
* This should only be used in exceptional cases when you want to
* access the protected fields of a BlockDriverState directly
* (e.g. bdrv_swap()).
*
* @bs: a BlockDriverState that is member of the group
*/
void throttle_group_lock(BlockDriverState *bs)
{
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
qemu_mutex_lock(&tg->lock);
}
/* Release the lock of this throttling group.
*
* See the comments in throttle_group_lock().
*/
void throttle_group_unlock(BlockDriverState *bs)
{
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
qemu_mutex_unlock(&tg->lock);
}
static void throttle_groups_init(void)
{
qemu_mutex_init(&throttle_groups_lock);

View File

@@ -53,7 +53,7 @@
#include "block/block_int.h"
#include "qemu/module.h"
#include "migration/migration.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#if defined(CONFIG_UUID)
#include <uuid/uuid.h>

View File

@@ -2161,19 +2161,19 @@ static ImageInfoSpecific *vmdk_get_specific_info(BlockDriverState *bs)
ImageInfoList **next;
*spec_info = (ImageInfoSpecific){
.kind = IMAGE_INFO_SPECIFIC_KIND_VMDK,
.type = IMAGE_INFO_SPECIFIC_KIND_VMDK,
{
.vmdk = g_new0(ImageInfoSpecificVmdk, 1),
},
};
*spec_info->vmdk = (ImageInfoSpecificVmdk) {
*spec_info->u.vmdk = (ImageInfoSpecificVmdk) {
.create_type = g_strdup(s->create_type),
.cid = s->cid,
.parent_cid = s->parent_cid,
};
next = &spec_info->vmdk->extents;
next = &spec_info->u.vmdk->extents;
for (i = 0; i < s->num_extents; i++) {
*next = g_new0(ImageInfoList, 1);
(*next)->value = vmdk_get_extent_info(&s->extents[i]);

View File

@@ -174,7 +174,7 @@ int win32_aio_attach(QEMUWin32AIOState *aio, HANDLE hfile)
void win32_aio_detach_aio_context(QEMUWin32AIOState *aio,
AioContext *old_context)
{
aio_set_event_notifier(old_context, &aio->e, NULL);
aio_set_event_notifier(old_context, &aio->e, false, NULL);
aio->is_aio_context_attached = false;
}
@@ -182,7 +182,8 @@ void win32_aio_attach_aio_context(QEMUWin32AIOState *aio,
AioContext *new_context)
{
aio->is_aio_context_attached = true;
aio_set_event_notifier(new_context, &aio->e, win32_aio_completion_cb);
aio_set_event_notifier(new_context, &aio->e, false,
win32_aio_completion_cb);
}
QEMUWin32AIOState *win32_aio_init(void)

View File

@@ -11,7 +11,7 @@
*/
#include "block/block_int.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "block/write-threshold.h"
#include "qemu/notify.h"
#include "qapi-event.h"

1700
blockdev.c

File diff suppressed because it is too large Load Diff

View File

@@ -29,13 +29,27 @@
#include "block/block.h"
#include "block/blockjob.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qapi/qmp/qerror.h"
#include "qapi/qmp/qjson.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "qmp-commands.h"
#include "qemu/timer.h"
#include "qapi-event.h"
/* Transactional group of block jobs */
struct BlockJobTxn {
/* Is this txn being cancelled? */
bool aborting;
/* List of jobs */
QLIST_HEAD(, BlockJob) jobs;
/* Reference count */
int refcnt;
};
void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
@@ -59,6 +73,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
job->cb = cb;
job->opaque = opaque;
job->busy = true;
job->refcnt = 1;
bs->job = job;
/* Only set speed when necessary to avoid NotSupported error */
@@ -67,7 +82,7 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
block_job_set_speed(job, speed, &local_err);
if (local_err) {
block_job_release(bs);
block_job_unref(job);
error_propagate(errp, local_err);
return NULL;
}
@@ -75,15 +90,101 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
return job;
}
void block_job_release(BlockDriverState *bs)
void block_job_ref(BlockJob *job)
{
BlockJob *job = bs->job;
++job->refcnt;
}
bs->job = NULL;
bdrv_op_unblock_all(bs, job->blocker);
error_free(job->blocker);
g_free(job->id);
g_free(job);
void block_job_unref(BlockJob *job)
{
if (--job->refcnt == 0) {
job->bs->job = NULL;
bdrv_op_unblock_all(job->bs, job->blocker);
bdrv_unref(job->bs);
error_free(job->blocker);
g_free(job->id);
g_free(job);
}
}
static void block_job_completed_single(BlockJob *job)
{
if (!job->ret) {
if (job->driver->commit) {
job->driver->commit(job);
}
} else {
if (job->driver->abort) {
job->driver->abort(job);
}
}
job->cb(job->opaque, job->ret);
if (job->txn) {
block_job_txn_unref(job->txn);
}
block_job_unref(job);
}
static void block_job_completed_txn_abort(BlockJob *job)
{
AioContext *ctx;
BlockJobTxn *txn = job->txn;
BlockJob *other_job, *next;
if (txn->aborting) {
/*
* We are cancelled by another job, which will handle everything.
*/
return;
}
txn->aborting = true;
/* We are the first failed job. Cancel other jobs. */
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
ctx = bdrv_get_aio_context(other_job->bs);
aio_context_acquire(ctx);
}
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
if (other_job == job || other_job->completed) {
/* Other jobs are "effectively" cancelled by us, set the status for
* them; this job, however, may or may not be cancelled, depending
* on the caller, so leave it. */
if (other_job != job) {
other_job->cancelled = true;
}
continue;
}
block_job_cancel_sync(other_job);
assert(other_job->completed);
}
QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
ctx = bdrv_get_aio_context(other_job->bs);
block_job_completed_single(other_job);
aio_context_release(ctx);
}
}
static void block_job_completed_txn_success(BlockJob *job)
{
AioContext *ctx;
BlockJobTxn *txn = job->txn;
BlockJob *other_job, *next;
/*
* Successful completion, see if there are other running jobs in this
* txn.
*/
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
if (!other_job->completed) {
return;
}
}
/* We are the last completed job, commit the transaction. */
QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
ctx = bdrv_get_aio_context(other_job->bs);
aio_context_acquire(ctx);
assert(other_job->ret == 0);
block_job_completed_single(other_job);
aio_context_release(ctx);
}
}
void block_job_completed(BlockJob *job, int ret)
@@ -91,8 +192,16 @@ void block_job_completed(BlockJob *job, int ret)
BlockDriverState *bs = job->bs;
assert(bs->job == job);
job->cb(job->opaque, ret);
block_job_release(bs);
assert(!job->completed);
job->completed = true;
job->ret = ret;
if (!job->txn) {
block_job_completed_single(job);
} else if (ret < 0 || block_job_is_cancelled(job)) {
block_job_completed_txn_abort(job);
} else {
block_job_completed_txn_success(job);
}
}
void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -177,43 +286,29 @@ struct BlockFinishData {
int ret;
};
static void block_job_finish_cb(void *opaque, int ret)
{
struct BlockFinishData *data = opaque;
data->cancelled = block_job_is_cancelled(data->job);
data->ret = ret;
data->cb(data->opaque, ret);
}
static int block_job_finish_sync(BlockJob *job,
void (*finish)(BlockJob *, Error **errp),
Error **errp)
{
struct BlockFinishData data;
BlockDriverState *bs = job->bs;
Error *local_err = NULL;
int ret;
assert(bs->job == job);
/* Set up our own callback to store the result and chain to
* the original callback.
*/
data.job = job;
data.cb = job->cb;
data.opaque = job->opaque;
data.ret = -EINPROGRESS;
job->cb = block_job_finish_cb;
job->opaque = &data;
block_job_ref(job);
finish(job, &local_err);
if (local_err) {
error_propagate(errp, local_err);
block_job_unref(job);
return -EBUSY;
}
while (data.ret == -EINPROGRESS) {
while (!job->completed) {
aio_poll(bdrv_get_aio_context(bs), true);
}
return (data.cancelled && data.ret == 0) ? -ECANCELED : data.ret;
ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
block_job_unref(job);
return ret;
}
/* A wrapper around block_job_cancel() taking an Error ** parameter so it may be
@@ -354,8 +449,8 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
job->user_paused = true;
block_job_pause(job);
block_job_iostatus_set_err(job, error);
if (bs != job->bs) {
bdrv_iostatus_set_err(bs, error);
if (bs->blk && bs != job->bs) {
blk_iostatus_set_err(bs->blk, error);
}
}
return action;
@@ -405,3 +500,36 @@ void block_job_defer_to_main_loop(BlockJob *job,
qemu_bh_schedule(data->bh);
}
BlockJobTxn *block_job_txn_new(void)
{
BlockJobTxn *txn = g_new0(BlockJobTxn, 1);
QLIST_INIT(&txn->jobs);
txn->refcnt = 1;
return txn;
}
static void block_job_txn_ref(BlockJobTxn *txn)
{
txn->refcnt++;
}
void block_job_txn_unref(BlockJobTxn *txn)
{
if (txn && --txn->refcnt == 0) {
g_free(txn);
}
}
void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
{
if (!txn) {
return;
}
assert(!job->txn);
job->txn = txn;
QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
block_job_txn_ref(txn);
}

View File

@@ -740,8 +740,7 @@ static void padzero(abi_ulong elf_bss, abi_ulong last_bss)
size must be known */
if (qemu_real_host_page_size < qemu_host_page_size) {
abi_ulong end_addr, end_addr1;
end_addr1 = (elf_bss + qemu_real_host_page_size - 1) &
~(qemu_real_host_page_size - 1);
end_addr1 = REAL_HOST_PAGE_ALIGN(elf_bss);
end_addr = HOST_PAGE_ALIGN(elf_bss);
if (end_addr1 < end_addr) {
mmap((void *)g2h(end_addr1), end_addr - end_addr1,

246
configure vendored
View File

@@ -8,6 +8,9 @@
CLICOLOR_FORCE= GREP_OPTIONS=
unset CLICOLOR_FORCE GREP_OPTIONS
# Don't allow CCACHE, if present, to use cached results of compile tests!
export CCACHE_RECACHE=yes
# Temporary directory used for files created while
# configure runs. Since it is in the build directory
# we can safely blow away any previous version of it
@@ -261,6 +264,7 @@ rdma=""
gprof="no"
debug_tcg="no"
debug="no"
fortify_source=""
strip_opt="yes"
tcg_interpreter="no"
bigendian="no"
@@ -331,6 +335,8 @@ gtkabi=""
gtk_gl="no"
gnutls=""
gnutls_hash=""
nettle=""
gcrypt=""
vte=""
virglrenderer=""
tpm="yes"
@@ -417,9 +423,6 @@ if test "$debug_info" = "yes"; then
LDFLAGS="-g $LDFLAGS"
fi
test_cflags=""
test_libs=""
# make source path absolute
source_path=`cd "$source_path"; pwd`
@@ -724,6 +727,8 @@ if test "$mingw32" = "yes" ; then
QEMU_CFLAGS="-DWIN32_LEAN_AND_MEAN -DWINVER=0x501 $QEMU_CFLAGS"
# enable C99/POSIX format strings (needs mingw32-runtime 3.15 or later)
QEMU_CFLAGS="-D__USE_MINGW_ANSI_STDIO=1 $QEMU_CFLAGS"
# MinGW needs -mthreads for TLS and macro _MT.
QEMU_CFLAGS="-mthreads $QEMU_CFLAGS"
LIBS="-lwinmm -lws2_32 -liphlpapi $LIBS"
write_c_skeleton;
if compile_prog "" "-liberty" ; then
@@ -788,6 +793,9 @@ for opt do
--enable-modules)
modules="yes"
;;
--disable-modules)
modules="no"
;;
--cpu=*)
;;
--target-list=*) target_list="$optarg"
@@ -877,6 +885,7 @@ for opt do
debug_tcg="yes"
debug="yes"
strip_opt="no"
fortify_source="no"
;;
--enable-sparse) sparse="yes"
;;
@@ -1114,6 +1123,14 @@ for opt do
;;
--enable-gnutls) gnutls="yes"
;;
--disable-nettle) nettle="no"
;;
--enable-nettle) nettle="yes"
;;
--disable-gcrypt) gcrypt="no"
;;
--enable-gcrypt) gcrypt="yes"
;;
--enable-rdma) rdma="yes"
;;
--disable-rdma) rdma="no"
@@ -1324,6 +1341,8 @@ disabled with --disable-FEATURE, default is enabled if available:
sparse sparse checker
gnutls GNUTLS cryptography support
nettle nettle cryptography support
gcrypt libgcrypt cryptography support
sdl SDL UI
--with-sdlabi select preferred SDL ABI 1.2 or 2.0
gtk gtk UI
@@ -1331,7 +1350,6 @@ disabled with --disable-FEATURE, default is enabled if available:
vte vte support for the gtk UI
curses curses UI
vnc VNC UI support
vnc-tls TLS encryption for VNC server
vnc-sasl SASL encryption for VNC server
vnc-jpeg JPEG lossy compression for VNC server
vnc-png PNG compression for VNC server
@@ -1410,6 +1428,9 @@ if compile_object ; then
else
error_exit "\"$cc\" either does not exist or does not work"
fi
if ! compile_prog ; then
error_exit "\"$cc\" cannot build an executable (is your linker broken?)"
fi
# Check that the C++ compiler exists and works with the C compiler
if has $cxx; then
@@ -1470,6 +1491,16 @@ for flag in $gcc_flags; do
done
if test "$stack_protector" != "no"; then
cat > $TMPC << EOF
int main(int argc, char *argv[])
{
char arr[64], *p = arr, *c = argv[0];
while (*c) {
*p++ = *c++;
}
return 0;
}
EOF
gcc_flags="-fstack-protector-strong -fstack-protector-all"
sp_on=0
for flag in $gcc_flags; do
@@ -1872,16 +1903,34 @@ fi
# libseccomp check
if test "$seccomp" != "no" ; then
if test "$cpu" = "i386" || test "$cpu" = "x86_64" &&
$pkg_config --atleast-version=2.1.1 libseccomp; then
case "$cpu" in
i386|x86_64)
libseccomp_minver="2.1.0"
;;
arm|aarch64)
libseccomp_minver="2.2.3"
;;
*)
libseccomp_minver=""
;;
esac
if test "$libseccomp_minver" != "" &&
$pkg_config --atleast-version=$libseccomp_minver libseccomp ; then
libs_softmmu="$libs_softmmu `$pkg_config --libs libseccomp`"
QEMU_CFLAGS="$QEMU_CFLAGS `$pkg_config --cflags libseccomp`"
seccomp="yes"
seccomp="yes"
else
if test "$seccomp" = "yes"; then
feature_not_found "libseccomp" "Install libseccomp devel >= 2.1.1"
fi
seccomp="no"
if test "$seccomp" = "yes" ; then
if test "$libseccomp_minver" != "" ; then
feature_not_found "libseccomp" \
"Install libseccomp devel >= $libseccomp_minver"
else
feature_not_found "libseccomp" \
"libseccomp is not supported for host cpu $cpu"
fi
fi
seccomp="no"
fi
fi
##########################################
@@ -1912,6 +1961,23 @@ EOF
elif
cat > $TMPC <<EOF &&
#include <xenctrl.h>
#include <stdint.h>
int main(void) {
xc_interface *xc = NULL;
xen_domain_handle_t handle;
xc_domain_create(xc, 0, handle, 0, NULL, NULL);
return 0;
}
EOF
compile_prog "" "$xen_libs"
then
xen_ctrl_version=470
xen=yes
# Xen 4.6
elif
cat > $TMPC <<EOF &&
#include <xenctrl.h>
#include <xenstore.h>
#include <stdint.h>
#include <xen/hvm/hvm_info_table.h>
@@ -2254,20 +2320,76 @@ else
gnutls_hash="no"
fi
if test "$gnutls_gcrypt" != "no"; then
if has "libgcrypt-config"; then
# If user didn't give a --disable/enable-gcrypt flag,
# then mark as disabled if user requested nettle
# explicitly, or if gnutls links to nettle
if test -z "$gcrypt"
then
if test "$nettle" = "yes" || test "$gnutls_nettle" = "yes"
then
gcrypt="no"
fi
fi
# If user didn't give a --disable/enable-nettle flag,
# then mark as disabled if user requested gcrypt
# explicitly, or if gnutls links to gcrypt
if test -z "$nettle"
then
if test "$gcrypt" = "yes" || test "$gnutls_gcrypt" = "yes"
then
nettle="no"
fi
fi
has_libgcrypt_config() {
if ! has "libgcrypt-config"
then
return 1
fi
if test -n "$cross_prefix"
then
host=`libgcrypt-config --host`
if test "$host-" != $cross_prefix
then
return 1
fi
fi
return 0
}
if test "$gcrypt" != "no"; then
if has_libgcrypt_config; then
gcrypt_cflags=`libgcrypt-config --cflags`
gcrypt_libs=`libgcrypt-config --libs`
# Debian has remove -lgpg-error from libgcrypt-config
# as it "spreads unnecessary dependencies" which in
# turn breaks static builds...
if test "$static" = "yes"
then
gcrypt_libs="$gcrypt_libs -lgpg-error"
fi
libs_softmmu="$gcrypt_libs $libs_softmmu"
libs_tools="$gcrypt_libs $libs_tools"
QEMU_CFLAGS="$QEMU_CFLAGS $gcrypt_cflags"
gcrypt="yes"
if test -z "$nettle"; then
nettle="no"
fi
else
feature_not_found "gcrypt" "Install gcrypt devel"
if test "$gcrypt" = "yes"; then
feature_not_found "gcrypt" "Install gcrypt devel"
else
gcrypt="no"
fi
fi
fi
if test "$gnutls_nettle" != "no"; then
if test "$nettle" != "no"; then
if $pkg_config --exists "nettle"; then
nettle_cflags=`$pkg_config --cflags nettle`
nettle_libs=`$pkg_config --libs nettle`
@@ -2275,20 +2397,30 @@ if test "$gnutls_nettle" != "no"; then
libs_softmmu="$nettle_libs $libs_softmmu"
libs_tools="$nettle_libs $libs_tools"
QEMU_CFLAGS="$QEMU_CFLAGS $nettle_cflags"
nettle="yes"
else
feature_not_found "nettle" "Install nettle devel"
if test "$nettle" = "yes"; then
feature_not_found "nettle" "Install nettle devel"
else
nettle="no"
fi
fi
fi
if test "$gcrypt" = "yes" && test "$nettle" = "yes"
then
error_exit "Only one of gcrypt & nettle can be enabled"
fi
##########################################
# libtasn1 - only for the TLS creds/session test suite
tasn1=yes
tasn1_cflags=""
tasn1_libs=""
if $pkg_config --exists "libtasn1"; then
tasn1_cflags=`$pkg_config --cflags libtasn1`
tasn1_libs=`$pkg_config --libs libtasn1`
test_cflags="$test_cflags $tasn1_cflags"
test_libs="$test_libs $tasn1_libs"
else
tasn1=no
fi
@@ -3211,25 +3343,11 @@ fi
libs_softmmu="$libs_softmmu $fdt_libs"
##########################################
# opengl probe (for sdl2, milkymist-tmu2)
# GLX probe, used by milkymist-tmu2
# this is temporary, code will be switched to egl mid-term.
cat > $TMPC << EOF
#include <X11/Xlib.h>
#include <GL/gl.h>
#include <GL/glx.h>
int main(void) { glBegin(0); glXQueryVersion(0,0,0); return 0; }
EOF
if compile_prog "" "-lGL -lX11" ; then
have_glx=yes
else
have_glx=no
fi
# opengl probe (for sdl2, gtk, milkymist-tmu2)
if test "$opengl" != "no" ; then
opengl_pkgs="gl glesv2 epoxy egl"
if $pkg_config $opengl_pkgs x11 && test "$have_glx" = "yes"; then
opengl_pkgs="epoxy"
if $pkg_config $opengl_pkgs x11; then
opengl_cflags="$($pkg_config --cflags $opengl_pkgs) $x11_cflags"
opengl_libs="$($pkg_config --libs $opengl_pkgs) $x11_libs"
opengl=yes
@@ -3491,6 +3609,22 @@ if compile_prog "" "" ; then
eventfd=yes
fi
# check if memfd is supported
memfd=no
cat > $TMPC << EOF
#include <sys/memfd.h>
int main(void)
{
return memfd_create("foo", MFD_ALLOW_SEALING);
}
EOF
if compile_prog "" "" ; then
memfd=yes
fi
# check for fallocate
fallocate=no
cat > $TMPC << EOF
@@ -4321,6 +4455,7 @@ fi
# check if ccache is interfering with
# semantic analysis of macros
unset CCACHE_CPP2
ccache_cpp2=no
cat > $TMPC << EOF
static const int Z = 1;
@@ -4344,6 +4479,20 @@ if ! compile_object "-Werror"; then
ccache_cpp2=yes
fi
#################################################
# clang does not support glibc + FORTIFY_SOURCE.
if test "$fortify_source" != "no"; then
if echo | $cc -dM -E - | grep __clang__ > /dev/null 2>&1 ; then
fortify_source="no";
elif test -n "$cxx" &&
echo | $cxx -dM -E - | grep __clang__ >/dev/null 2>&1 ; then
fortify_source="no";
else
fortify_source="yes"
fi
fi
##########################################
# End of CC checks
# After here, no more $cc or $ld runs
@@ -4351,8 +4500,10 @@ fi
if test "$gcov" = "yes" ; then
CFLAGS="-fprofile-arcs -ftest-coverage -g $CFLAGS"
LDFLAGS="-fprofile-arcs -ftest-coverage $LDFLAGS"
elif test "$debug" = "no" ; then
elif test "$fortify_source" = "yes" ; then
CFLAGS="-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 $CFLAGS"
elif test "$debug" = "no"; then
CFLAGS="-O2 $CFLAGS"
fi
##########################################
@@ -4417,6 +4568,7 @@ if test "$want_tools" = "yes" ; then
tools="qemu-img\$(EXESUF) qemu-io\$(EXESUF) $tools"
if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" ] ; then
tools="qemu-nbd\$(EXESUF) $tools"
tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
fi
fi
if test "$softmmu" = yes ; then
@@ -4437,7 +4589,7 @@ fi
if [ "$guest_agent" != "no" ]; then
if [ "$linux" = "yes" -o "$bsd" = "yes" -o "$solaris" = "yes" -o "$mingw32" = "yes" ] ; then
tools="qemu-ga\$(EXESUF) $tools"
tools="qemu-ga $tools"
guest_agent=yes
elif [ "$guest_agent" != yes ]; then
guest_agent=no
@@ -4605,8 +4757,8 @@ echo "GTK support $gtk"
echo "GTK GL support $gtk_gl"
echo "GNUTLS support $gnutls"
echo "GNUTLS hash $gnutls_hash"
echo "GNUTLS gcrypt $gnutls_gcrypt"
echo "GNUTLS nettle $gnutls_nettle ${gnutls_nettle+($nettle_version)}"
echo "libgcrypt $gcrypt"
echo "nettle $nettle ${nettle+($nettle_version)}"
echo "libtasn1 $tasn1"
echo "VTE support $vte"
echo "curses support $curses"
@@ -4885,6 +5037,9 @@ fi
if test "$eventfd" = "yes" ; then
echo "CONFIG_EVENTFD=y" >> $config_host_mak
fi
if test "$memfd" = "yes" ; then
echo "CONFIG_MEMFD=y" >> $config_host_mak
fi
if test "$fallocate" = "yes" ; then
echo "CONFIG_FALLOCATE=y" >> $config_host_mak
fi
@@ -4972,11 +5127,11 @@ fi
if test "$gnutls_hash" = "yes" ; then
echo "CONFIG_GNUTLS_HASH=y" >> $config_host_mak
fi
if test "$gnutls_gcrypt" = "yes" ; then
echo "CONFIG_GNUTLS_GCRYPT=y" >> $config_host_mak
if test "$gcrypt" = "yes" ; then
echo "CONFIG_GCRYPT=y" >> $config_host_mak
fi
if test "$gnutls_nettle" = "yes" ; then
echo "CONFIG_GNUTLS_NETTLE=y" >> $config_host_mak
if test "$nettle" = "yes" ; then
echo "CONFIG_NETTLE=y" >> $config_host_mak
echo "CONFIG_NETTLE_VERSION_MAJOR=${nettle_version%%.*}" >> $config_host_mak
fi
if test "$tasn1" = "yes" ; then
@@ -5311,8 +5466,8 @@ echo "EXESUF=$EXESUF" >> $config_host_mak
echo "DSOSUF=$DSOSUF" >> $config_host_mak
echo "LDFLAGS_SHARED=$LDFLAGS_SHARED" >> $config_host_mak
echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
echo "TEST_LIBS=$test_libs" >> $config_host_mak
echo "TEST_CFLAGS=$test_cflags" >> $config_host_mak
echo "TASN1_LIBS=$tasn1_libs" >> $config_host_mak
echo "TASN1_CFLAGS=$tasn1_cflags" >> $config_host_mak
echo "POD2MAN=$POD2MAN" >> $config_host_mak
echo "TRANSLATE_OPT_CFLAGS=$TRANSLATE_OPT_CFLAGS" >> $config_host_mak
if test "$gcov" = "yes" ; then
@@ -5558,6 +5713,7 @@ case "$target_name" in
echo "CONFIG_KVM=y" >> $config_target_mak
if test "$vhost_net" = "yes" ; then
echo "CONFIG_VHOST_NET=y" >> $config_target_mak
echo "CONFIG_VHOST_NET_TEST_$target_name=y" >> $config_host_mak
fi
fi
esac

View File

@@ -0,0 +1 @@
ivshmem-client-obj-y = ivshmem-client.o main.o

View File

@@ -0,0 +1,446 @@
/*
* Copyright 6WIND S.A., 2014
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* (at your option) any later version. See the COPYING file in the
* top-level directory.
*/
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include "qemu-common.h"
#include "qemu/queue.h"
#include "ivshmem-client.h"
/* log a message on stdout if verbose=1 */
#define IVSHMEM_CLIENT_DEBUG(client, fmt, ...) do { \
if ((client)->verbose) { \
printf(fmt, ## __VA_ARGS__); \
} \
} while (0)
/* read message from the unix socket */
static int
ivshmem_client_read_one_msg(IvshmemClient *client, int64_t *index, int *fd)
{
int ret;
struct msghdr msg;
struct iovec iov[1];
union {
struct cmsghdr cmsg;
char control[CMSG_SPACE(sizeof(int))];
} msg_control;
struct cmsghdr *cmsg;
iov[0].iov_base = index;
iov[0].iov_len = sizeof(*index);
memset(&msg, 0, sizeof(msg));
msg.msg_iov = iov;
msg.msg_iovlen = 1;
msg.msg_control = &msg_control;
msg.msg_controllen = sizeof(msg_control);
ret = recvmsg(client->sock_fd, &msg, 0);
if (ret < sizeof(*index)) {
IVSHMEM_CLIENT_DEBUG(client, "cannot read message: %s\n",
strerror(errno));
return -1;
}
if (ret == 0) {
IVSHMEM_CLIENT_DEBUG(client, "lost connection to server\n");
return -1;
}
*index = GINT64_FROM_LE(*index);
*fd = -1;
for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)) ||
cmsg->cmsg_level != SOL_SOCKET ||
cmsg->cmsg_type != SCM_RIGHTS) {
continue;
}
memcpy(fd, CMSG_DATA(cmsg), sizeof(*fd));
}
return 0;
}
/* free a peer when the server advertises a disconnection or when the
* client is freed */
static void
ivshmem_client_free_peer(IvshmemClient *client, IvshmemClientPeer *peer)
{
unsigned vector;
QTAILQ_REMOVE(&client->peer_list, peer, next);
for (vector = 0; vector < peer->vectors_count; vector++) {
close(peer->vectors[vector]);
}
g_free(peer);
}
/* handle message coming from server (new peer, new vectors) */
static int
ivshmem_client_handle_server_msg(IvshmemClient *client)
{
IvshmemClientPeer *peer;
int64_t peer_id;
int ret, fd;
ret = ivshmem_client_read_one_msg(client, &peer_id, &fd);
if (ret < 0) {
return -1;
}
/* can return a peer or the local client */
peer = ivshmem_client_search_peer(client, peer_id);
/* delete peer */
if (fd == -1) {
if (peer == NULL || peer == &client->local) {
IVSHMEM_CLIENT_DEBUG(client, "receive delete for invalid "
"peer %" PRId64 "\n", peer_id);
return -1;
}
IVSHMEM_CLIENT_DEBUG(client, "delete peer id = %" PRId64 "\n", peer_id);
ivshmem_client_free_peer(client, peer);
return 0;
}
/* new peer */
if (peer == NULL) {
peer = g_malloc0(sizeof(*peer));
peer->id = peer_id;
peer->vectors_count = 0;
QTAILQ_INSERT_TAIL(&client->peer_list, peer, next);
IVSHMEM_CLIENT_DEBUG(client, "new peer id = %" PRId64 "\n", peer_id);
}
/* new vector */
IVSHMEM_CLIENT_DEBUG(client, " new vector %d (fd=%d) for peer id %"
PRId64 "\n", peer->vectors_count, fd, peer->id);
if (peer->vectors_count >= G_N_ELEMENTS(peer->vectors)) {
IVSHMEM_CLIENT_DEBUG(client, "Too many vectors received, failing");
return -1;
}
peer->vectors[peer->vectors_count] = fd;
peer->vectors_count++;
return 0;
}
/* init a new ivshmem client */
int
ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path,
IvshmemClientNotifCb notif_cb, void *notif_arg,
bool verbose)
{
int ret;
unsigned i;
memset(client, 0, sizeof(*client));
ret = snprintf(client->unix_sock_path, sizeof(client->unix_sock_path),
"%s", unix_sock_path);
if (ret < 0 || ret >= sizeof(client->unix_sock_path)) {
IVSHMEM_CLIENT_DEBUG(client, "could not copy unix socket path\n");
return -1;
}
for (i = 0; i < IVSHMEM_CLIENT_MAX_VECTORS; i++) {
client->local.vectors[i] = -1;
}
QTAILQ_INIT(&client->peer_list);
client->local.id = -1;
client->notif_cb = notif_cb;
client->notif_arg = notif_arg;
client->verbose = verbose;
client->shm_fd = -1;
client->sock_fd = -1;
return 0;
}
/* create and connect to the unix socket */
int
ivshmem_client_connect(IvshmemClient *client)
{
struct sockaddr_un sun;
int fd, ret;
int64_t tmp;
IVSHMEM_CLIENT_DEBUG(client, "connect to client %s\n",
client->unix_sock_path);
client->sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (client->sock_fd < 0) {
IVSHMEM_CLIENT_DEBUG(client, "cannot create socket: %s\n",
strerror(errno));
return -1;
}
sun.sun_family = AF_UNIX;
ret = snprintf(sun.sun_path, sizeof(sun.sun_path), "%s",
client->unix_sock_path);
if (ret < 0 || ret >= sizeof(sun.sun_path)) {
IVSHMEM_CLIENT_DEBUG(client, "could not copy unix socket path\n");
goto err_close;
}
if (connect(client->sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
IVSHMEM_CLIENT_DEBUG(client, "cannot connect to %s: %s\n", sun.sun_path,
strerror(errno));
goto err_close;
}
/* first, we expect a protocol version */
if (ivshmem_client_read_one_msg(client, &tmp, &fd) < 0 ||
(tmp != IVSHMEM_PROTOCOL_VERSION) || fd != -1) {
IVSHMEM_CLIENT_DEBUG(client, "cannot read from server\n");
goto err_close;
}
/* then, we expect our index + a fd == -1 */
if (ivshmem_client_read_one_msg(client, &client->local.id, &fd) < 0 ||
client->local.id < 0 || fd != -1) {
IVSHMEM_CLIENT_DEBUG(client, "cannot read from server (2)\n");
goto err_close;
}
IVSHMEM_CLIENT_DEBUG(client, "our_id=%" PRId64 "\n", client->local.id);
/* now, we expect shared mem fd + a -1 index, note that shm fd
* is not used */
if (ivshmem_client_read_one_msg(client, &tmp, &fd) < 0 ||
tmp != -1 || fd < 0) {
if (fd >= 0) {
close(fd);
}
IVSHMEM_CLIENT_DEBUG(client, "cannot read from server (3)\n");
goto err_close;
}
client->shm_fd = fd;
IVSHMEM_CLIENT_DEBUG(client, "shm_fd=%d\n", fd);
return 0;
err_close:
close(client->sock_fd);
client->sock_fd = -1;
return -1;
}
/* close connection to the server, and free all peer structures */
void
ivshmem_client_close(IvshmemClient *client)
{
IvshmemClientPeer *peer;
unsigned i;
IVSHMEM_CLIENT_DEBUG(client, "close client\n");
while ((peer = QTAILQ_FIRST(&client->peer_list)) != NULL) {
ivshmem_client_free_peer(client, peer);
}
close(client->shm_fd);
client->shm_fd = -1;
close(client->sock_fd);
client->sock_fd = -1;
client->local.id = -1;
for (i = 0; i < IVSHMEM_CLIENT_MAX_VECTORS; i++) {
close(client->local.vectors[i]);
client->local.vectors[i] = -1;
}
client->local.vectors_count = 0;
}
/* get the fd_set according to the unix socket and peer list */
void
ivshmem_client_get_fds(const IvshmemClient *client, fd_set *fds, int *maxfd)
{
int fd;
unsigned vector;
FD_SET(client->sock_fd, fds);
if (client->sock_fd >= *maxfd) {
*maxfd = client->sock_fd + 1;
}
for (vector = 0; vector < client->local.vectors_count; vector++) {
fd = client->local.vectors[vector];
FD_SET(fd, fds);
if (fd >= *maxfd) {
*maxfd = fd + 1;
}
}
}
/* handle events from eventfd: just print a message on notification */
static int
ivshmem_client_handle_event(IvshmemClient *client, const fd_set *cur, int maxfd)
{
IvshmemClientPeer *peer;
uint64_t kick;
unsigned i;
int ret;
peer = &client->local;
for (i = 0; i < peer->vectors_count; i++) {
if (peer->vectors[i] >= maxfd || !FD_ISSET(peer->vectors[i], cur)) {
continue;
}
ret = read(peer->vectors[i], &kick, sizeof(kick));
if (ret < 0) {
return ret;
}
if (ret != sizeof(kick)) {
IVSHMEM_CLIENT_DEBUG(client, "invalid read size = %d\n", ret);
errno = EINVAL;
return -1;
}
IVSHMEM_CLIENT_DEBUG(client, "received event on fd %d vector %d: %"
PRIu64 "\n", peer->vectors[i], i, kick);
if (client->notif_cb != NULL) {
client->notif_cb(client, peer, i, client->notif_arg);
}
}
return 0;
}
/* read and handle new messages on the given fd_set */
int
ivshmem_client_handle_fds(IvshmemClient *client, fd_set *fds, int maxfd)
{
if (client->sock_fd < maxfd && FD_ISSET(client->sock_fd, fds) &&
ivshmem_client_handle_server_msg(client) < 0 && errno != EINTR) {
IVSHMEM_CLIENT_DEBUG(client, "ivshmem_client_handle_server_msg() "
"failed\n");
return -1;
} else if (ivshmem_client_handle_event(client, fds, maxfd) < 0 &&
errno != EINTR) {
IVSHMEM_CLIENT_DEBUG(client, "ivshmem_client_handle_event() failed\n");
return -1;
}
return 0;
}
/* send a notification on a vector of a peer */
int
ivshmem_client_notify(const IvshmemClient *client,
const IvshmemClientPeer *peer, unsigned vector)
{
uint64_t kick;
int fd;
if (vector >= peer->vectors_count) {
IVSHMEM_CLIENT_DEBUG(client, "invalid vector %u on peer %" PRId64 "\n",
vector, peer->id);
return -1;
}
fd = peer->vectors[vector];
IVSHMEM_CLIENT_DEBUG(client, "notify peer %" PRId64
" on vector %d, fd %d\n", peer->id, vector, fd);
kick = 1;
if (write(fd, &kick, sizeof(kick)) != sizeof(kick)) {
fprintf(stderr, "could not write to %d: %s\n", peer->vectors[vector],
strerror(errno));
return -1;
}
return 0;
}
/* send a notification to all vectors of a peer */
int
ivshmem_client_notify_all_vects(const IvshmemClient *client,
const IvshmemClientPeer *peer)
{
unsigned vector;
int ret = 0;
for (vector = 0; vector < peer->vectors_count; vector++) {
if (ivshmem_client_notify(client, peer, vector) < 0) {
ret = -1;
}
}
return ret;
}
/* send a notification to all peers */
int
ivshmem_client_notify_broadcast(const IvshmemClient *client)
{
IvshmemClientPeer *peer;
int ret = 0;
QTAILQ_FOREACH(peer, &client->peer_list, next) {
if (ivshmem_client_notify_all_vects(client, peer) < 0) {
ret = -1;
}
}
return ret;
}
/* lookup peer from its id */
IvshmemClientPeer *
ivshmem_client_search_peer(IvshmemClient *client, int64_t peer_id)
{
IvshmemClientPeer *peer;
if (peer_id == client->local.id) {
return &client->local;
}
QTAILQ_FOREACH(peer, &client->peer_list, next) {
if (peer->id == peer_id) {
return peer;
}
}
return NULL;
}
/* dump our info, the list of peers their vectors on stdout */
void
ivshmem_client_dump(const IvshmemClient *client)
{
const IvshmemClientPeer *peer;
unsigned vector;
/* dump local infos */
peer = &client->local;
printf("our_id = %" PRId64 "\n", peer->id);
for (vector = 0; vector < peer->vectors_count; vector++) {
printf(" vector %d is enabled (fd=%d)\n", vector,
peer->vectors[vector]);
}
/* dump peers */
QTAILQ_FOREACH(peer, &client->peer_list, next) {
printf("peer_id = %" PRId64 "\n", peer->id);
for (vector = 0; vector < peer->vectors_count; vector++) {
printf(" vector %d is enabled (fd=%d)\n", vector,
peer->vectors[vector]);
}
}
}

View File

@@ -0,0 +1,213 @@
/*
* Copyright 6WIND S.A., 2014
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* (at your option) any later version. See the COPYING file in the
* top-level directory.
*/
#ifndef _IVSHMEM_CLIENT_H_
#define _IVSHMEM_CLIENT_H_
/**
* This file provides helper to implement an ivshmem client. It is used
* on the host to ask QEMU to send an interrupt to an ivshmem PCI device in a
* guest. QEMU also implements an ivshmem client similar to this one, they both
* connect to an ivshmem server.
*
* A standalone ivshmem client based on this file is provided for debug/test
* purposes.
*/
#include <limits.h>
#include <sys/select.h>
#include "qemu/queue.h"
#include "hw/misc/ivshmem.h"
/**
* Maximum number of notification vectors supported by the client
*/
#define IVSHMEM_CLIENT_MAX_VECTORS 64
/**
* Structure storing a peer
*
* Each time a client connects to an ivshmem server, it is advertised to
* all connected clients through the unix socket. When our ivshmem
* client receives a notification, it creates a IvshmemClientPeer
* structure to store the infos of this peer.
*
* This structure is also used to store the information of our own
* client in (IvshmemClient)->local.
*/
typedef struct IvshmemClientPeer {
QTAILQ_ENTRY(IvshmemClientPeer) next; /**< next in list*/
int64_t id; /**< the id of the peer */
int vectors[IVSHMEM_CLIENT_MAX_VECTORS]; /**< one fd per vector */
unsigned vectors_count; /**< number of vectors */
} IvshmemClientPeer;
QTAILQ_HEAD(IvshmemClientPeerList, IvshmemClientPeer);
typedef struct IvshmemClientPeerList IvshmemClientPeerList;
typedef struct IvshmemClient IvshmemClient;
/**
* Typedef of callback function used when our IvshmemClient receives a
* notification from a peer.
*/
typedef void (*IvshmemClientNotifCb)(
const IvshmemClient *client,
const IvshmemClientPeer *peer,
unsigned vect, void *arg);
/**
* Structure describing an ivshmem client
*
* This structure stores all information related to our client: the name
* of the server unix socket, the list of peers advertised by the
* server, our own client information, and a pointer the notification
* callback function used when we receive a notification from a peer.
*/
struct IvshmemClient {
char unix_sock_path[PATH_MAX]; /**< path to unix sock */
int sock_fd; /**< unix sock filedesc */
int shm_fd; /**< shm file descriptor */
IvshmemClientPeerList peer_list; /**< list of peers */
IvshmemClientPeer local; /**< our own infos */
IvshmemClientNotifCb notif_cb; /**< notification callback */
void *notif_arg; /**< notification argument */
bool verbose; /**< true to enable debug */
};
/**
* Initialize an ivshmem client
*
* @client: A pointer to an uninitialized IvshmemClient structure
* @unix_sock_path: The pointer to the unix socket file name
* @notif_cb: If not NULL, the pointer to the function to be called when
* our IvshmemClient receives a notification from a peer
* @notif_arg: Opaque pointer given as-is to the notification callback
* function
* @verbose: True to enable debug
*
* Returns: 0 on success, or a negative value on error
*/
int ivshmem_client_init(IvshmemClient *client, const char *unix_sock_path,
IvshmemClientNotifCb notif_cb, void *notif_arg,
bool verbose);
/**
* Connect to the server
*
* Connect to the server unix socket, and read the first initial
* messages sent by the server, giving the ID of the client and the file
* descriptor of the shared memory.
*
* @client: The ivshmem client
*
* Returns: 0 on success, or a negative value on error
*/
int ivshmem_client_connect(IvshmemClient *client);
/**
* Close connection to the server and free all peer structures
*
* @client: The ivshmem client
*/
void ivshmem_client_close(IvshmemClient *client);
/**
* Fill a fd_set with file descriptors to be monitored
*
* This function will fill a fd_set with all file descriptors
* that must be polled (unix server socket and peers eventfd). The
* function will not initialize the fd_set, it is up to the caller
* to do this.
*
* @client: The ivshmem client
* @fds: The fd_set to be updated
* @maxfd: Must be set to the max file descriptor + 1 in fd_set. This value is
* updated if this function adds a greater fd in fd_set.
*/
void ivshmem_client_get_fds(const IvshmemClient *client, fd_set *fds,
int *maxfd);
/**
* Read and handle new messages
*
* Given a fd_set filled by select(), handle incoming messages from
* server or peers.
*
* @client: The ivshmem client
* @fds: The fd_set containing the file descriptors to be checked. Note
* that file descriptors that are not related to our client are
* ignored.
* @maxfd: The maximum fd in fd_set, plus one.
*
* Returns: 0 on success, or a negative value on error
*/
int ivshmem_client_handle_fds(IvshmemClient *client, fd_set *fds, int maxfd);
/**
* Send a notification to a vector of a peer
*
* @client: The ivshmem client
* @peer: The peer to be notified
* @vector: The number of the vector
*
* Returns: 0 on success, or a negative value on error
*/
int ivshmem_client_notify(const IvshmemClient *client,
const IvshmemClientPeer *peer, unsigned vector);
/**
* Send a notification to all vectors of a peer
*
* @client: The ivshmem client
* @peer: The peer to be notified
*
* Returns: 0 on success, or a negative value on error (at least one
* notification failed)
*/
int ivshmem_client_notify_all_vects(const IvshmemClient *client,
const IvshmemClientPeer *peer);
/**
* Broadcat a notification to all vectors of all peers
*
* @client: The ivshmem client
*
* Returns: 0 on success, or a negative value on error (at least one
* notification failed)
*/
int ivshmem_client_notify_broadcast(const IvshmemClient *client);
/**
* Search a peer from its identifier
*
* Return the peer structure from its peer_id. If the given peer_id is
* the local id, the function returns the local peer structure.
*
* @client: The ivshmem client
* @peer_id: The identifier of the peer structure
*
* Returns: The peer structure, or NULL if not found
*/
IvshmemClientPeer *
ivshmem_client_search_peer(IvshmemClient *client, int64_t peer_id);
/**
* Dump information of this ivshmem client on stdout
*
* Dump the id and the vectors of the given ivshmem client and the list
* of its peers and their vectors on stdout.
*
* @client: The ivshmem client
*/
void ivshmem_client_dump(const IvshmemClient *client);
#endif /* _IVSHMEM_CLIENT_H_ */

View File

@@ -0,0 +1,240 @@
/*
* Copyright 6WIND S.A., 2014
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* (at your option) any later version. See the COPYING file in the
* top-level directory.
*/
#include "qemu-common.h"
#include "ivshmem-client.h"
#define IVSHMEM_CLIENT_DEFAULT_VERBOSE 0
#define IVSHMEM_CLIENT_DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
typedef struct IvshmemClientArgs {
bool verbose;
const char *unix_sock_path;
} IvshmemClientArgs;
/* show ivshmem_client_usage and exit with given error code */
static void
ivshmem_client_usage(const char *name, int code)
{
fprintf(stderr, "%s [opts]\n", name);
fprintf(stderr, " -h: show this help\n");
fprintf(stderr, " -v: verbose mode\n");
fprintf(stderr, " -S <unix_sock_path>: path to the unix socket\n"
" to connect to.\n"
" default=%s\n", IVSHMEM_CLIENT_DEFAULT_UNIX_SOCK_PATH);
exit(code);
}
/* parse the program arguments, exit on error */
static void
ivshmem_client_parse_args(IvshmemClientArgs *args, int argc, char *argv[])
{
int c;
while ((c = getopt(argc, argv,
"h" /* help */
"v" /* verbose */
"S:" /* unix_sock_path */
)) != -1) {
switch (c) {
case 'h': /* help */
ivshmem_client_usage(argv[0], 0);
break;
case 'v': /* verbose */
args->verbose = 1;
break;
case 'S': /* unix_sock_path */
args->unix_sock_path = optarg;
break;
default:
ivshmem_client_usage(argv[0], 1);
break;
}
}
}
/* show command line help */
static void
ivshmem_client_cmdline_help(void)
{
printf("dump: dump peers (including us)\n"
"int <peer> <vector>: notify one vector on a peer\n"
"int <peer> all: notify all vectors of a peer\n"
"int all: notify all vectors of all peers (excepting us)\n");
}
/* read stdin and handle commands */
static int
ivshmem_client_handle_stdin_command(IvshmemClient *client)
{
IvshmemClientPeer *peer;
char buf[128];
char *s, *token;
int ret;
int peer_id, vector;
memset(buf, 0, sizeof(buf));
ret = read(0, buf, sizeof(buf) - 1);
if (ret < 0) {
return -1;
}
s = buf;
while ((token = strsep(&s, "\n\r;")) != NULL) {
if (!strcmp(token, "")) {
continue;
}
if (!strcmp(token, "?")) {
ivshmem_client_cmdline_help();
}
if (!strcmp(token, "help")) {
ivshmem_client_cmdline_help();
} else if (!strcmp(token, "dump")) {
ivshmem_client_dump(client);
} else if (!strcmp(token, "int all")) {
ivshmem_client_notify_broadcast(client);
} else if (sscanf(token, "int %d %d", &peer_id, &vector) == 2) {
peer = ivshmem_client_search_peer(client, peer_id);
if (peer == NULL) {
printf("cannot find peer_id = %d\n", peer_id);
continue;
}
ivshmem_client_notify(client, peer, vector);
} else if (sscanf(token, "int %d all", &peer_id) == 1) {
peer = ivshmem_client_search_peer(client, peer_id);
if (peer == NULL) {
printf("cannot find peer_id = %d\n", peer_id);
continue;
}
ivshmem_client_notify_all_vects(client, peer);
} else {
printf("invalid command, type help\n");
}
}
printf("cmd> ");
fflush(stdout);
return 0;
}
/* listen on stdin (command line), on unix socket (notifications of new
* and dead peers), and on eventfd (IRQ request) */
static int
ivshmem_client_poll_events(IvshmemClient *client)
{
fd_set fds;
int ret, maxfd;
while (1) {
FD_ZERO(&fds);
FD_SET(0, &fds); /* add stdin in fd_set */
maxfd = 1;
ivshmem_client_get_fds(client, &fds, &maxfd);
ret = select(maxfd, &fds, NULL, NULL, NULL);
if (ret < 0) {
if (errno == EINTR) {
continue;
}
fprintf(stderr, "select error: %s\n", strerror(errno));
break;
}
if (ret == 0) {
continue;
}
if (FD_ISSET(0, &fds) &&
ivshmem_client_handle_stdin_command(client) < 0 && errno != EINTR) {
fprintf(stderr, "ivshmem_client_handle_stdin_command() failed\n");
break;
}
if (ivshmem_client_handle_fds(client, &fds, maxfd) < 0) {
fprintf(stderr, "ivshmem_client_handle_fds() failed\n");
break;
}
}
return ret;
}
/* callback when we receive a notification (just display it) */
static void
ivshmem_client_notification_cb(const IvshmemClient *client,
const IvshmemClientPeer *peer,
unsigned vect, void *arg)
{
(void)client;
(void)arg;
printf("receive notification from peer_id=%" PRId64 " vector=%u\n",
peer->id, vect);
}
int
main(int argc, char *argv[])
{
struct sigaction sa;
IvshmemClient client;
IvshmemClientArgs args = {
.verbose = IVSHMEM_CLIENT_DEFAULT_VERBOSE,
.unix_sock_path = IVSHMEM_CLIENT_DEFAULT_UNIX_SOCK_PATH,
};
/* parse arguments, will exit on error */
ivshmem_client_parse_args(&args, argc, argv);
/* Ignore SIGPIPE, see this link for more info:
* http://www.mail-archive.com/libevent-users@monkey.org/msg01606.html */
sa.sa_handler = SIG_IGN;
sa.sa_flags = 0;
if (sigemptyset(&sa.sa_mask) == -1 ||
sigaction(SIGPIPE, &sa, 0) == -1) {
perror("failed to ignore SIGPIPE; sigaction");
return 1;
}
ivshmem_client_cmdline_help();
printf("cmd> ");
fflush(stdout);
if (ivshmem_client_init(&client, args.unix_sock_path,
ivshmem_client_notification_cb, NULL,
args.verbose) < 0) {
fprintf(stderr, "cannot init client\n");
return 1;
}
while (1) {
if (ivshmem_client_connect(&client) < 0) {
fprintf(stderr, "cannot connect to server, retry in 1 second\n");
sleep(1);
continue;
}
fprintf(stdout, "listen on server socket %d\n", client.sock_fd);
if (ivshmem_client_poll_events(&client) == 0) {
continue;
}
/* disconnected from server, reset all peers */
fprintf(stdout, "disconnected from server\n");
ivshmem_client_close(&client);
}
return 0;
}

View File

@@ -0,0 +1 @@
ivshmem-server-obj-y = ivshmem-server.o main.o

View File

@@ -0,0 +1,493 @@
/*
* Copyright 6WIND S.A., 2014
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* (at your option) any later version. See the COPYING file in the
* top-level directory.
*/
#include "qemu-common.h"
#include "qemu/sockets.h"
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#ifdef CONFIG_LINUX
#include <sys/vfs.h>
#endif
#include "ivshmem-server.h"
/* log a message on stdout if verbose=1 */
#define IVSHMEM_SERVER_DEBUG(server, fmt, ...) do { \
if ((server)->verbose) { \
printf(fmt, ## __VA_ARGS__); \
} \
} while (0)
/** maximum size of a huge page, used by ivshmem_server_ftruncate() */
#define IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE (1024 * 1024 * 1024)
/** default listen backlog (number of sockets not accepted) */
#define IVSHMEM_SERVER_LISTEN_BACKLOG 10
/* send message to a client unix socket */
static int
ivshmem_server_send_one_msg(int sock_fd, int64_t peer_id, int fd)
{
int ret;
struct msghdr msg;
struct iovec iov[1];
union {
struct cmsghdr cmsg;
char control[CMSG_SPACE(sizeof(int))];
} msg_control;
struct cmsghdr *cmsg;
peer_id = GINT64_TO_LE(peer_id);
iov[0].iov_base = &peer_id;
iov[0].iov_len = sizeof(peer_id);
memset(&msg, 0, sizeof(msg));
msg.msg_iov = iov;
msg.msg_iovlen = 1;
/* if fd is specified, add it in a cmsg */
if (fd >= 0) {
memset(&msg_control, 0, sizeof(msg_control));
msg.msg_control = &msg_control;
msg.msg_controllen = sizeof(msg_control);
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(int));
memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
}
ret = sendmsg(sock_fd, &msg, 0);
if (ret <= 0) {
return -1;
}
return 0;
}
/* free a peer when the server advertises a disconnection or when the
* server is freed */
static void
ivshmem_server_free_peer(IvshmemServer *server, IvshmemServerPeer *peer)
{
unsigned vector;
IvshmemServerPeer *other_peer;
IVSHMEM_SERVER_DEBUG(server, "free peer %" PRId64 "\n", peer->id);
close(peer->sock_fd);
QTAILQ_REMOVE(&server->peer_list, peer, next);
/* advertise the deletion to other peers */
QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
ivshmem_server_send_one_msg(other_peer->sock_fd, peer->id, -1);
}
for (vector = 0; vector < peer->vectors_count; vector++) {
event_notifier_cleanup(&peer->vectors[vector]);
}
g_free(peer);
}
/* send the peer id and the shm_fd just after a new client connection */
static int
ivshmem_server_send_initial_info(IvshmemServer *server, IvshmemServerPeer *peer)
{
int ret;
/* send our protocol version first */
ret = ivshmem_server_send_one_msg(peer->sock_fd, IVSHMEM_PROTOCOL_VERSION,
-1);
if (ret < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot send version: %s\n",
strerror(errno));
return -1;
}
/* send the peer id to the client */
ret = ivshmem_server_send_one_msg(peer->sock_fd, peer->id, -1);
if (ret < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot send peer id: %s\n",
strerror(errno));
return -1;
}
/* send the shm_fd */
ret = ivshmem_server_send_one_msg(peer->sock_fd, -1, server->shm_fd);
if (ret < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot send shm fd: %s\n",
strerror(errno));
return -1;
}
return 0;
}
/* handle message on listening unix socket (new client connection) */
static int
ivshmem_server_handle_new_conn(IvshmemServer *server)
{
IvshmemServerPeer *peer, *other_peer;
struct sockaddr_un unaddr;
socklen_t unaddr_len;
int newfd;
unsigned i;
/* accept the incoming connection */
unaddr_len = sizeof(unaddr);
newfd = qemu_accept(server->sock_fd,
(struct sockaddr *)&unaddr, &unaddr_len);
if (newfd < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot accept() %s\n", strerror(errno));
return -1;
}
qemu_set_nonblock(newfd);
IVSHMEM_SERVER_DEBUG(server, "accept()=%d\n", newfd);
/* allocate new structure for this peer */
peer = g_malloc0(sizeof(*peer));
peer->sock_fd = newfd;
/* get an unused peer id */
/* XXX: this could use id allocation such as Linux IDA, or simply
* a free-list */
for (i = 0; i < G_MAXUINT16; i++) {
if (ivshmem_server_search_peer(server, server->cur_id) == NULL) {
break;
}
server->cur_id++;
}
if (i == G_MAXUINT16) {
IVSHMEM_SERVER_DEBUG(server, "cannot allocate new client id\n");
close(newfd);
g_free(peer);
return -1;
}
peer->id = server->cur_id++;
/* create eventfd, one per vector */
peer->vectors_count = server->n_vectors;
for (i = 0; i < peer->vectors_count; i++) {
if (event_notifier_init(&peer->vectors[i], FALSE) < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot create eventfd\n");
goto fail;
}
}
/* send peer id and shm fd */
if (ivshmem_server_send_initial_info(server, peer) < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot send initial info\n");
goto fail;
}
/* advertise the new peer to others */
QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
for (i = 0; i < peer->vectors_count; i++) {
ivshmem_server_send_one_msg(other_peer->sock_fd, peer->id,
peer->vectors[i].wfd);
}
}
/* advertise the other peers to the new one */
QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
for (i = 0; i < peer->vectors_count; i++) {
ivshmem_server_send_one_msg(peer->sock_fd, other_peer->id,
other_peer->vectors[i].wfd);
}
}
/* advertise the new peer to itself */
for (i = 0; i < peer->vectors_count; i++) {
ivshmem_server_send_one_msg(peer->sock_fd, peer->id,
event_notifier_get_fd(&peer->vectors[i]));
}
QTAILQ_INSERT_TAIL(&server->peer_list, peer, next);
IVSHMEM_SERVER_DEBUG(server, "new peer id = %" PRId64 "\n",
peer->id);
return 0;
fail:
while (i--) {
event_notifier_cleanup(&peer->vectors[i]);
}
close(newfd);
g_free(peer);
return -1;
}
/* Try to ftruncate a file to next power of 2 of shmsize.
* If it fails; all power of 2 above shmsize are tested until
* we reach the maximum huge page size. This is useful
* if the shm file is in a hugetlbfs that cannot be truncated to the
* shm_size value. */
static int
ivshmem_server_ftruncate(int fd, unsigned shmsize)
{
int ret;
struct stat mapstat;
/* align shmsize to next power of 2 */
shmsize = pow2ceil(shmsize);
if (fstat(fd, &mapstat) != -1 && mapstat.st_size == shmsize) {
return 0;
}
while (shmsize <= IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE) {
ret = ftruncate(fd, shmsize);
if (ret == 0) {
return ret;
}
shmsize *= 2;
}
return -1;
}
/* Init a new ivshmem server */
int
ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path,
const char *shm_path, size_t shm_size, unsigned n_vectors,
bool verbose)
{
int ret;
memset(server, 0, sizeof(*server));
server->verbose = verbose;
ret = snprintf(server->unix_sock_path, sizeof(server->unix_sock_path),
"%s", unix_sock_path);
if (ret < 0 || ret >= sizeof(server->unix_sock_path)) {
IVSHMEM_SERVER_DEBUG(server, "could not copy unix socket path\n");
return -1;
}
ret = snprintf(server->shm_path, sizeof(server->shm_path),
"%s", shm_path);
if (ret < 0 || ret >= sizeof(server->shm_path)) {
IVSHMEM_SERVER_DEBUG(server, "could not copy shm path\n");
return -1;
}
server->shm_size = shm_size;
server->n_vectors = n_vectors;
QTAILQ_INIT(&server->peer_list);
return 0;
}
#ifdef CONFIG_LINUX
#define HUGETLBFS_MAGIC 0x958458f6
static long gethugepagesize(const char *path)
{
struct statfs fs;
int ret;
do {
ret = statfs(path, &fs);
} while (ret != 0 && errno == EINTR);
if (ret != 0) {
return -1;
}
if (fs.f_type != HUGETLBFS_MAGIC) {
return -1;
}
return fs.f_bsize;
}
#endif
/* open shm, create and bind to the unix socket */
int
ivshmem_server_start(IvshmemServer *server)
{
struct sockaddr_un sun;
int shm_fd, sock_fd, ret;
/* open shm file */
#ifdef CONFIG_LINUX
long hpagesize;
hpagesize = gethugepagesize(server->shm_path);
if (hpagesize < 0 && errno != ENOENT) {
IVSHMEM_SERVER_DEBUG(server, "cannot stat shm file %s: %s\n",
server->shm_path, strerror(errno));
}
if (hpagesize > 0) {
gchar *filename = g_strdup_printf("%s/ivshmem.XXXXXX", server->shm_path);
IVSHMEM_SERVER_DEBUG(server, "Using hugepages: %s\n", server->shm_path);
shm_fd = mkstemp(filename);
unlink(filename);
g_free(filename);
} else
#endif
{
IVSHMEM_SERVER_DEBUG(server, "Using POSIX shared memory: %s\n",
server->shm_path);
shm_fd = shm_open(server->shm_path, O_CREAT|O_RDWR, S_IRWXU);
}
if (shm_fd < 0) {
fprintf(stderr, "cannot open shm file %s: %s\n", server->shm_path,
strerror(errno));
return -1;
}
if (ivshmem_server_ftruncate(shm_fd, server->shm_size) < 0) {
fprintf(stderr, "ftruncate(%s) failed: %s\n", server->shm_path,
strerror(errno));
goto err_close_shm;
}
IVSHMEM_SERVER_DEBUG(server, "create & bind socket %s\n",
server->unix_sock_path);
/* create the unix listening socket */
sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (sock_fd < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot create socket: %s\n",
strerror(errno));
goto err_close_shm;
}
sun.sun_family = AF_UNIX;
ret = snprintf(sun.sun_path, sizeof(sun.sun_path), "%s",
server->unix_sock_path);
if (ret < 0 || ret >= sizeof(sun.sun_path)) {
IVSHMEM_SERVER_DEBUG(server, "could not copy unix socket path\n");
goto err_close_sock;
}
if (bind(sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
IVSHMEM_SERVER_DEBUG(server, "cannot connect to %s: %s\n", sun.sun_path,
strerror(errno));
goto err_close_sock;
}
if (listen(sock_fd, IVSHMEM_SERVER_LISTEN_BACKLOG) < 0) {
IVSHMEM_SERVER_DEBUG(server, "listen() failed: %s\n", strerror(errno));
goto err_close_sock;
}
server->sock_fd = sock_fd;
server->shm_fd = shm_fd;
return 0;
err_close_sock:
close(sock_fd);
err_close_shm:
close(shm_fd);
return -1;
}
/* close connections to clients, the unix socket and the shm fd */
void
ivshmem_server_close(IvshmemServer *server)
{
IvshmemServerPeer *peer, *npeer;
IVSHMEM_SERVER_DEBUG(server, "close server\n");
QTAILQ_FOREACH_SAFE(peer, &server->peer_list, next, npeer) {
ivshmem_server_free_peer(server, peer);
}
unlink(server->unix_sock_path);
close(server->sock_fd);
close(server->shm_fd);
server->sock_fd = -1;
server->shm_fd = -1;
}
/* get the fd_set according to the unix socket and the peer list */
void
ivshmem_server_get_fds(const IvshmemServer *server, fd_set *fds, int *maxfd)
{
IvshmemServerPeer *peer;
if (server->sock_fd == -1) {
return;
}
FD_SET(server->sock_fd, fds);
if (server->sock_fd >= *maxfd) {
*maxfd = server->sock_fd + 1;
}
QTAILQ_FOREACH(peer, &server->peer_list, next) {
FD_SET(peer->sock_fd, fds);
if (peer->sock_fd >= *maxfd) {
*maxfd = peer->sock_fd + 1;
}
}
}
/* process incoming messages on the sockets in fd_set */
int
ivshmem_server_handle_fds(IvshmemServer *server, fd_set *fds, int maxfd)
{
IvshmemServerPeer *peer, *peer_next;
if (server->sock_fd < maxfd && FD_ISSET(server->sock_fd, fds) &&
ivshmem_server_handle_new_conn(server) < 0 && errno != EINTR) {
IVSHMEM_SERVER_DEBUG(server, "ivshmem_server_handle_new_conn() "
"failed\n");
return -1;
}
QTAILQ_FOREACH_SAFE(peer, &server->peer_list, next, peer_next) {
/* any message from a peer socket result in a close() */
IVSHMEM_SERVER_DEBUG(server, "peer->sock_fd=%d\n", peer->sock_fd);
if (peer->sock_fd < maxfd && FD_ISSET(peer->sock_fd, fds)) {
ivshmem_server_free_peer(server, peer);
}
}
return 0;
}
/* lookup peer from its id */
IvshmemServerPeer *
ivshmem_server_search_peer(IvshmemServer *server, int64_t peer_id)
{
IvshmemServerPeer *peer;
QTAILQ_FOREACH(peer, &server->peer_list, next) {
if (peer->id == peer_id) {
return peer;
}
}
return NULL;
}
/* dump our info, the list of peers their vectors on stdout */
void
ivshmem_server_dump(const IvshmemServer *server)
{
const IvshmemServerPeer *peer;
unsigned vector;
/* dump peers */
QTAILQ_FOREACH(peer, &server->peer_list, next) {
printf("peer_id = %" PRId64 "\n", peer->id);
for (vector = 0; vector < peer->vectors_count; vector++) {
printf(" vector %d is enabled (fd=%d)\n", vector,
event_notifier_get_fd(&peer->vectors[vector]));
}
}
}

View File

@@ -0,0 +1,167 @@
/*
* Copyright 6WIND S.A., 2014
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* (at your option) any later version. See the COPYING file in the
* top-level directory.
*/
#ifndef _IVSHMEM_SERVER_H_
#define _IVSHMEM_SERVER_H_
/**
* The ivshmem server is a daemon that creates a unix socket in listen
* mode. The ivshmem clients (qemu or ivshmem-client) connect to this
* unix socket. For each client, the server will create some eventfd
* (see EVENTFD(2)), one per vector. These fd are transmitted to all
* clients using the SCM_RIGHTS cmsg message. Therefore, each client is
* able to send a notification to another client without beeing
* "profixied" by the server.
*
* We use this mechanism to send interruptions between guests.
* qemu is able to transform an event on a eventfd into a PCI MSI-x
* interruption in the guest.
*
* The ivshmem server is also able to share the file descriptor
* associated to the ivshmem shared memory.
*/
#include <limits.h>
#include <sys/select.h>
#include <stdint.h>
#include <stdbool.h>
#include "qemu/event_notifier.h"
#include "qemu/queue.h"
#include "hw/misc/ivshmem.h"
/**
* Maximum number of notification vectors supported by the server
*/
#define IVSHMEM_SERVER_MAX_VECTORS 64
/**
* Structure storing a peer
*
* Each time a client connects to an ivshmem server, a new
* IvshmemServerPeer structure is created. This peer and all its
* vectors are advertised to all connected clients through the connected
* unix sockets.
*/
typedef struct IvshmemServerPeer {
QTAILQ_ENTRY(IvshmemServerPeer) next; /**< next in list*/
int sock_fd; /**< connected unix sock */
int64_t id; /**< the id of the peer */
EventNotifier vectors[IVSHMEM_SERVER_MAX_VECTORS]; /**< one per vector */
unsigned vectors_count; /**< number of vectors */
} IvshmemServerPeer;
QTAILQ_HEAD(IvshmemServerPeerList, IvshmemServerPeer);
typedef struct IvshmemServerPeerList IvshmemServerPeerList;
/**
* Structure describing an ivshmem server
*
* This structure stores all information related to our server: the name
* of the server unix socket and the list of connected peers.
*/
typedef struct IvshmemServer {
char unix_sock_path[PATH_MAX]; /**< path to unix socket */
int sock_fd; /**< unix sock file descriptor */
char shm_path[PATH_MAX]; /**< path to shm */
size_t shm_size; /**< size of shm */
int shm_fd; /**< shm file descriptor */
unsigned n_vectors; /**< number of vectors */
uint16_t cur_id; /**< id to be given to next client */
bool verbose; /**< true in verbose mode */
IvshmemServerPeerList peer_list; /**< list of peers */
} IvshmemServer;
/**
* Initialize an ivshmem server
*
* @server: A pointer to an uninitialized IvshmemServer structure
* @unix_sock_path: The pointer to the unix socket file name
* @shm_path: Path to the shared memory. The path corresponds to a POSIX
* shm name or a hugetlbfs mount point.
* @shm_size: Size of shared memory
* @n_vectors: Number of interrupt vectors per client
* @verbose: True to enable verbose mode
*
* Returns: 0 on success, or a negative value on error
*/
int
ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path,
const char *shm_path, size_t shm_size, unsigned n_vectors,
bool verbose);
/**
* Open the shm, then create and bind to the unix socket
*
* @server: The pointer to the initialized IvshmemServer structure
*
* Returns: 0 on success, or a negative value on error
*/
int ivshmem_server_start(IvshmemServer *server);
/**
* Close the server
*
* Close connections to all clients, close the unix socket and the
* shared memory file descriptor. The structure remains initialized, so
* it is possible to call ivshmem_server_start() again after a call to
* ivshmem_server_close().
*
* @server: The ivshmem server
*/
void ivshmem_server_close(IvshmemServer *server);
/**
* Fill a fd_set with file descriptors to be monitored
*
* This function will fill a fd_set with all file descriptors that must
* be polled (unix server socket and peers unix socket). The function
* will not initialize the fd_set, it is up to the caller to do it.
*
* @server: The ivshmem server
* @fds: The fd_set to be updated
* @maxfd: Must be set to the max file descriptor + 1 in fd_set. This value is
* updated if this function adds a greater fd in fd_set.
*/
void
ivshmem_server_get_fds(const IvshmemServer *server, fd_set *fds, int *maxfd);
/**
* Read and handle new messages
*
* Given a fd_set (for instance filled by a call to select()), handle
* incoming messages from peers.
*
* @server: The ivshmem server
* @fds: The fd_set containing the file descriptors to be checked. Note that
* file descriptors that are not related to our server are ignored.
* @maxfd: The maximum fd in fd_set, plus one.
*
* Returns: 0 on success, or a negative value on error
*/
int ivshmem_server_handle_fds(IvshmemServer *server, fd_set *fds, int maxfd);
/**
* Search a peer from its identifier
*
* @server: The ivshmem server
* @peer_id: The identifier of the peer structure
*
* Returns: The peer structure, or NULL if not found
*/
IvshmemServerPeer *
ivshmem_server_search_peer(IvshmemServer *server, int64_t peer_id);
/**
* Dump information of this ivshmem server and its peers on stdout
*
* @server: The ivshmem server
*/
void ivshmem_server_dump(const IvshmemServer *server);
#endif /* _IVSHMEM_SERVER_H_ */

View File

@@ -0,0 +1,263 @@
/*
* Copyright 6WIND S.A., 2014
*
* This work is licensed under the terms of the GNU GPL, version 2 or
* (at your option) any later version. See the COPYING file in the
* top-level directory.
*/
#include "qemu-common.h"
#include "ivshmem-server.h"
#define IVSHMEM_SERVER_DEFAULT_VERBOSE 0
#define IVSHMEM_SERVER_DEFAULT_FOREGROUND 0
#define IVSHMEM_SERVER_DEFAULT_PID_FILE "/var/run/ivshmem-server.pid"
#define IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
#define IVSHMEM_SERVER_DEFAULT_SHM_PATH "ivshmem"
#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE (4*1024*1024)
#define IVSHMEM_SERVER_DEFAULT_N_VECTORS 1
/* used to quit on signal SIGTERM */
static int ivshmem_server_quit;
/* arguments given by the user */
typedef struct IvshmemServerArgs {
bool verbose;
bool foreground;
const char *pid_file;
const char *unix_socket_path;
const char *shm_path;
uint64_t shm_size;
unsigned n_vectors;
} IvshmemServerArgs;
/* show ivshmem_server_usage and exit with given error code */
static void
ivshmem_server_usage(const char *name, int code)
{
fprintf(stderr, "%s [opts]\n", name);
fprintf(stderr, " -h: show this help\n");
fprintf(stderr, " -v: verbose mode\n");
fprintf(stderr, " -F: foreground mode (default is to daemonize)\n");
fprintf(stderr, " -p <pid_file>: path to the PID file (used in daemon\n"
" mode only).\n"
" Default=%s\n", IVSHMEM_SERVER_DEFAULT_SHM_PATH);
fprintf(stderr, " -S <unix_socket_path>: path to the unix socket\n"
" to listen to.\n"
" Default=%s\n", IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH);
fprintf(stderr, " -m <shm_path>: path to the shared memory.\n"
" The path corresponds to a POSIX shm name or a\n"
" hugetlbfs mount point.\n"
" default=%s\n", IVSHMEM_SERVER_DEFAULT_SHM_PATH);
fprintf(stderr, " -l <size>: size of shared memory in bytes. The suffix\n"
" K, M and G can be used (ex: 1K means 1024).\n"
" default=%u\n", IVSHMEM_SERVER_DEFAULT_SHM_SIZE);
fprintf(stderr, " -n <n_vects>: number of vectors.\n"
" default=%u\n", IVSHMEM_SERVER_DEFAULT_N_VECTORS);
exit(code);
}
/* parse the program arguments, exit on error */
static void
ivshmem_server_parse_args(IvshmemServerArgs *args, int argc, char *argv[])
{
int c;
unsigned long long v;
Error *errp = NULL;
while ((c = getopt(argc, argv,
"h" /* help */
"v" /* verbose */
"F" /* foreground */
"p:" /* pid_file */
"S:" /* unix_socket_path */
"m:" /* shm_path */
"l:" /* shm_size */
"n:" /* n_vectors */
)) != -1) {
switch (c) {
case 'h': /* help */
ivshmem_server_usage(argv[0], 0);
break;
case 'v': /* verbose */
args->verbose = 1;
break;
case 'F': /* foreground */
args->foreground = 1;
break;
case 'p': /* pid_file */
args->pid_file = optarg;
break;
case 'S': /* unix_socket_path */
args->unix_socket_path = optarg;
break;
case 'm': /* shm_path */
args->shm_path = optarg;
break;
case 'l': /* shm_size */
parse_option_size("shm_size", optarg, &args->shm_size, &errp);
if (errp) {
fprintf(stderr, "cannot parse shm size: %s\n",
error_get_pretty(errp));
error_free(errp);
ivshmem_server_usage(argv[0], 1);
}
break;
case 'n': /* n_vectors */
if (parse_uint_full(optarg, &v, 0) < 0) {
fprintf(stderr, "cannot parse n_vectors\n");
ivshmem_server_usage(argv[0], 1);
}
args->n_vectors = v;
break;
default:
ivshmem_server_usage(argv[0], 1);
break;
}
}
if (args->n_vectors > IVSHMEM_SERVER_MAX_VECTORS) {
fprintf(stderr, "too many requested vectors (max is %d)\n",
IVSHMEM_SERVER_MAX_VECTORS);
ivshmem_server_usage(argv[0], 1);
}
if (args->verbose == 1 && args->foreground == 0) {
fprintf(stderr, "cannot use verbose in daemon mode\n");
ivshmem_server_usage(argv[0], 1);
}
}
/* wait for events on listening server unix socket and connected client
* sockets */
static int
ivshmem_server_poll_events(IvshmemServer *server)
{
fd_set fds;
int ret = 0, maxfd;
while (!ivshmem_server_quit) {
FD_ZERO(&fds);
maxfd = 0;
ivshmem_server_get_fds(server, &fds, &maxfd);
ret = select(maxfd, &fds, NULL, NULL, NULL);
if (ret < 0) {
if (errno == EINTR) {
continue;
}
fprintf(stderr, "select error: %s\n", strerror(errno));
break;
}
if (ret == 0) {
continue;
}
if (ivshmem_server_handle_fds(server, &fds, maxfd) < 0) {
fprintf(stderr, "ivshmem_server_handle_fds() failed\n");
break;
}
}
return ret;
}
static void
ivshmem_server_quit_cb(int signum)
{
ivshmem_server_quit = 1;
}
int
main(int argc, char *argv[])
{
IvshmemServer server;
struct sigaction sa, sa_quit;
IvshmemServerArgs args = {
.verbose = IVSHMEM_SERVER_DEFAULT_VERBOSE,
.foreground = IVSHMEM_SERVER_DEFAULT_FOREGROUND,
.pid_file = IVSHMEM_SERVER_DEFAULT_PID_FILE,
.unix_socket_path = IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH,
.shm_path = IVSHMEM_SERVER_DEFAULT_SHM_PATH,
.shm_size = IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
.n_vectors = IVSHMEM_SERVER_DEFAULT_N_VECTORS,
};
int ret = 1;
/* parse arguments, will exit on error */
ivshmem_server_parse_args(&args, argc, argv);
/* Ignore SIGPIPE, see this link for more info:
* http://www.mail-archive.com/libevent-users@monkey.org/msg01606.html */
sa.sa_handler = SIG_IGN;
sa.sa_flags = 0;
if (sigemptyset(&sa.sa_mask) == -1 ||
sigaction(SIGPIPE, &sa, 0) == -1) {
perror("failed to ignore SIGPIPE; sigaction");
goto err;
}
sa_quit.sa_handler = ivshmem_server_quit_cb;
sa_quit.sa_flags = 0;
if (sigemptyset(&sa_quit.sa_mask) == -1 ||
sigaction(SIGTERM, &sa_quit, 0) == -1) {
perror("failed to add SIGTERM handler; sigaction");
goto err;
}
/* init the ivshms structure */
if (ivshmem_server_init(&server, args.unix_socket_path, args.shm_path,
args.shm_size, args.n_vectors, args.verbose) < 0) {
fprintf(stderr, "cannot init server\n");
goto err;
}
/* start the ivshmem server (open shm & unix socket) */
if (ivshmem_server_start(&server) < 0) {
fprintf(stderr, "cannot bind\n");
goto err;
}
/* daemonize if asked to */
if (!args.foreground) {
FILE *fp;
if (qemu_daemon(1, 1) < 0) {
fprintf(stderr, "cannot daemonize: %s\n", strerror(errno));
goto err_close;
}
/* write pid file */
fp = fopen(args.pid_file, "w");
if (fp == NULL) {
fprintf(stderr, "cannot write pid file: %s\n", strerror(errno));
goto err_close;
}
fprintf(fp, "%d\n", (int) getpid());
fclose(fp);
}
ivshmem_server_poll_events(&server);
fprintf(stdout, "server disconnected\n");
ret = 0;
err_close:
ivshmem_server_close(&server);
err:
return ret;
}

View File

@@ -30,6 +30,7 @@
#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
#include "hw/i386/apic.h"
#endif
#include "sysemu/replay.h"
/* -icount align implementation. */
@@ -184,7 +185,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr)
/* Execute the code without caching the generated code. An interpreter
could be used if available. */
static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
TranslationBlock *orig_tb)
TranslationBlock *orig_tb, bool ignore_icount)
{
TranslationBlock *tb;
@@ -194,7 +195,8 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
max_cycles = CF_COUNT_MASK;
tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
max_cycles | CF_NOCACHE);
max_cycles | CF_NOCACHE
| (ignore_icount ? CF_IGNORE_ICOUNT : 0));
tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
cpu->current_tb = tb;
/* execute the generated code */
@@ -345,21 +347,25 @@ int cpu_exec(CPUState *cpu)
uintptr_t next_tb;
SyncClocks sc;
/* replay_interrupt may need current_cpu */
current_cpu = cpu;
if (cpu->halted) {
#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
if (cpu->interrupt_request & CPU_INTERRUPT_POLL) {
if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
&& replay_interrupt()) {
apic_poll_irq(x86_cpu->apic_state);
cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
}
#endif
if (!cpu_has_work(cpu)) {
current_cpu = NULL;
return EXCP_HALTED;
}
cpu->halted = 0;
}
current_cpu = cpu;
atomic_mb_set(&tcg_current_cpu, cpu);
rcu_read_lock();
@@ -401,10 +407,22 @@ int cpu_exec(CPUState *cpu)
cpu->exception_index = -1;
break;
#else
cc->do_interrupt(cpu);
cpu->exception_index = -1;
if (replay_exception()) {
cc->do_interrupt(cpu);
cpu->exception_index = -1;
} else if (!replay_has_interrupt()) {
/* give a chance to iothread in replay mode */
ret = EXCP_INTERRUPT;
break;
}
#endif
}
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
ret = -1;
break;
}
next_tb = 0; /* force lookup of first TB */
@@ -420,30 +438,40 @@ int cpu_exec(CPUState *cpu)
cpu->exception_index = EXCP_DEBUG;
cpu_loop_exit(cpu);
}
if (interrupt_request & CPU_INTERRUPT_HALT) {
if (replay_mode == REPLAY_MODE_PLAY
&& !replay_has_interrupt()) {
/* Do nothing */
} else if (interrupt_request & CPU_INTERRUPT_HALT) {
replay_interrupt();
cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
cpu->halted = 1;
cpu->exception_index = EXCP_HLT;
cpu_loop_exit(cpu);
}
#if defined(TARGET_I386)
if (interrupt_request & CPU_INTERRUPT_INIT) {
else if (interrupt_request & CPU_INTERRUPT_INIT) {
replay_interrupt();
cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0);
do_cpu_init(x86_cpu);
cpu->exception_index = EXCP_HALTED;
cpu_loop_exit(cpu);
}
#else
if (interrupt_request & CPU_INTERRUPT_RESET) {
else if (interrupt_request & CPU_INTERRUPT_RESET) {
replay_interrupt();
cpu_reset(cpu);
cpu_loop_exit(cpu);
}
#endif
/* The target hook has 3 exit conditions:
False when the interrupt isn't processed,
True when it is, and we should restart on a new TB,
and via longjmp via cpu_loop_exit. */
if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
next_tb = 0;
else {
replay_interrupt();
if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
next_tb = 0;
}
}
/* Don't use the cached interrupt_request value,
do_interrupt may have updated the EXITTB flag. */
@@ -454,7 +482,8 @@ int cpu_exec(CPUState *cpu)
next_tb = 0;
}
}
if (unlikely(cpu->exit_request)) {
if (unlikely(cpu->exit_request
|| replay_has_interrupt())) {
cpu->exit_request = 0;
cpu->exception_index = EXCP_INTERRUPT;
cpu_loop_exit(cpu);
@@ -477,7 +506,8 @@ int cpu_exec(CPUState *cpu)
/* see if we can patch the calling TB. When the TB
spans two pages, we cannot safely do a direct
jump. */
if (next_tb != 0 && tb->page_addr[1] == -1) {
if (next_tb != 0 && tb->page_addr[1] == -1
&& !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
next_tb & TB_EXIT_MASK, tb);
}
@@ -518,7 +548,7 @@ int cpu_exec(CPUState *cpu)
if (insns_left > 0) {
/* Execute remaining instructions. */
tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK);
cpu_exec_nocache(cpu, insns_left, tb);
cpu_exec_nocache(cpu, insns_left, tb, false);
align_clocks(&sc, cpu);
}
cpu->exception_index = EXCP_INTERRUPT;
@@ -538,15 +568,27 @@ int cpu_exec(CPUState *cpu)
only be set by a memory fault) */
} /* for(;;) */
} else {
/* Reload env after longjmp - the compiler may have smashed all
* local variables as longjmp is marked 'noreturn'. */
#if defined(__clang__) || !QEMU_GNUC_PREREQ(4, 6)
/* Some compilers wrongly smash all local variables after
* siglongjmp. There were bug reports for gcc 4.5.0 and clang.
* Reload essential local variables here for those compilers.
* Newer versions of gcc would complain about this code (-Wclobbered). */
cpu = current_cpu;
cc = CPU_GET_CLASS(cpu);
cpu->can_do_io = 1;
#ifdef TARGET_I386
x86_cpu = X86_CPU(cpu);
env = &x86_cpu->env;
#endif
#else /* buggy compiler */
/* Assert that the compiler does not smash local variables. */
g_assert(cpu == current_cpu);
g_assert(cc == CPU_GET_CLASS(cpu));
#ifdef TARGET_I386
g_assert(x86_cpu == X86_CPU(cpu));
g_assert(env == &x86_cpu->env);
#endif
#endif /* buggy compiler */
cpu->can_do_io = 1;
tb_lock_reset();
}
} /* for(;;) */

75
cpus.c
View File

@@ -42,6 +42,7 @@
#include "qemu/seqlock.h"
#include "qapi-event.h"
#include "hw/nmi.h"
#include "sysemu/replay.h"
#ifndef _WIN32
#include "qemu/compatfd.h"
@@ -334,7 +335,7 @@ static int64_t qemu_icount_round(int64_t count)
return (count + (1 << icount_time_shift) - 1) >> icount_time_shift;
}
static void icount_warp_rt(void *opaque)
static void icount_warp_rt(void)
{
/* The icount_warp_timer is rescheduled soon after vm_clock_warp_start
* changes from -1 to another value, so the race here is okay.
@@ -345,7 +346,8 @@ static void icount_warp_rt(void *opaque)
seqlock_write_lock(&timers_state.vm_clock_seqlock);
if (runstate_is_running()) {
int64_t clock = cpu_get_clock_locked();
int64_t clock = REPLAY_CLOCK(REPLAY_CLOCK_VIRTUAL_RT,
cpu_get_clock_locked());
int64_t warp_delta;
warp_delta = clock - vm_clock_warp_start;
@@ -368,6 +370,11 @@ static void icount_warp_rt(void *opaque)
}
}
static void icount_dummy_timer(void *opaque)
{
(void)opaque;
}
void qtest_clock_warp(int64_t dest)
{
int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
@@ -403,6 +410,18 @@ void qemu_clock_warp(QEMUClockType type)
return;
}
/* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
* do not fire, so computing the deadline does not make sense.
*/
if (!runstate_is_running()) {
return;
}
/* warp clock deterministically in record/replay mode */
if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP)) {
return;
}
if (icount_sleep) {
/*
* If the CPUs have been sleeping, advance QEMU_CLOCK_VIRTUAL timer now.
@@ -412,7 +431,7 @@ void qemu_clock_warp(QEMUClockType type)
* the CPU starts running, in case the CPU is woken by an event other
* than the earliest QEMU_CLOCK_VIRTUAL timer.
*/
icount_warp_rt(NULL);
icount_warp_rt();
timer_del(icount_warp_timer);
}
if (!all_cpu_threads_idle()) {
@@ -605,7 +624,7 @@ void configure_icount(QemuOpts *opts, Error **errp)
icount_sleep = qemu_opt_get_bool(opts, "sleep", true);
if (icount_sleep) {
icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
icount_warp_rt, NULL);
icount_dummy_timer, NULL);
}
icount_align_option = qemu_opt_get_bool(opts, "align", false);
@@ -694,15 +713,6 @@ void cpu_synchronize_all_post_init(void)
}
}
void cpu_clean_all_dirty(void)
{
CPUState *cpu;
CPU_FOREACH(cpu) {
cpu_clean_state(cpu);
}
}
static int do_vm_stop(RunState state)
{
int ret = 0;
@@ -1405,12 +1415,36 @@ int vm_stop_force_state(RunState state)
return vm_stop(state);
} else {
runstate_set(state);
bdrv_drain_all();
/* Make sure to return an error if the flush in a previous vm_stop()
* failed. */
return bdrv_flush_all();
}
}
static int64_t tcg_get_icount_limit(void)
{
int64_t deadline;
if (replay_mode != REPLAY_MODE_PLAY) {
deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
/* Maintain prior (possibly buggy) behaviour where if no deadline
* was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
* INT32_MAX nanoseconds ahead, we still use INT32_MAX
* nanoseconds.
*/
if ((deadline < 0) || (deadline > INT32_MAX)) {
deadline = INT32_MAX;
}
return qemu_icount_round(deadline);
} else {
return replay_get_instructions();
}
}
static int tcg_cpu_exec(CPUState *cpu)
{
int ret;
@@ -1423,24 +1457,12 @@ static int tcg_cpu_exec(CPUState *cpu)
#endif
if (use_icount) {
int64_t count;
int64_t deadline;
int decr;
timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+ cpu->icount_extra);
cpu->icount_decr.u16.low = 0;
cpu->icount_extra = 0;
deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
/* Maintain prior (possibly buggy) behaviour where if no deadline
* was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
* INT32_MAX nanoseconds ahead, we still use INT32_MAX
* nanoseconds.
*/
if ((deadline < 0) || (deadline > INT32_MAX)) {
deadline = INT32_MAX;
}
count = qemu_icount_round(deadline);
count = tcg_get_icount_limit();
timers_state.qemu_icount += count;
decr = (count > 0xffff) ? 0xffff : count;
count -= decr;
@@ -1458,6 +1480,7 @@ static int tcg_cpu_exec(CPUState *cpu)
+ cpu->icount_extra);
cpu->icount_decr.u32 = 0;
cpu->icount_extra = 0;
replay_account_executed_instructions();
}
return ret;
}

View File

@@ -25,8 +25,7 @@ typedef struct QCryptoCipherBuiltinAES QCryptoCipherBuiltinAES;
struct QCryptoCipherBuiltinAES {
AES_KEY encrypt_key;
AES_KEY decrypt_key;
uint8_t *iv;
size_t niv;
uint8_t iv[AES_BLOCK_SIZE];
};
typedef struct QCryptoCipherBuiltinDESRFB QCryptoCipherBuiltinDESRFB;
struct QCryptoCipherBuiltinDESRFB {
@@ -40,6 +39,7 @@ struct QCryptoCipherBuiltin {
QCryptoCipherBuiltinAES aes;
QCryptoCipherBuiltinDESRFB desrfb;
} state;
size_t blocksize;
void (*free)(QCryptoCipher *cipher);
int (*setiv)(QCryptoCipher *cipher,
const uint8_t *iv, size_t niv,
@@ -61,7 +61,6 @@ static void qcrypto_cipher_free_aes(QCryptoCipher *cipher)
{
QCryptoCipherBuiltin *ctxt = cipher->opaque;
g_free(ctxt->state.aes.iv);
g_free(ctxt);
cipher->opaque = NULL;
}
@@ -145,15 +144,13 @@ static int qcrypto_cipher_setiv_aes(QCryptoCipher *cipher,
Error **errp)
{
QCryptoCipherBuiltin *ctxt = cipher->opaque;
if (niv != 16) {
error_setg(errp, "IV must be 16 bytes not %zu", niv);
if (niv != AES_BLOCK_SIZE) {
error_setg(errp, "IV must be %d bytes not %zu",
AES_BLOCK_SIZE, niv);
return -1;
}
g_free(ctxt->state.aes.iv);
ctxt->state.aes.iv = g_new0(uint8_t, niv);
memcpy(ctxt->state.aes.iv, iv, niv);
ctxt->state.aes.niv = niv;
memcpy(ctxt->state.aes.iv, iv, AES_BLOCK_SIZE);
return 0;
}
@@ -185,6 +182,7 @@ static int qcrypto_cipher_init_aes(QCryptoCipher *cipher,
goto error;
}
ctxt->blocksize = AES_BLOCK_SIZE;
ctxt->free = qcrypto_cipher_free_aes;
ctxt->setiv = qcrypto_cipher_setiv_aes;
ctxt->encrypt = qcrypto_cipher_encrypt_aes;
@@ -286,6 +284,7 @@ static int qcrypto_cipher_init_des_rfb(QCryptoCipher *cipher,
memcpy(ctxt->state.desrfb.key, key, nkey);
ctxt->state.desrfb.nkey = nkey;
ctxt->blocksize = 8;
ctxt->free = qcrypto_cipher_free_des_rfb;
ctxt->setiv = qcrypto_cipher_setiv_des_rfb;
ctxt->encrypt = qcrypto_cipher_encrypt_des_rfb;
@@ -374,6 +373,12 @@ int qcrypto_cipher_encrypt(QCryptoCipher *cipher,
{
QCryptoCipherBuiltin *ctxt = cipher->opaque;
if (len % ctxt->blocksize) {
error_setg(errp, "Length %zu must be a multiple of block size %zu",
len, ctxt->blocksize);
return -1;
}
return ctxt->encrypt(cipher, in, out, len, errp);
}
@@ -386,6 +391,12 @@ int qcrypto_cipher_decrypt(QCryptoCipher *cipher,
{
QCryptoCipherBuiltin *ctxt = cipher->opaque;
if (len % ctxt->blocksize) {
error_setg(errp, "Length %zu must be a multiple of block size %zu",
len, ctxt->blocksize);
return -1;
}
return ctxt->decrypt(cipher, in, out, len, errp);
}

View File

@@ -34,6 +34,11 @@ bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg)
}
}
typedef struct QCryptoCipherGcrypt QCryptoCipherGcrypt;
struct QCryptoCipherGcrypt {
gcry_cipher_hd_t handle;
size_t blocksize;
};
QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
QCryptoCipherMode mode,
@@ -41,7 +46,7 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
Error **errp)
{
QCryptoCipher *cipher;
gcry_cipher_hd_t handle;
QCryptoCipherGcrypt *ctx;
gcry_error_t err;
int gcryalg, gcrymode;
@@ -87,7 +92,9 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
cipher->alg = alg;
cipher->mode = mode;
err = gcry_cipher_open(&handle, gcryalg, gcrymode, 0);
ctx = g_new0(QCryptoCipherGcrypt, 1);
err = gcry_cipher_open(&ctx->handle, gcryalg, gcrymode, 0);
if (err != 0) {
error_setg(errp, "Cannot initialize cipher: %s",
gcry_strerror(err));
@@ -100,10 +107,12 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
* bizarre RFB variant of DES :-)
*/
uint8_t *rfbkey = qcrypto_cipher_munge_des_rfb_key(key, nkey);
err = gcry_cipher_setkey(handle, rfbkey, nkey);
err = gcry_cipher_setkey(ctx->handle, rfbkey, nkey);
g_free(rfbkey);
ctx->blocksize = 8;
} else {
err = gcry_cipher_setkey(handle, key, nkey);
err = gcry_cipher_setkey(ctx->handle, key, nkey);
ctx->blocksize = 16;
}
if (err != 0) {
error_setg(errp, "Cannot set key: %s",
@@ -111,11 +120,12 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
goto error;
}
cipher->opaque = handle;
cipher->opaque = ctx;
return cipher;
error:
gcry_cipher_close(handle);
gcry_cipher_close(ctx->handle);
g_free(ctx);
g_free(cipher);
return NULL;
}
@@ -123,12 +133,13 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
void qcrypto_cipher_free(QCryptoCipher *cipher)
{
gcry_cipher_hd_t handle;
QCryptoCipherGcrypt *ctx;
if (!cipher) {
return;
}
handle = cipher->opaque;
gcry_cipher_close(handle);
ctx = cipher->opaque;
gcry_cipher_close(ctx->handle);
g_free(ctx);
g_free(cipher);
}
@@ -139,10 +150,16 @@ int qcrypto_cipher_encrypt(QCryptoCipher *cipher,
size_t len,
Error **errp)
{
gcry_cipher_hd_t handle = cipher->opaque;
QCryptoCipherGcrypt *ctx = cipher->opaque;
gcry_error_t err;
err = gcry_cipher_encrypt(handle,
if (len % ctx->blocksize) {
error_setg(errp, "Length %zu must be a multiple of block size %zu",
len, ctx->blocksize);
return -1;
}
err = gcry_cipher_encrypt(ctx->handle,
out, len,
in, len);
if (err != 0) {
@@ -161,10 +178,16 @@ int qcrypto_cipher_decrypt(QCryptoCipher *cipher,
size_t len,
Error **errp)
{
gcry_cipher_hd_t handle = cipher->opaque;
QCryptoCipherGcrypt *ctx = cipher->opaque;
gcry_error_t err;
err = gcry_cipher_decrypt(handle,
if (len % ctx->blocksize) {
error_setg(errp, "Length %zu must be a multiple of block size %zu",
len, ctx->blocksize);
return -1;
}
err = gcry_cipher_decrypt(ctx->handle,
out, len,
in, len);
if (err != 0) {
@@ -180,11 +203,17 @@ int qcrypto_cipher_setiv(QCryptoCipher *cipher,
const uint8_t *iv, size_t niv,
Error **errp)
{
gcry_cipher_hd_t handle = cipher->opaque;
QCryptoCipherGcrypt *ctx = cipher->opaque;
gcry_error_t err;
gcry_cipher_reset(handle);
err = gcry_cipher_setiv(handle, iv, niv);
if (niv != ctx->blocksize) {
error_setg(errp, "Expected IV size %zu not %zu",
ctx->blocksize, niv);
return -1;
}
gcry_cipher_reset(ctx->handle);
err = gcry_cipher_setiv(ctx->handle, iv, niv);
if (err != 0) {
error_setg(errp, "Cannot set IV: %s",
gcry_strerror(err));

View File

@@ -69,7 +69,7 @@ struct QCryptoCipherNettle {
nettle_cipher_func *alg_encrypt;
nettle_cipher_func *alg_decrypt;
uint8_t *iv;
size_t niv;
size_t blocksize;
};
bool qcrypto_cipher_supports(QCryptoCipherAlgorithm alg)
@@ -125,7 +125,7 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
ctx->alg_encrypt = des_encrypt_wrapper;
ctx->alg_decrypt = des_decrypt_wrapper;
ctx->niv = DES_BLOCK_SIZE;
ctx->blocksize = DES_BLOCK_SIZE;
break;
case QCRYPTO_CIPHER_ALG_AES_128:
@@ -140,14 +140,14 @@ QCryptoCipher *qcrypto_cipher_new(QCryptoCipherAlgorithm alg,
ctx->alg_encrypt = aes_encrypt_wrapper;
ctx->alg_decrypt = aes_decrypt_wrapper;
ctx->niv = AES_BLOCK_SIZE;
ctx->blocksize = AES_BLOCK_SIZE;
break;
default:
error_setg(errp, "Unsupported cipher algorithm %d", alg);
goto error;
}
ctx->iv = g_new0(uint8_t, ctx->niv);
ctx->iv = g_new0(uint8_t, ctx->blocksize);
cipher->opaque = ctx;
return cipher;
@@ -184,6 +184,12 @@ int qcrypto_cipher_encrypt(QCryptoCipher *cipher,
{
QCryptoCipherNettle *ctx = cipher->opaque;
if (len % ctx->blocksize) {
error_setg(errp, "Length %zu must be a multiple of block size %zu",
len, ctx->blocksize);
return -1;
}
switch (cipher->mode) {
case QCRYPTO_CIPHER_MODE_ECB:
ctx->alg_encrypt(ctx->ctx_encrypt, len, out, in);
@@ -191,7 +197,7 @@ int qcrypto_cipher_encrypt(QCryptoCipher *cipher,
case QCRYPTO_CIPHER_MODE_CBC:
cbc_encrypt(ctx->ctx_encrypt, ctx->alg_encrypt,
ctx->niv, ctx->iv,
ctx->blocksize, ctx->iv,
len, out, in);
break;
default:
@@ -211,6 +217,12 @@ int qcrypto_cipher_decrypt(QCryptoCipher *cipher,
{
QCryptoCipherNettle *ctx = cipher->opaque;
if (len % ctx->blocksize) {
error_setg(errp, "Length %zu must be a multiple of block size %zu",
len, ctx->blocksize);
return -1;
}
switch (cipher->mode) {
case QCRYPTO_CIPHER_MODE_ECB:
ctx->alg_decrypt(ctx->ctx_decrypt ? ctx->ctx_decrypt : ctx->ctx_encrypt,
@@ -219,7 +231,7 @@ int qcrypto_cipher_decrypt(QCryptoCipher *cipher,
case QCRYPTO_CIPHER_MODE_CBC:
cbc_decrypt(ctx->ctx_decrypt ? ctx->ctx_decrypt : ctx->ctx_encrypt,
ctx->alg_decrypt, ctx->niv, ctx->iv,
ctx->alg_decrypt, ctx->blocksize, ctx->iv,
len, out, in);
break;
default:
@@ -235,9 +247,9 @@ int qcrypto_cipher_setiv(QCryptoCipher *cipher,
Error **errp)
{
QCryptoCipherNettle *ctx = cipher->opaque;
if (niv != ctx->niv) {
if (niv != ctx->blocksize) {
error_setg(errp, "Expected IV size %zu not %zu",
ctx->niv, niv);
ctx->blocksize, niv);
return -1;
}
memcpy(ctx->iv, iv, niv);

View File

@@ -47,7 +47,7 @@ qcrypto_cipher_validate_key_length(QCryptoCipherAlgorithm alg,
return true;
}
#if defined(CONFIG_GNUTLS_GCRYPT) || defined(CONFIG_GNUTLS_NETTLE)
#if defined(CONFIG_GCRYPT) || defined(CONFIG_NETTLE)
static uint8_t *
qcrypto_cipher_munge_des_rfb_key(const uint8_t *key,
size_t nkey)
@@ -63,11 +63,11 @@ qcrypto_cipher_munge_des_rfb_key(const uint8_t *key,
}
return ret;
}
#endif /* CONFIG_GNUTLS_GCRYPT || CONFIG_GNUTLS_NETTLE */
#endif /* CONFIG_GCRYPT || CONFIG_NETTLE */
#ifdef CONFIG_GNUTLS_GCRYPT
#ifdef CONFIG_GCRYPT
#include "crypto/cipher-gcrypt.c"
#elif defined CONFIG_GNUTLS_NETTLE
#elif defined CONFIG_NETTLE
#include "crypto/cipher-nettle.c"
#else
#include "crypto/cipher-builtin.c"

View File

@@ -24,8 +24,9 @@
#ifdef CONFIG_GNUTLS
#include <gnutls/gnutls.h>
#include <gnutls/crypto.h>
#endif
#ifdef CONFIG_GNUTLS_GCRYPT
#ifdef CONFIG_GCRYPT
#include <gcrypt.h>
#endif
@@ -37,6 +38,7 @@
* - When GNUTLS >= 2.12, we must not initialize gcrypt threading
* because GNUTLS will do that itself
* - When GNUTLS < 2.12 we must always initialize gcrypt threading
* - When GNUTLS is disabled we must always initialize gcrypt threading
*
* But....
*
@@ -47,12 +49,15 @@
*
* - gcrypt < 1.6.0
* AND
* - gnutls < 2.12
* - gnutls < 2.12
* OR
* - gnutls is disabled
*
*/
#if (defined(CONFIG_GNUTLS_GCRYPT) && \
(!defined(GNUTLS_VERSION_NUMBER) || \
#if (defined(CONFIG_GCRYPT) && \
(!defined(CONFIG_GNUTLS) || \
!defined(GNUTLS_VERSION_NUMBER) || \
(GNUTLS_VERSION_NUMBER < 0x020c00)) && \
(!defined(GCRYPT_VERSION_NUMBER) || \
(GCRYPT_VERSION_NUMBER < 0x010600)))
@@ -113,6 +118,7 @@ static struct gcry_thread_cbs qcrypto_gcrypt_thread_impl = {
int qcrypto_init(Error **errp)
{
#ifdef CONFIG_GNUTLS
int ret;
ret = gnutls_global_init();
if (ret < 0) {
@@ -125,8 +131,9 @@ int qcrypto_init(Error **errp)
gnutls_global_set_log_level(10);
gnutls_global_set_log_function(qcrypto_gnutls_log);
#endif
#endif
#ifdef CONFIG_GNUTLS_GCRYPT
#ifdef CONFIG_GCRYPT
if (!gcry_check_version(GCRYPT_VERSION)) {
error_setg(errp, "Unable to initialize gcrypt");
return -1;
@@ -139,12 +146,3 @@ int qcrypto_init(Error **errp)
return 0;
}
#else /* ! CONFIG_GNUTLS */
int qcrypto_init(Error **errp G_GNUC_UNUSED)
{
return 0;
}
#endif /* ! CONFIG_GNUTLS */

View File

@@ -123,10 +123,10 @@ qcrypto_tls_creds_get_path(QCryptoTLSCreds *creds,
goto cleanup;
}
trace_qcrypto_tls_creds_get_path(creds, filename,
*cred ? *cred : "<none>");
ret = 0;
cleanup:
trace_qcrypto_tls_creds_get_path(creds, filename,
*cred ? *cred : "<none>");
return ret;
}

View File

@@ -255,6 +255,7 @@ qcrypto_tls_creds_check_cert_key_purpose(QCryptoTLSCredsX509 *creds,
}
g_free(buffer);
buffer = NULL;
}
if (isServer) {
@@ -485,7 +486,8 @@ qcrypto_tls_creds_x509_sanity_check(QCryptoTLSCredsX509 *creds,
int ret = -1;
memset(cacerts, 0, sizeof(cacerts));
if (access(certFile, R_OK) == 0) {
if (certFile &&
access(certFile, R_OK) == 0) {
cert = qcrypto_tls_creds_load_cert(creds,
certFile, isServer,
errp);
@@ -654,6 +656,10 @@ qcrypto_tls_creds_x509_unload(QCryptoTLSCredsX509 *creds)
gnutls_certificate_free_credentials(creds->data);
creds->data = NULL;
}
if (creds->parent_obj.dh_params) {
gnutls_dh_params_deinit(creds->parent_obj.dh_params);
creds->parent_obj.dh_params = NULL;
}
}

View File

@@ -304,9 +304,9 @@ qcrypto_tls_session_check_certificate(QCryptoTLSSession *session,
allow = qemu_acl_party_is_allowed(acl, session->peername);
error_setg(errp, "TLS x509 ACL check for %s is %s",
session->peername, allow ? "allowed" : "denied");
if (!allow) {
error_setg(errp, "TLS x509 ACL check for %s is denied",
session->peername);
goto error;
}
}

View File

@@ -1,3 +1 @@
# Default configuration for aarch64-linux-user
CONFIG_GDBSTUB_XML=y

View File

@@ -35,5 +35,5 @@ CONFIG_SDHCI=y
CONFIG_EDU=y
CONFIG_VGA=y
CONFIG_VGA_PCI=y
CONFIG_IVSHMEM=$(CONFIG_KVM)
CONFIG_IVSHMEM=$(CONFIG_POSIX)
CONFIG_ROCKER=y

View File

@@ -3,6 +3,7 @@
include pci.mak
include sound.mak
include usb.mak
CONFIG_VIRTIO_VGA=y
CONFIG_ISA_MMIO=y
CONFIG_ESCC=y
CONFIG_M48T59=y

55
disas.c
View File

@@ -214,11 +214,6 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
s.info.mach = bfd_mach_i386_i386;
}
s.info.print_insn = print_insn_i386;
#elif defined(TARGET_SPARC)
s.info.print_insn = print_insn_sparc;
#ifdef TARGET_SPARC64
s.info.mach = bfd_mach_sparc_v9b;
#endif
#elif defined(TARGET_PPC)
if ((flags >> 16) & 1) {
s.info.endian = BFD_ENDIAN_LITTLE;
@@ -235,29 +230,6 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
}
s.info.disassembler_options = (char *)"any";
s.info.print_insn = print_insn_ppc;
#elif defined(TARGET_M68K)
s.info.print_insn = print_insn_m68k;
#elif defined(TARGET_MIPS)
#ifdef TARGET_WORDS_BIGENDIAN
s.info.print_insn = print_insn_big_mips;
#else
s.info.print_insn = print_insn_little_mips;
#endif
#elif defined(TARGET_SH4)
s.info.mach = bfd_mach_sh4;
s.info.print_insn = print_insn_sh;
#elif defined(TARGET_ALPHA)
s.info.mach = bfd_mach_alpha_ev6;
s.info.print_insn = print_insn_alpha;
#elif defined(TARGET_S390X)
s.info.mach = bfd_mach_s390_64;
s.info.print_insn = print_insn_s390;
#elif defined(TARGET_MOXIE)
s.info.mach = bfd_arch_moxie;
s.info.print_insn = print_insn_moxie;
#elif defined(TARGET_LM32)
s.info.mach = bfd_mach_lm32;
s.info.print_insn = print_insn_lm32;
#endif
if (s.info.print_insn == NULL) {
s.info.print_insn = print_insn_od_target;
@@ -429,13 +401,6 @@ void monitor_disas(Monitor *mon, CPUState *cpu,
s.info.mach = bfd_mach_i386_i386;
}
s.info.print_insn = print_insn_i386;
#elif defined(TARGET_ALPHA)
s.info.print_insn = print_insn_alpha;
#elif defined(TARGET_SPARC)
s.info.print_insn = print_insn_sparc;
#ifdef TARGET_SPARC64
s.info.mach = bfd_mach_sparc_v9b;
#endif
#elif defined(TARGET_PPC)
if (flags & 0xFFFF) {
/* If we have a precise definition of the instruction set, use it. */
@@ -451,26 +416,6 @@ void monitor_disas(Monitor *mon, CPUState *cpu,
s.info.endian = BFD_ENDIAN_LITTLE;
}
s.info.print_insn = print_insn_ppc;
#elif defined(TARGET_M68K)
s.info.print_insn = print_insn_m68k;
#elif defined(TARGET_MIPS)
#ifdef TARGET_WORDS_BIGENDIAN
s.info.print_insn = print_insn_big_mips;
#else
s.info.print_insn = print_insn_little_mips;
#endif
#elif defined(TARGET_SH4)
s.info.mach = bfd_mach_sh4;
s.info.print_insn = print_insn_sh;
#elif defined(TARGET_S390X)
s.info.mach = bfd_mach_s390_64;
s.info.print_insn = print_insn_s390;
#elif defined(TARGET_MOXIE)
s.info.mach = bfd_arch_moxie;
s.info.print_insn = print_insn_moxie;
#elif defined(TARGET_LM32)
s.info.mach = bfd_mach_lm32;
s.info.print_insn = print_insn_lm32;
#endif
if (!s.info.print_insn) {
monitor_printf(mon, "0x" TARGET_FMT_lx

View File

@@ -1779,7 +1779,7 @@ print_insn_coprocessor (bfd_vma pc, struct disassemble_info *info, long given,
/* Is ``imm'' a negative number? */
if (imm & 0x40)
imm |= (-1 << 7);
imm |= (~0u << 7);
func (stream, "%d", imm);
}

View File

@@ -2420,9 +2420,11 @@ const struct mips_opcode mips_builtin_opcodes[] =
{"hibernate","", 0x42000023, 0xffffffff, 0, 0, V1 },
{"ins", "t,r,+A,+B", 0x7c000004, 0xfc00003f, WR_t|RD_s, 0, I33 },
{"jr", "s", 0x00000008, 0xfc1fffff, UBD|RD_s, 0, I1 },
{"jr", "s", 0x00000009, 0xfc1fffff, UBD|RD_s, 0, I32R6 }, /* jalr */
/* jr.hb is officially MIPS{32,64}R2, but it works on R1 as jr with
the same hazard barrier effect. */
{"jr.hb", "s", 0x00000408, 0xfc1fffff, UBD|RD_s, 0, I32 },
{"jr.hb", "s", 0x00000409, 0xfc1fffff, UBD|RD_s, 0, I32R6 }, /* jalr.hb */
{"j", "s", 0x00000008, 0xfc1fffff, UBD|RD_s, 0, I1 }, /* jr */
/* SVR4 PIC code requires special handling for j, so it must be a
macro. */

View File

@@ -19,12 +19,20 @@ which is included at the end of this document.
* A dirty bitmap's name is unique to the node, but bitmaps attached to different
nodes can share the same name.
* Dirty bitmaps created for internal use by QEMU may be anonymous and have no
name, but any user-created bitmaps may not be. There can be any number of
anonymous bitmaps per node.
* The name of a user-created bitmap must not be empty ("").
## Bitmap Modes
* A Bitmap can be "frozen," which means that it is currently in-use by a backup
operation and cannot be deleted, renamed, written to, reset,
etc.
* The normal operating mode for a bitmap is "active."
## Basic QMP Usage
### Supported Commands ###
@@ -97,11 +105,7 @@ which is included at the end of this document.
}
```
## Transactions (Not yet implemented)
* Transactional commands are forthcoming in a future version,
and are not yet available for use. This section serves as
documentation of intent for their design and usage.
## Transactions
### Justification
@@ -323,6 +327,155 @@ full backup as a backing image.
"event": "BLOCK_JOB_COMPLETED" }
```
### Partial Transactional Failures
* Sometimes, a transaction will succeed in launching and return success,
but then later the backup jobs themselves may fail. It is possible that
a management application may have to deal with a partial backup failure
after a successful transaction.
* If multiple backup jobs are specified in a single transaction, when one of
them fails, it will not interact with the other backup jobs in any way.
* The job(s) that succeeded will clear the dirty bitmap associated with the
operation, but the job(s) that failed will not. It is not "safe" to delete
any incremental backups that were created successfully in this scenario,
even though others failed.
#### Example
* QMP example highlighting two backup jobs:
```json
{ "execute": "transaction",
"arguments": {
"actions": [
{ "type": "drive-backup",
"data": { "device": "drive0", "bitmap": "bitmap0",
"format": "qcow2", "mode": "existing",
"sync": "incremental", "target": "d0-incr-1.qcow2" } },
{ "type": "drive-backup",
"data": { "device": "drive1", "bitmap": "bitmap1",
"format": "qcow2", "mode": "existing",
"sync": "incremental", "target": "d1-incr-1.qcow2" } },
]
}
}
```
* QMP example response, highlighting one success and one failure:
* Acknowledgement that the Transaction was accepted and jobs were launched:
```json
{ "return": {} }
```
* Later, QEMU sends notice that the first job was completed:
```json
{ "timestamp": { "seconds": 1447192343, "microseconds": 615698 },
"data": { "device": "drive0", "type": "backup",
"speed": 0, "len": 67108864, "offset": 67108864 },
"event": "BLOCK_JOB_COMPLETED"
}
```
* Later yet, QEMU sends notice that the second job has failed:
```json
{ "timestamp": { "seconds": 1447192399, "microseconds": 683015 },
"data": { "device": "drive1", "action": "report",
"operation": "read" },
"event": "BLOCK_JOB_ERROR" }
```
```json
{ "timestamp": { "seconds": 1447192399, "microseconds": 685853 },
"data": { "speed": 0, "offset": 0, "len": 67108864,
"error": "Input/output error",
"device": "drive1", "type": "backup" },
"event": "BLOCK_JOB_COMPLETED" }
* In the above example, "d0-incr-1.qcow2" is valid and must be kept,
but "d1-incr-1.qcow2" is invalid and should be deleted. If a VM-wide
incremental backup of all drives at a point-in-time is to be made,
new backups for both drives will need to be made, taking into account
that a new incremental backup for drive0 needs to be based on top of
"d0-incr-1.qcow2."
### Grouped Completion Mode
* While jobs launched by transactions normally complete or fail on their own,
it is possible to instruct them to complete or fail together as a group.
* QMP transactions take an optional properties structure that can affect
the semantics of the transaction.
* The "completion-mode" transaction property can be either "individual"
which is the default, legacy behavior described above, or "grouped,"
a new behavior detailed below.
* Delayed Completion: In grouped completion mode, no jobs will report
success until all jobs are ready to report success.
* Grouped failure: If any job fails in grouped completion mode, all remaining
jobs will be cancelled. Any incremental backups will restore their dirty
bitmap objects as if no backup command was ever issued.
* Regardless of if QEMU reports a particular incremental backup job as
CANCELLED or as an ERROR, the in-memory bitmap will be restored.
#### Example
* Here's the same example scenario from above with the new property:
```json
{ "execute": "transaction",
"arguments": {
"actions": [
{ "type": "drive-backup",
"data": { "device": "drive0", "bitmap": "bitmap0",
"format": "qcow2", "mode": "existing",
"sync": "incremental", "target": "d0-incr-1.qcow2" } },
{ "type": "drive-backup",
"data": { "device": "drive1", "bitmap": "bitmap1",
"format": "qcow2", "mode": "existing",
"sync": "incremental", "target": "d1-incr-1.qcow2" } },
],
"properties": {
"completion-mode": "grouped"
}
}
}
```
* QMP example response, highlighting a failure for drive2:
* Acknowledgement that the Transaction was accepted and jobs were launched:
```json
{ "return": {} }
```
* Later, QEMU sends notice that the second job has errored out,
but that the first job was also cancelled:
```json
{ "timestamp": { "seconds": 1447193702, "microseconds": 632377 },
"data": { "device": "drive1", "action": "report",
"operation": "read" },
"event": "BLOCK_JOB_ERROR" }
```
```json
{ "timestamp": { "seconds": 1447193702, "microseconds": 640074 },
"data": { "speed": 0, "offset": 0, "len": 67108864,
"error": "Input/output error",
"device": "drive1", "type": "backup" },
"event": "BLOCK_JOB_COMPLETED" }
```
```json
{ "timestamp": { "seconds": 1447193702, "microseconds": 640163 },
"data": { "device": "drive0", "type": "backup", "speed": 0,
"len": 67108864, "offset": 16777216 },
"event": "BLOCK_JOB_CANCELLED" }
```
<!--
The FreeBSD Documentation License

View File

@@ -291,3 +291,194 @@ save/send this state when we are in the middle of a pio operation
(that is what ide_drive_pio_state_needed() checks). If DRQ_STAT is
not enabled, the values on that fields are garbage and don't need to
be sent.
= Return path =
In most migration scenarios there is only a single data path that runs
from the source VM to the destination, typically along a single fd (although
possibly with another fd or similar for some fast way of throwing pages across).
However, some uses need two way communication; in particular the Postcopy
destination needs to be able to request pages on demand from the source.
For these scenarios there is a 'return path' from the destination to the source;
qemu_file_get_return_path(QEMUFile* fwdpath) gives the QEMUFile* for the return
path.
Source side
Forward path - written by migration thread
Return path - opened by main thread, read by return-path thread
Destination side
Forward path - read by main thread
Return path - opened by main thread, written by main thread AND postcopy
thread (protected by rp_mutex)
= Postcopy =
'Postcopy' migration is a way to deal with migrations that refuse to converge
(or take too long to converge) its plus side is that there is an upper bound on
the amount of migration traffic and time it takes, the down side is that during
the postcopy phase, a failure of *either* side or the network connection causes
the guest to be lost.
In postcopy the destination CPUs are started before all the memory has been
transferred, and accesses to pages that are yet to be transferred cause
a fault that's translated by QEMU into a request to the source QEMU.
Postcopy can be combined with precopy (i.e. normal migration) so that if precopy
doesn't finish in a given time the switch is made to postcopy.
=== Enabling postcopy ===
To enable postcopy, issue this command on the monitor prior to the
start of migration:
migrate_set_capability x-postcopy-ram on
The normal commands are then used to start a migration, which is still
started in precopy mode. Issuing:
migrate_start_postcopy
will now cause the transition from precopy to postcopy.
It can be issued immediately after migration is started or any
time later on. Issuing it after the end of a migration is harmless.
Note: During the postcopy phase, the bandwidth limits set using
migrate_set_speed is ignored (to avoid delaying requested pages that
the destination is waiting for).
=== Postcopy device transfer ===
Loading of device data may cause the device emulation to access guest RAM
that may trigger faults that have to be resolved by the source, as such
the migration stream has to be able to respond with page data *during* the
device load, and hence the device data has to be read from the stream completely
before the device load begins to free the stream up. This is achieved by
'packaging' the device data into a blob that's read in one go.
Source behaviour
Until postcopy is entered the migration stream is identical to normal
precopy, except for the addition of a 'postcopy advise' command at
the beginning, to tell the destination that postcopy might happen.
When postcopy starts the source sends the page discard data and then
forms the 'package' containing:
Command: 'postcopy listen'
The device state
A series of sections, identical to the precopy streams device state stream
containing everything except postcopiable devices (i.e. RAM)
Command: 'postcopy run'
The 'package' is sent as the data part of a Command: 'CMD_PACKAGED', and the
contents are formatted in the same way as the main migration stream.
During postcopy the source scans the list of dirty pages and sends them
to the destination without being requested (in much the same way as precopy),
however when a page request is received from the destination, the dirty page
scanning restarts from the requested location. This causes requested pages
to be sent quickly, and also causes pages directly after the requested page
to be sent quickly in the hope that those pages are likely to be used
by the destination soon.
Destination behaviour
Initially the destination looks the same as precopy, with a single thread
reading the migration stream; the 'postcopy advise' and 'discard' commands
are processed to change the way RAM is managed, but don't affect the stream
processing.
------------------------------------------------------------------------------
1 2 3 4 5 6 7
main -----DISCARD-CMD_PACKAGED ( LISTEN DEVICE DEVICE DEVICE RUN )
thread | |
| (page request)
| \___
v \
listen thread: --- page -- page -- page -- page -- page --
a b c
------------------------------------------------------------------------------
On receipt of CMD_PACKAGED (1)
All the data associated with the package - the ( ... ) section in the
diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
recurses into qemu_loadvm_state_main to process the contents of the package (2)
which contains commands (3,6) and devices (4...)
On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
a new thread (a) is started that takes over servicing the migration stream,
while the main thread carries on loading the package. It loads normal
background page data (b) but if during a device load a fault happens (5) the
returned page (c) is loaded by the listen thread allowing the main threads
device load to carry on.
The last thing in the CMD_PACKAGED is a 'RUN' command (6) letting the destination
CPUs start running.
At the end of the CMD_PACKAGED (7) the main thread returns to normal running behaviour
and is no longer used by migration, while the listen thread carries
on servicing page data until the end of migration.
=== Postcopy states ===
Postcopy moves through a series of states (see postcopy_state) from
ADVISE->DISCARD->LISTEN->RUNNING->END
Advise: Set at the start of migration if postcopy is enabled, even
if it hasn't had the start command; here the destination
checks that its OS has the support needed for postcopy, and performs
setup to ensure the RAM mappings are suitable for later postcopy.
The destination will fail early in migration at this point if the
required OS support is not present.
(Triggered by reception of POSTCOPY_ADVISE command)
Discard: Entered on receipt of the first 'discard' command; prior to
the first Discard being performed, hugepages are switched off
(using madvise) to ensure that no new huge pages are created
during the postcopy phase, and to cause any huge pages that
have discards on them to be broken.
Listen: The first command in the package, POSTCOPY_LISTEN, switches
the destination state to Listen, and starts a new thread
(the 'listen thread') which takes over the job of receiving
pages off the migration stream, while the main thread carries
on processing the blob. With this thread able to process page
reception, the destination now 'sensitises' the RAM to detect
any access to missing pages (on Linux using the 'userfault'
system).
Running: POSTCOPY_RUN causes the destination to synchronise all
state and start the CPUs and IO devices running. The main
thread now finishes processing the migration package and
now carries on as it would for normal precopy migration
(although it can't do the cleanup it would do as it
finishes a normal migration).
End: The listen thread can now quit, and perform the cleanup of migration
state, the migration is now complete.
=== Source side page maps ===
The source side keeps two bitmaps during postcopy; 'the migration bitmap'
and 'unsent map'. The 'migration bitmap' is basically the same as in
the precopy case, and holds a bit to indicate that page is 'dirty' -
i.e. needs sending. During the precopy phase this is updated as the CPU
dirties pages, however during postcopy the CPUs are stopped and nothing
should dirty anything any more.
The 'unsent map' is used for the transition to postcopy. It is a bitmap that
has a bit cleared whenever a page is sent to the destination, however during
the transition to postcopy mode it is combined with the migration bitmap
to form a set of pages that:
a) Have been sent but then redirtied (which must be discarded)
b) Have not yet been sent - which also must be discarded to cause any
transparent huge pages built during precopy to be broken.
Note that the contents of the unsentmap are sacrificed during the calculation
of the discard set and thus aren't valid once in postcopy. The dirtymap
is still valid and is used to ensure that no page is sent more than once. Any
request for a page that has already been sent is ignored. Duplicate requests
such as this can happen as a page is sent at about the same time the
destination accesses it.

View File

@@ -106,12 +106,15 @@ Types, commands, and events share a common namespace. Therefore,
generally speaking, type definitions should always use CamelCase for
user-defined type names, while built-in types are lowercase. Type
definitions should not end in 'Kind', as this namespace is used for
creating implicit C enums for visiting union types. Command names,
creating implicit C enums for visiting union types, or in 'List', as
this namespace is used for creating array types. Command names,
and field names within a type, should be all lower case with words
separated by a hyphen. However, some existing older commands and
complex types use underscore; when extending such expressions,
consistency is preferred over blindly avoiding underscore. Event
names should be ALL_CAPS with words separated by underscore.
names should be ALL_CAPS with words separated by underscore. Field
names cannot start with 'has-' or 'has_', as this is reserved for
tracking optional fields.
Any name (command, event, type, field, or enum value) beginning with
"x-" is marked experimental, and may be withdrawn or changed
@@ -122,9 +125,10 @@ vendor), even if the rest of the name uses dash (example:
__com.redhat_drive-mirror). Other than downstream extensions (with
leading underscore and the use of dots), all names should begin with a
letter, and contain only ASCII letters, digits, dash, and underscore.
It is okay to reuse names that match C keywords; the generator will
rename a field named "default" in the QAPI to "q_default" in the
generated C code.
Names beginning with 'q_' are reserved for the generator: QMP names
that resemble C keywords or other problematic strings will be munged
in C to use this prefix. For example, a field named "default" in
qapi becomes "q_default" in the generated C code.
In the rest of this document, usage lines are given for each
expression type, with literal strings written in lower case and
@@ -510,8 +514,23 @@ exactly the server (QEMU) supports.
For this purpose, QMP provides introspection via command
query-qmp-schema. QGA currently doesn't support introspection.
While Client JSON Protocol wire compatibility should be maintained
between qemu versions, we cannot make the same guarantees for
introspection stability. For example, one version of qemu may provide
a non-variant optional member of a struct, and a later version rework
the member to instead be non-optional and associated with a variant.
Likewise, one version of qemu may list a member with open-ended type
'str', and a later version could convert it to a finite set of strings
via an enum type; or a member may be converted from a specific type to
an alternate that represents a choice between the original type and
something else.
query-qmp-schema returns a JSON array of SchemaInfo objects. These
objects together describe the wire ABI, as defined in the QAPI schema.
There is no specified order to the SchemaInfo objects returned; a
client must search for a particular name throughout the entire array
to learn more about that name, but is at least guaranteed that there
will be no collisions between type, command, and event names.
However, the SchemaInfo can't reflect all the rules and restrictions
that apply to QMP. It's interface introspection (figuring out what's
@@ -592,7 +611,9 @@ any. Each element is a JSON object with members "name" (the member's
name), "type" (the name of its type), and optionally "default". The
member is optional if "default" is present. Currently, "default" can
only have value null. Other values are reserved for future
extensions.
extensions. The "members" array is in no particular order; clients
must search the entire object when learning whether a particular
member is supported.
Example: the SchemaInfo for MyType from section Struct types
@@ -606,7 +627,9 @@ Example: the SchemaInfo for MyType from section Struct types
"variants" is a JSON array describing the object's variant members.
Each element is a JSON object with members "case" (the value of type
tag this element applies to) and "type" (the name of an object type
that provides the variant members for this type tag value).
that provides the variant members for this type tag value). The
"variants" array is in no particular order, and is not guaranteed to
list cases in the same order as the corresponding "tag" enum type.
Example: the SchemaInfo for flat union BlockdevOptions from section
Union types
@@ -647,7 +670,8 @@ Union types
The SchemaInfo for an alternate type has meta-type "alternate", and
variant member "members". "members" is a JSON array. Each element is
a JSON object with member "type", which names a type. Values of the
alternate type conform to exactly one of its member types.
alternate type conform to exactly one of its member types. There is
no guarantee on the order in which "members" will be listed.
Example: the SchemaInfo for BlockRef from section Alternate types
@@ -658,15 +682,20 @@ Example: the SchemaInfo for BlockRef from section Alternate types
The SchemaInfo for an array type has meta-type "array", and variant
member "element-type", which names the array's element type. Array
types are implicitly defined.
types are implicitly defined. For convenience, the array's name may
resemble the element type; however, clients should examine member
"element-type" instead of making assumptions based on parsing member
"name".
Example: the SchemaInfo for ['str']
{ "name": "strList", "meta-type": "array",
{ "name": "[str]", "meta-type": "array",
"element-type": "str" }
The SchemaInfo for an enumeration type has meta-type "enum" and
variant member "values".
variant member "values". The values are listed in no particular
order; clients must search the entire enum when learning whether a
particular value is supported.
Example: the SchemaInfo for MyEnum from section Enumeration types

View File

@@ -28,6 +28,8 @@ Example:
"data": { "actual": 944766976 },
"timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
Note: this event is rate-limited.
BLOCK_IMAGE_CORRUPTED
---------------------
@@ -296,6 +298,8 @@ Example:
"data": { "reference": "usr1", "sector-num": 345435, "sectors-count": 5 },
"timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
Note: this event is rate-limited.
QUORUM_REPORT_BAD
-----------------
@@ -318,6 +322,8 @@ Example:
"data": { "node-name": "1.raw", "sector-num": 345435, "sectors-count": 5 },
"timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
Note: this event is rate-limited.
RESET
-----
@@ -358,6 +364,8 @@ Example:
"data": { "offset": 78 },
"timestamp": { "seconds": 1267020223, "microseconds": 435656 } }
Note: this event is rate-limited.
SHUTDOWN
--------
@@ -632,6 +640,8 @@ Example:
"data": { "id": "channel0", "open": true },
"timestamp": { "seconds": 1401385907, "microseconds": 422329 } }
Note: this event is rate-limited separately for each "id".
WAKEUP
------
@@ -662,3 +672,5 @@ Example:
Note: If action is "reset", "shutdown", or "pause" the WATCHDOG event is
followed respectively by the RESET, SHUTDOWN, or STOP events.
Note: this event is rate-limited.

View File

@@ -175,6 +175,11 @@ The format of asynchronous events is:
For a listing of supported asynchronous events, please, refer to the
qmp-events.txt file.
Some events are rate-limited to at most one per second. If additional
"similar" events arrive within one second, all but the last one are
dropped, and the last one is delayed. "Similar" normally means same
event type. See qmp-events.txt for details.
2.5 QGA Synchronization
-----------------------

168
docs/replay.txt Normal file
View File

@@ -0,0 +1,168 @@
Copyright (c) 2010-2015 Institute for System Programming
of the Russian Academy of Sciences.
This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
Record/replay
-------------
Record/replay functions are used for the reverse execution and deterministic
replay of qemu execution. This implementation of deterministic replay can
be used for deterministic debugging of guest code through a gdb remote
interface.
Execution recording writes a non-deterministic events log, which can be later
used for replaying the execution anywhere and for unlimited number of times.
It also supports checkpointing for faster rewinding during reverse debugging.
Execution replaying reads the log and replays all non-deterministic events
including external input, hardware clocks, and interrupts.
Deterministic replay has the following features:
* Deterministically replays whole system execution and all contents of
the memory, state of the hardware devices, clocks, and screen of the VM.
* Writes execution log into the file for later replaying for multiple times
on different machines.
* Supports i386, x86_64, and ARM hardware platforms.
* Performs deterministic replay of all operations with keyboard and mouse
input devices.
Usage of the record/replay:
* First, record the execution, by adding the following arguments to the command line:
'-icount shift=7,rr=record,rrfile=replay.bin -net none'.
Block devices' images are not actually changed in the recording mode,
because all of the changes are written to the temporary overlay file.
* Then you can replay it by using another command
line option: '-icount shift=7,rr=replay,rrfile=replay.bin -net none'
* '-net none' option should also be specified if network replay patches
are not applied.
Papers with description of deterministic replay implementation:
http://www.computer.org/csdl/proceedings/csmr/2012/4666/00/4666a553-abs.html
http://dl.acm.org/citation.cfm?id=2786805.2803179
Modifications of qemu include:
* wrappers for clock and time functions to save their return values in the log
* saving different asynchronous events (e.g. system shutdown) into the log
* synchronization of the bottom halves execution
* synchronization of the threads from thread pool
* recording/replaying user input (mouse and keyboard)
* adding internal checkpoints for cpu and io synchronization
Non-deterministic events
------------------------
Our record/replay system is based on saving and replaying non-deterministic
events (e.g. keyboard input) and simulating deterministic ones (e.g. reading
from HDD or memory of the VM). Saving only non-deterministic events makes
log file smaller, simulation faster, and allows using reverse debugging even
for realtime applications.
The following non-deterministic data from peripheral devices is saved into
the log: mouse and keyboard input, network packets, audio controller input,
USB packets, serial port input, and hardware clocks (they are non-deterministic
too, because their values are taken from the host machine). Inputs from
simulated hardware, memory of VM, software interrupts, and execution of
instructions are not saved into the log, because they are deterministic and
can be replayed by simulating the behavior of virtual machine starting from
initial state.
We had to solve three tasks to implement deterministic replay: recording
non-deterministic events, replaying non-deterministic events, and checking
that there is no divergence between record and replay modes.
We changed several parts of QEMU to make event log recording and replaying.
Devices' models that have non-deterministic input from external devices were
changed to write every external event into the execution log immediately.
E.g. network packets are written into the log when they arrive into the virtual
network adapter.
All non-deterministic events are coming from these devices. But to
replay them we need to know at which moments they occur. We specify
these moments by counting the number of instructions executed between
every pair of consecutive events.
Instruction counting
--------------------
QEMU should work in icount mode to use record/replay feature. icount was
designed to allow deterministic execution in absence of external inputs
of the virtual machine. We also use icount to control the occurrence of the
non-deterministic events. The number of instructions elapsed from the last event
is written to the log while recording the execution. In replay mode we
can predict when to inject that event using the instruction counter.
Timers
------
Timers are used to execute callbacks from different subsystems of QEMU
at the specified moments of time. There are several kinds of timers:
* Real time clock. Based on host time and used only for callbacks that
do not change the virtual machine state. For this reason real time
clock and timers does not affect deterministic replay at all.
* Virtual clock. These timers run only during the emulation. In icount
mode virtual clock value is calculated using executed instructions counter.
That is why it is completely deterministic and does not have to be recorded.
* Host clock. This clock is used by device models that simulate real time
sources (e.g. real time clock chip). Host clock is the one of the sources
of non-determinism. Host clock read operations should be logged to
make the execution deterministic.
* Real time clock for icount. This clock is similar to real time clock but
it is used only for increasing virtual clock while virtual machine is
sleeping. Due to its nature it is also non-deterministic as the host clock
and has to be logged too.
Checkpoints
-----------
Replaying of the execution of virtual machine is bound by sources of
non-determinism. These are inputs from clock and peripheral devices,
and QEMU thread scheduling. Thread scheduling affect on processing events
from timers, asynchronous input-output, and bottom halves.
Invocations of timers are coupled with clock reads and changing the state
of the virtual machine. Reads produce non-deterministic data taken from
host clock. And VM state changes should preserve their order. Their relative
order in replay mode must replicate the order of callbacks in record mode.
To preserve this order we use checkpoints. When a specific clock is processed
in record mode we save to the log special "checkpoint" event.
Checkpoints here do not refer to virtual machine snapshots. They are just
record/replay events used for synchronization.
QEMU in replay mode will try to invoke timers processing in random moment
of time. That's why we do not process a group of timers until the checkpoint
event will be read from the log. Such an event allows synchronizing CPU
execution and timer events.
Another checkpoints application in record/replay is instruction counting
while the virtual machine is idle. This function (qemu_clock_warp) is called
from the wait loop. It changes virtual machine state and must be deterministic
then. That is why we added checkpoint to this function to prevent its
operation in replay mode when it does not correspond to record mode.
Bottom halves
-------------
Disk I/O events are completely deterministic in our model, because
in both record and replay modes we start virtual machine from the same
disk state. But callbacks that virtual disk controller uses for reading and
writing the disk may occur at different moments of time in record and replay
modes.
Reading and writing requests are created by CPU thread of QEMU. Later these
requests proceed to block layer which creates "bottom halves". Bottom
halves consist of callback and its parameters. They are processed when
main loop locks the global mutex. These locks are not synchronized with
replaying process because main loop also processes the events that do not
affect the virtual machine state (like user interaction with monitor).
That is why we had to implement saving and replaying bottom halves callbacks
synchronously to the CPU execution. When the callback is about to execute
it is added to the queue in the replay module. This queue is written to the
log when its callbacks are executed. In replay mode callbacks are not processed
until the corresponding event is read from the events log file.
Sometimes the block layer uses asynchronous callbacks for its internal purposes
(like reading or writing VM snapshots or disk image cluster tables). In this
case bottom halves are not marked as "replayable" and do not saved
into the log.

View File

@@ -76,6 +76,13 @@ increasing address order, similar to memcpy().
Selector Register IOport: 0x510
Data Register IOport: 0x511
DMA Address IOport: 0x514
=== ARM Register Locations ===
Selector Register address: Base + 8 (2 bytes)
Data Register address: Base + 0 (8 bytes)
DMA Address address: Base + 16 (8 bytes)
== Firmware Configuration Items ==
@@ -86,11 +93,15 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
and reading four bytes from the data register. If the fw_cfg device is
present, the four bytes read will contain the characters "QEMU".
=== Revision (Key 0x0001, FW_CFG_ID) ===
If the DMA interface is available, then reading the DMA Address
Register returns 0x51454d5520434647 ("QEMU CFG" in big-endian format).
A 32-bit little-endian unsigned int, this item is used as an interface
revision number, and is currently set to 1 by QEMU when fw_cfg is
initialized.
=== Revision / feature bitmap (Key 0x0001, FW_CFG_ID) ===
A 32-bit little-endian unsigned int, this item is used to check for enabled
features.
- Bit 0: traditional interface. Always set.
- Bit 1: DMA interface.
=== File Directory (Key 0x0019, FW_CFG_FILE_DIR) ===
@@ -132,6 +143,55 @@ Selector Reg. Range Usage
In practice, the number of allowed firmware configuration items is given
by the value of FW_CFG_MAX_ENTRY (see fw_cfg.h).
= Guest-side DMA Interface =
If bit 1 of the feature bitmap is set, the DMA interface is present. This does
not replace the existing fw_cfg interface, it is an add-on. This interface
can be used through the 64-bit wide address register.
The address register is in big-endian format. The value for the register is 0
at startup and after an operation. A write to the least significant half (at
offset 4) triggers an operation. This means that operations with 32-bit
addresses can be triggered with just one write, whereas operations with
64-bit addresses can be triggered with one 64-bit write or two 32-bit writes,
starting with the most significant half (at offset 0).
In this register, the physical address of a FWCfgDmaAccess structure in RAM
should be written. This is the format of the FWCfgDmaAccess structure:
typedef struct FWCfgDmaAccess {
uint32_t control;
uint32_t length;
uint64_t address;
} FWCfgDmaAccess;
The fields of the structure are in big endian mode, and the field at the lowest
address is the "control" field.
The "control" field has the following bits:
- Bit 0: Error
- Bit 1: Read
- Bit 2: Skip
- Bit 3: Select. The upper 16 bits are the selected index.
When an operation is triggered, if the "control" field has bit 3 set, the
upper 16 bits are interpreted as an index of a firmware configuration item.
This has the same effect as writing the selector register.
If the "control" field has bit 1 set, a read operation will be performed.
"length" bytes for the current selector and offset will be copied into the
physical RAM address specified by the "address" field.
If the "control" field has bit 2 set (and not bit 1), a skip operation will be
performed. The offset for the current selector will be advanced "length" bytes.
To check the result, read the "control" field:
error bit set -> something went wrong.
all bits cleared -> transfer finished successfully.
otherwise -> transfer still in progress (doesn't happen
today due to implementation not being async,
but may in the future).
= Host-side API =
The following functions are available to the QEMU programmer for adding
@@ -159,6 +219,17 @@ will convert a 16-, 32-, or 64-bit integer to little-endian, then add
a dynamically allocated copy of the appropriately sized item to fw_cfg
under the given selector key value.
== fw_cfg_modify_iXX() ==
Modify the value of an XX-bit item (where XX may be 16, 32, or 64).
Similarly to the corresponding fw_cfg_add_iXX() function set, convert
a 16-, 32-, or 64-bit integer to little endian, create a dynamically
allocated copy of the required size, and replace the existing item at
the given selector key value with the newly allocated one. The previous
item, assumed to have been allocated during an earlier call to
fw_cfg_add_iXX() or fw_cfg_modify_iXX() (of the same width XX), is freed
before the function returns.
== fw_cfg_add_file() ==
Given a filename (i.e., fw_cfg item name), starting pointer, and size,
@@ -216,6 +287,21 @@ the following syntax:
where <item_name> is the fw_cfg item name, and <path> is the location
on the host file system of a file containing the data to be inserted.
Small enough items may be provided directly as strings on the command
line, using the syntax:
-fw_cfg [name=]<item_name>,string=<string>
The terminating NUL character of the content <string> will NOT be
included as part of the fw_cfg item data, which is consistent with
the absence of a NUL terminator for items inserted via the file option.
Both <item_name> and, if applicable, the content <string> are passed
through by QEMU without any interpretation, expansion, or further
processing. Any such processing (potentially performed e.g., by the shell)
is outside of QEMU's responsibility; as such, using plain ASCII characters
is recommended.
NOTE: Users *SHOULD* choose item names beginning with the prefix "opt/"
when using the "-fw_cfg" command line option, to avoid conflicting with
item names used internally by QEMU. For instance:

View File

@@ -2,30 +2,106 @@
Device Specification for Inter-VM shared memory device
------------------------------------------------------
The Inter-VM shared memory device is designed to share a region of memory to
userspace in multiple virtual guests. The memory region does not belong to any
guest, but is a POSIX memory object on the host. Optionally, the device may
support sending interrupts to other guests sharing the same memory region.
The Inter-VM shared memory device is designed to share a memory region (created
on the host via the POSIX shared memory API) between multiple QEMU processes
running different guests. In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing said memory
to the guest as a PCI BAR.
The memory region does not belong to any guest, but is a POSIX memory object on
the host. The host can access this shared memory if needed.
The device also provides an optional communication mechanism between guests
sharing the same memory object. More details about that in the section 'Guest to
guest communication' section.
The Inter-VM PCI device
-----------------------
*BARs*
From the VM point of view, the ivshmem PCI device supports three BARs.
The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
used to map the shared memory object from the host. The size of BAR2 is
specified when the guest is started and must be a power of 2 in size.
- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
not used.
- BAR1 is used for MSI-X when it is enabled in the device.
- BAR2 is used to access the shared memory object.
*Registers*
It is your choice how to use the device but you must choose between two
behaviors :
The device currently supports 4 registers of 32-bits each. Registers
are used for synchronization between guests sharing the same memory object when
interrupts are supported (this requires using the shared memory server).
- basically, if you only need the shared memory part, you will map BAR2.
This way, you have access to the shared memory in guest and can use it as you
see fit (memnic, for example, uses it in userland
http://dpdk.org/browse/memnic).
The server assigns each VM an ID number and sends this ID number to the QEMU
process when the guest starts.
- BAR0 and BAR1 are used to implement an optional communication mechanism
through interrupts in the guests. If you need an event mechanism between the
guests accessing the shared memory, you will most likely want to write a
kernel driver that will handle interrupts. See details in the section 'Guest
to guest communication' section.
The behavior is chosen when starting your QEMU processes:
- no communication mechanism needed, the first QEMU to start creates the shared
memory on the host, subsequent QEMU processes will use it.
- communication mechanism needed, an ivshmem server must be started before any
QEMU processes, then each QEMU process connects to the server unix socket.
For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
Guest to guest communication
----------------------------
This section details the communication mechanism between the guests accessing
the ivhsmem shared memory.
*ivshmem server*
This server code is available in qemu.git/contrib/ivshmem-server.
The server must be started on the host before any guest.
It creates a shared memory object then waits for clients to connect on a unix
socket. All the messages are little-endian int64_t integer.
For each client (QEMU process) that connects to the server:
- the server sends a protocol version, if client does not support it, the client
closes the communication,
- the server assigns an ID for this client and sends this ID to him as the first
message,
- the server sends a fd to the shared memory object to this client,
- the server creates a new set of host eventfds associated to the new client and
sends this set to all already connected clients,
- finally, the server sends all the eventfds sets for all clients to the new
client.
The server signals all clients when one of them disconnects.
The client IDs are limited to 16 bits because of the current implementation (see
Doorbell register in 'PCI device registers' subsection). Hence only 65536
clients are supported.
All the file descriptors (fd to the shared memory, eventfds for each client)
are passed to clients using SCM_RIGHTS over the server unix socket.
Apart from the current ivshmem implementation in QEMU, an ivshmem client has
been provided in qemu.git/contrib/ivshmem-client for debug.
*QEMU as an ivshmem client*
At initialisation, when creating the ivshmem device, QEMU first receives a
protocol version and closes communication with server if it does not match.
Then, QEMU gets its ID from the server then makes it available through BAR0
IVPosition register for the VM to use (see 'PCI device registers' subsection).
QEMU then uses the fd to the shared memory to map it to BAR2.
eventfds for all other clients received from the server are stored to implement
BAR0 Doorbell register (see 'PCI device registers' subsection).
Finally, eventfds assigned to this QEMU process are used to send interrupts in
this VM.
*PCI device registers*
From the VM point of view, the ivshmem PCI device supports 4 registers of
32-bits each.
enum ivshmem_registers {
IntrMask = 0,
@@ -49,8 +125,8 @@ bit to 0 and unmasked by setting the first bit to 1.
IVPosition Register: The IVPosition register is read-only and reports the
guest's ID number. The guest IDs are non-negative integers. When using the
server, since the server is a separate process, the VM ID will only be set when
the device is ready (shared memory is received from the server and accessible via
the device). If the device is not ready, the IVPosition will return -1.
the device is ready (shared memory is received from the server and accessible
via the device). If the device is not ready, the IVPosition will return -1.
Applications should ensure that they have a valid VM ID before accessing the
shared memory.
@@ -59,8 +135,8 @@ Doorbell register. The doorbell register is 32-bits, logically divided into
two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low
16-bits are the interrupt vector to trigger. The semantics of the value
written to the doorbell depends on whether the device is using MSI or a regular
pin-based interrupt. In short, MSI uses vectors while regular interrupts set the
status register.
pin-based interrupt. In short, MSI uses vectors while regular interrupts set
the status register.
Regular Interrupts
@@ -71,7 +147,7 @@ interrupt in the destination guest.
Message Signalled Interrupts
A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
An ivshmem device may support multiple MSI vectors. If so, the lower 16-bits
written to the Doorbell register must be between 0 and the maximum number of
vectors the guest supports. The lower 16 bits written to the doorbell is the
MSI vector that will be raised in the destination guest. The number of MSI
@@ -83,14 +159,3 @@ interrupt itself should be communicated via the shared memory region. Devices
supporting multiple MSI vectors can use different vectors to indicate different
events have occurred. The semantics of interrupt vectors are left to the
user's discretion.
Usage in the Guest
------------------
The shared memory device is intended to be used with the provided UIO driver.
Very little configuration is needed. The guest should map BAR0 to access the
registers (an array of 32-bit ints allows simple writing) and map BAR2 to
access the shared memory region itself. The size of the shared memory region
is specified when the guest (or shared memory server) is started. A guest may
map the whole shared memory region or only part of it.

View File

@@ -87,6 +87,14 @@ Depending on the request type, payload can be:
User address: a 64-bit user address
mmap offset: 64-bit offset where region starts in the mapped memory
* Log description
---------------------------
| log size | log offset |
---------------------------
log size: size of area used for logging
log offset: offset from start of supplied file descriptor
where logging starts (i.e. where guest address 0 would be logged)
In QEMU the vhost-user message is implemented with the following struct:
typedef struct VhostUserMsg {
@@ -98,6 +106,7 @@ typedef struct VhostUserMsg {
struct vhost_vring_state state;
struct vhost_vring_addr addr;
VhostUserMemory memory;
VhostUserLog log;
};
} QEMU_PACKED VhostUserMsg;
@@ -115,11 +124,13 @@ the ones that do:
* VHOST_GET_FEATURES
* VHOST_GET_PROTOCOL_FEATURES
* VHOST_GET_VRING_BASE
* VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
There are several messages that the master sends with file descriptors passed
in the ancillary data:
* VHOST_SET_MEM_TABLE
* VHOST_SET_LOG_BASE (if VHOST_USER_PROTOCOL_F_LOG_SHMFD)
* VHOST_SET_LOG_FD
* VHOST_SET_VRING_KICK
* VHOST_SET_VRING_CALL
@@ -135,13 +146,45 @@ As older slaves don't support negotiating protocol features,
a feature bit was dedicated for this purpose:
#define VHOST_USER_F_PROTOCOL_FEATURES 30
Starting and stopping rings
----------------------
Client must only process each ring when it is started.
Client must only pass data between the ring and the
backend, when the ring is enabled.
If ring is started but disabled, client must process the
ring without talking to the backend.
For example, for a networking device, in the disabled state
client must not supply any new RX packets, but must process
and discard any TX packets.
If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized
in an enabled state.
If VHOST_USER_F_PROTOCOL_FEATURES has been negotiated, the ring is initialized
in a disabled state. Client must not pass data to/from the backend until ring is enabled by
VHOST_USER_SET_VRING_ENABLE with parameter 1, or after it has been disabled by
VHOST_USER_SET_VRING_ENABLE with parameter 0.
Each ring is initialized in a stopped state, client must not process it until
ring is started, or after it has been stopped.
Client must start ring upon receiving a kick (that is, detecting that file
descriptor is readable) on the descriptor specified by
VHOST_USER_SET_VRING_KICK, and stop ring upon receiving
VHOST_USER_GET_VRING_BASE.
While processing the rings (whether they are enabled or not), client must
support changing some configuration aspects on the fly.
Multiple queue support
----------------------
Multiple queue is treated as a protocol extension, hence the slave has to
implement protocol features first. The multiple queues feature is supported
only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set:
#define VHOST_USER_PROTOCOL_F_MQ 0
only when the protocol feature VHOST_USER_PROTOCOL_F_MQ (bit 0) is set.
The max number of queues the slave supports can be queried with message
VHOST_USER_GET_PROTOCOL_FEATURES. Master should stop when the number of
@@ -152,6 +195,66 @@ queue in the sent message to identify a specified queue. One queue pair
is enabled initially. More queues are enabled dynamically, by sending
message VHOST_USER_SET_VRING_ENABLE.
Migration
---------
During live migration, the master may need to track the modifications
the slave makes to the memory mapped regions. The client should mark
the dirty pages in a log. Once it complies to this logging, it may
declare the VHOST_F_LOG_ALL vhost feature.
To start/stop logging of data/used ring writes, server may send messages
VHOST_USER_SET_FEATURES with VHOST_F_LOG_ALL and VHOST_USER_SET_VRING_ADDR with
VHOST_VRING_F_LOG in ring's flags set to 1/0, respectively.
All the modifications to memory pointed by vring "descriptor" should
be marked. Modifications to "used" vring should be marked if
VHOST_VRING_F_LOG is part of ring's flags.
Dirty pages are of size:
#define VHOST_LOG_PAGE 0x1000
The log memory fd is provided in the ancillary data of
VHOST_USER_SET_LOG_BASE message when the slave has
VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature.
The size of the log is supplied as part of VhostUserMsg
which should be large enough to cover all known guest
addresses. Log starts at the supplied offset in the
supplied file descriptor.
The log covers from address 0 to the maximum of guest
regions. In pseudo-code, to mark page at "addr" as dirty:
page = addr / VHOST_LOG_PAGE
log[page / 8] |= 1 << page % 8
Where addr is the guest physical address.
Use atomic operations, as the log may be concurrently manipulated.
Note that when logging modifications to the used ring (when VHOST_VRING_F_LOG
is set for this ring), log_guest_addr should be used to calculate the log
offset: the write to first byte of the used ring is logged at this offset from
log start. Also note that this value might be outside the legal guest physical
address range (i.e. does not have to be covered by the VhostUserMemory table),
but the bit offset of the last byte of the ring must fall within
the size supplied by VhostUserLog.
VHOST_USER_SET_LOG_FD is an optional message with an eventfd in
ancillary data, it may be used to inform the master that the log has
been modified.
Once the source has finished migration, rings will be stopped by
the source. No further update must be done before rings are
restarted.
Protocol features
-----------------
#define VHOST_USER_PROTOCOL_F_MQ 0
#define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
#define VHOST_USER_PROTOCOL_F_RARP 2
Message types
-------------
@@ -211,14 +314,16 @@ Message types
as an owner of the session. This can be used on the Slave as a
"session start" flag.
* VHOST_USER_RESET_DEVICE
* VHOST_USER_RESET_OWNER
Id: 4
Equivalent ioctl: VHOST_RESET_DEVICE
Master payload: N/A
Issued when a new connection is about to be closed. The Master will no
longer own this connection (and will usually close it).
This is no longer used. Used to be sent to request disabling
all rings, but some clients interpreted it to also discard
connection state (this interpretation would lead to bugs).
It is recommended that clients either ignore this message,
or use it to disable all rings.
* VHOST_USER_SET_MEM_TABLE
@@ -236,8 +341,14 @@ Message types
Id: 6
Equivalent ioctl: VHOST_SET_LOG_BASE
Master payload: u64
Slave payload: N/A
Sets logging shared memory space.
When slave has VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol
feature, the log memory fd is provided in the ancillary data of
VHOST_USER_SET_LOG_BASE message, the size and offset of shared
memory area provided in the message.
Sets the logging base address.
* VHOST_USER_SET_LOG_FD
@@ -337,3 +448,19 @@ Message types
Master payload: vring state description
Signal slave to enable or disable corresponding vring.
This request should be sent only when VHOST_USER_F_PROTOCOL_FEATURES
has been negotiated.
* VHOST_USER_SEND_RARP
Id: 19
Equivalent ioctl: N/A
Master payload: u64
Ask vhost user backend to broadcast a fake RARP to notify the migration
is terminated for guest that does not support GUEST_ANNOUNCE.
Only legal if feature bit VHOST_USER_F_PROTOCOL_FEATURES is present in
VHOST_USER_GET_FEATURES and protocol feature bit VHOST_USER_PROTOCOL_F_RARP
is present in VHOST_USER_GET_PROTOCOL_FEATURES.
The first 6 bytes of the payload contain the mac address of the guest to
allow the vhost user backend to construct and broadcast the fake RARP.

106
docs/virtio-migration.txt Normal file
View File

@@ -0,0 +1,106 @@
Virtio devices and migration
============================
Copyright 2015 IBM Corp.
This work is licensed under the terms of the GNU GPL, version 2 or later. See
the COPYING file in the top-level directory.
Saving and restoring the state of virtio devices is a bit of a twisty maze,
for several reasons:
- state is distributed between several parts:
- virtio core, for common fields like features, number of queues, ...
- virtio transport (pci, ccw, ...), for the different proxy devices and
transport specific state (msix vectors, indicators, ...)
- virtio device (net, blk, ...), for the different device types and their
state (mac address, request queue, ...)
- most fields are saved via the stream interface; subsequently, subsections
have been added to make cross-version migration possible
This file attempts to document the current procedure and point out some
caveats.
Save state procedure
====================
virtio core virtio transport virtio device
----------- ---------------- -------------
save() function registered
via register_savevm()
virtio_save() <----------
------> save_config()
- save proxy device
- save transport-specific
device fields
- save common device
fields
- save common virtqueue
fields
------> save_queue()
- save transport-specific
virtqueue fields
------> save_device()
- save device-specific
fields
- save subsections
- device endianness,
if changed from
default endianness
- 64 bit features, if
any high feature bit
is set
- virtio-1 virtqueue
fields, if VERSION_1
is set
Load state procedure
====================
virtio core virtio transport virtio device
----------- ---------------- -------------
load() function registered
via register_savevm()
virtio_load() <----------
------> load_config()
- load proxy device
- load transport-specific
device fields
- load common device
fields
- load common virtqueue
fields
------> load_queue()
- load transport-specific
virtqueue fields
- notify guest
------> load_device()
- load device-specific
fields
- load subsections
- device endianness
- 64 bit features
- virtio-1 virtqueue
fields
- sanitize endianness
- sanitize features
- virtqueue index sanity
check
- feature-dependent setup
Implications of this setup
==========================
Devices need to be careful in their state processing during load: The
load_device() procedure is invoked by the core before subsections have
been loaded. Any code that depends on information transmitted in subsections
therefore has to be invoked in the device's load() function _after_
virtio_load() returned (like e.g. code depending on features).
Any extension of the state being migrated should be done in subsections
added to the core for compatibility reasons. If transport or device specific
state is added, core needs to invoke a callback from the new subsection.

View File

@@ -210,7 +210,7 @@ if you don't see these strings, then something went wrong.
=== Errors ===
QMP commands should use the error interface exported by the error.h header
file. Basically, errors are set by calling the error_set() function.
file. Basically, most errors are set by calling the error_setg() function.
Let's say we don't accept the string "message" to contain the word "love". If
it does contain it, we want the "hello-world" command to return an error:
@@ -219,8 +219,7 @@ void qmp_hello_world(bool has_message, const char *message, Error **errp)
{
if (has_message) {
if (strstr(message, "love")) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"the word 'love' is not allowed");
error_setg(errp, "the word 'love' is not allowed");
return;
}
printf("%s\n", message);
@@ -229,10 +228,8 @@ void qmp_hello_world(bool has_message, const char *message, Error **errp)
}
}
The first argument to the error_set() function is the Error pointer to pointer,
which is passed to all QMP functions. The second argument is a ErrorClass
value, which should be ERROR_CLASS_GENERIC_ERROR most of the time (more
details about error classes are given below). The third argument is a human
The first argument to the error_setg() function is the Error pointer
to pointer, which is passed to all QMP functions. The next argument is a human
description of the error, this is a free-form printf-like string.
Let's test the example above. Build qemu, run it as defined in the "Testing"
@@ -249,8 +246,9 @@ The QMP server's response should be:
}
}
As a general rule, all QMP errors should use ERROR_CLASS_GENERIC_ERROR. There
are two exceptions to this rule:
As a general rule, all QMP errors should use ERROR_CLASS_GENERIC_ERROR
(done by default when using error_setg()). There are two exceptions to
this rule:
1. A non-generic ErrorClass value exists* for the failure you want to report
(eg. DeviceNotFound)
@@ -259,8 +257,8 @@ are two exceptions to this rule:
want to report, hence you have to add a new ErrorClass value so that they
can check for it
If the failure you want to report doesn't fall in one of the two cases above,
just report ERROR_CLASS_GENERIC_ERROR.
If the failure you want to report falls into one of the two cases above,
use error_set() with a second argument of an ErrorClass value.
* All existing ErrorClass values are defined in the qapi-schema.json file

190
exec.c
View File

@@ -50,11 +50,15 @@
#include "qemu/rcu_queue.h"
#include "qemu/main-loop.h"
#include "translate-all.h"
#include "sysemu/replay.h"
#include "exec/memory-internal.h"
#include "exec/ram_addr.h"
#include "qemu/range.h"
#ifndef _WIN32
#include "qemu/mmap-alloc.h"
#endif
//#define DEBUG_SUBPAGE
@@ -84,9 +88,9 @@ static MemoryRegion io_mem_unassigned;
*/
#define RAM_RESIZEABLE (1 << 2)
/* An extra page is mapped on top of this RAM.
/* RAM is backed by an mmapped file.
*/
#define RAM_EXTRA (1 << 3)
#define RAM_FILE (1 << 3)
#endif
struct CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
@@ -879,6 +883,7 @@ void cpu_abort(CPUState *cpu, const char *fmt, ...)
}
va_end(ap2);
va_end(ap);
replay_finish();
#if defined(CONFIG_USER_ONLY)
{
struct sigaction act;
@@ -898,7 +903,7 @@ static RAMBlock *qemu_get_ram_block(ram_addr_t addr)
block = atomic_rcu_read(&ram_list.mru_block);
if (block && addr - block->offset < block->max_length) {
goto found;
return block;
}
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
if (addr - block->offset < block->max_length) {
@@ -1059,9 +1064,11 @@ static uint16_t phys_section_add(PhysPageMap *map,
static void phys_section_destroy(MemoryRegion *mr)
{
bool have_sub_page = mr->subpage;
memory_region_unref(mr);
if (mr->subpage) {
if (have_sub_page) {
subpage_t *subpage = container_of(mr, subpage_t, iomem);
object_unref(OBJECT(&subpage->iomem));
g_free(subpage);
@@ -1191,9 +1198,6 @@ static long gethugepagesize(const char *path, Error **errp)
return 0;
}
if (fs.f_type != HUGETLBFS_MAGIC)
fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
return fs.f_bsize;
}
@@ -1202,16 +1206,14 @@ static void *file_ram_alloc(RAMBlock *block,
const char *path,
Error **errp)
{
struct stat st;
char *filename;
char *sanitized_name;
char *c;
void *ptr;
void *area = NULL;
void *area;
int fd;
uint64_t hpagesize;
uint64_t total;
Error *local_err = NULL;
size_t offset;
hpagesize = gethugepagesize(path, &local_err);
if (local_err) {
@@ -1233,29 +1235,35 @@ static void *file_ram_alloc(RAMBlock *block,
goto error;
}
/* Make name safe to use with mkstemp by replacing '/' with '_'. */
sanitized_name = g_strdup(memory_region_name(block->mr));
for (c = sanitized_name; *c != '\0'; c++) {
if (*c == '/')
*c = '_';
if (!stat(path, &st) && S_ISDIR(st.st_mode)) {
/* Make name safe to use with mkstemp by replacing '/' with '_'. */
sanitized_name = g_strdup(memory_region_name(block->mr));
for (c = sanitized_name; *c != '\0'; c++) {
if (*c == '/') {
*c = '_';
}
}
filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
sanitized_name);
g_free(sanitized_name);
fd = mkstemp(filename);
if (fd >= 0) {
unlink(filename);
}
g_free(filename);
} else {
fd = open(path, O_RDWR | O_CREAT, 0644);
}
filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
sanitized_name);
g_free(sanitized_name);
fd = mkstemp(filename);
if (fd < 0) {
error_setg_errno(errp, errno,
"unable to create backing store for hugepages");
g_free(filename);
goto error;
}
unlink(filename);
g_free(filename);
memory = ROUND_UP(memory, hpagesize);
total = memory + hpagesize;
/*
* ftruncate is not supported by hugetlbfs in older
@@ -1267,40 +1275,14 @@ static void *file_ram_alloc(RAMBlock *block,
perror("ftruncate");
}
ptr = mmap(0, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
if (ptr == MAP_FAILED) {
error_setg_errno(errp, errno,
"unable to allocate memory range for hugepages");
close(fd);
goto error;
}
offset = QEMU_ALIGN_UP((uintptr_t)ptr, hpagesize) - (uintptr_t)ptr;
area = mmap(ptr + offset, memory, PROT_READ | PROT_WRITE,
(block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE) |
MAP_FIXED,
fd, 0);
area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
if (area == MAP_FAILED) {
error_setg_errno(errp, errno,
"unable to map backing store for hugepages");
munmap(ptr, total);
close(fd);
goto error;
}
if (offset > 0) {
munmap(ptr, offset);
}
ptr += offset;
total -= offset;
if (total > memory + getpagesize()) {
munmap(ptr + memory + getpagesize(),
total - memory - getpagesize());
}
if (mem_prealloc) {
os_mem_prealloc(fd, area, memory);
}
@@ -1309,10 +1291,6 @@ static void *file_ram_alloc(RAMBlock *block,
return area;
error:
if (mem_prealloc) {
error_report("%s", error_get_pretty(*errp));
exit(1);
}
return NULL;
}
#endif
@@ -1398,6 +1376,11 @@ static RAMBlock *find_ram_block(ram_addr_t addr)
return NULL;
}
const char *qemu_ram_get_idstr(RAMBlock *rb)
{
return rb->idstr;
}
/* Called with iothread lock held. */
void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev)
{
@@ -1468,7 +1451,7 @@ int qemu_ram_resize(ram_addr_t base, ram_addr_t newsize, Error **errp)
assert(block);
newsize = TARGET_PAGE_ALIGN(newsize);
newsize = HOST_PAGE_ALIGN(newsize);
if (block->used_length == newsize) {
return 0;
@@ -1612,13 +1595,13 @@ ram_addr_t qemu_ram_alloc_from_file(ram_addr_t size, MemoryRegion *mr,
return -1;
}
size = TARGET_PAGE_ALIGN(size);
size = HOST_PAGE_ALIGN(size);
new_block = g_malloc0(sizeof(*new_block));
new_block->mr = mr;
new_block->used_length = size;
new_block->max_length = size;
new_block->flags = share ? RAM_SHARED : 0;
new_block->flags |= RAM_EXTRA;
new_block->flags |= RAM_FILE;
new_block->host = file_ram_alloc(new_block, size,
mem_path, errp);
if (!new_block->host) {
@@ -1648,8 +1631,8 @@ ram_addr_t qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
ram_addr_t addr;
Error *local_err = NULL;
size = TARGET_PAGE_ALIGN(size);
max_size = TARGET_PAGE_ALIGN(max_size);
size = HOST_PAGE_ALIGN(size);
max_size = HOST_PAGE_ALIGN(max_size);
new_block = g_malloc0(sizeof(*new_block));
new_block->mr = mr;
new_block->resized = resized;
@@ -1720,8 +1703,8 @@ static void reclaim_ramblock(RAMBlock *block)
xen_invalidate_map_cache_entry(block->host);
#ifndef _WIN32
} else if (block->fd >= 0) {
if (block->flags & RAM_EXTRA) {
munmap(block->host, block->max_length + getpagesize());
if (block->flags & RAM_FILE) {
qemu_ram_munmap(block->host, block->max_length);
} else {
munmap(block->host, block->max_length);
}
@@ -1898,8 +1881,16 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
}
}
/* Some of the softmmu routines need to translate from a host pointer
* (typically a TLB entry) back to a ram offset.
/*
* Translates a host ptr back to a RAMBlock, a ram_addr and an offset
* in that RAMBlock.
*
* ptr: Host pointer to look up
* round_offset: If true round the result offset down to a page boundary
* *ram_addr: set to result ram_addr
* *offset: set to result offset within the RAMBlock
*
* Returns: RAMBlock (or NULL if not found)
*
* By the time this function returns, the returned pointer is not protected
* by RCU anymore. If the caller is not within an RCU critical section and
@@ -1907,18 +1898,22 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr *size)
* pointer, such as a reference to the region that includes the incoming
* ram_addr_t.
*/
MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
ram_addr_t *ram_addr,
ram_addr_t *offset)
{
RAMBlock *block;
uint8_t *host = ptr;
MemoryRegion *mr;
if (xen_enabled()) {
rcu_read_lock();
*ram_addr = xen_ram_addr_from_mapcache(ptr);
mr = qemu_get_ram_block(*ram_addr)->mr;
block = qemu_get_ram_block(*ram_addr);
if (block) {
*offset = (host - block->host);
}
rcu_read_unlock();
return mr;
return block;
}
rcu_read_lock();
@@ -1941,10 +1936,49 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
return NULL;
found:
*ram_addr = block->offset + (host - block->host);
mr = block->mr;
*offset = (host - block->host);
if (round_offset) {
*offset &= TARGET_PAGE_MASK;
}
*ram_addr = block->offset + *offset;
rcu_read_unlock();
return mr;
return block;
}
/*
* Finds the named RAMBlock
*
* name: The name of RAMBlock to find
*
* Returns: RAMBlock (or NULL if not found)
*/
RAMBlock *qemu_ram_block_by_name(const char *name)
{
RAMBlock *block;
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
if (!strcmp(name, block->idstr)) {
return block;
}
}
return NULL;
}
/* Some of the softmmu routines need to translate from a host pointer
(typically a TLB entry) back to a ram offset. */
MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
{
RAMBlock *block;
ram_addr_t offset; /* Not used */
block = qemu_ram_block_from_host(ptr, false, ram_addr, &offset);
if (!block) {
return NULL;
}
return block->mr;
}
static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
@@ -2725,8 +2759,8 @@ void cpu_register_map_client(QEMUBH *bh)
void cpu_exec_init_all(void)
{
qemu_mutex_init(&ram_list.mutex);
memory_map_init();
io_mem_init();
memory_map_init();
qemu_mutex_init(&map_client_list_lock);
}
@@ -3523,6 +3557,16 @@ int cpu_memory_rw_debug(CPUState *cpu, target_ulong addr,
}
return 0;
}
/*
* Allows code that needs to deal with migration bitmaps etc to still be built
* target independent.
*/
size_t qemu_target_page_bits(void)
{
return TARGET_PAGE_BITS;
}
#endif
/*

View File

@@ -1128,10 +1128,19 @@ int main(int argc, char **argv)
}
}
if (chdir("/") < 0) {
do_perror("chdir");
goto error;
}
if (chroot(rpath) < 0) {
do_perror("chroot");
goto error;
}
get_version = false;
#ifdef FS_IOC_GETVERSION
/* check whether underlying FS support IOC_GETVERSION */
retval = statfs(rpath, &st_fs);
retval = statfs("/", &st_fs);
if (!retval) {
switch (st_fs.f_type) {
case EXT2_SUPER_MAGIC:
@@ -1144,16 +1153,7 @@ int main(int argc, char **argv)
}
#endif
if (chdir("/") < 0) {
do_perror("chdir");
goto error;
}
if (chroot(rpath) < 0) {
do_perror("chroot");
goto error;
}
umask(0);
if (init_capabilities() < 0) {
goto error;
}

View File

@@ -956,6 +956,13 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
if (*p == ',')
p++;
len = strtoull(p, NULL, 16);
/* memtohex() doubles the required space */
if (len > MAX_PACKET_LENGTH / 2) {
put_packet (s, "E22");
break;
}
if (target_memory_rw_debug(s->g_cpu, addr, mem_buf, len, false) != 0) {
put_packet (s, "E14");
} else {
@@ -970,6 +977,12 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
len = strtoull(p, (char **)&p, 16);
if (*p == ':')
p++;
/* hextomem() reads 2*len bytes */
if (len > strlen(p) / 2) {
put_packet (s, "E22");
break;
}
hextomem(mem_buf, p, len);
if (target_memory_rw_debug(s->g_cpu, addr, mem_buf, len,
true) != 0) {
@@ -1107,7 +1120,8 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
cpu = find_cpu(thread);
if (cpu != NULL) {
cpu_synchronize_state(cpu);
len = snprintf((char *)mem_buf, sizeof(mem_buf),
/* memtohex() doubles the required space */
len = snprintf((char *)mem_buf, sizeof(buf) / 2,
"CPU#%d [%s]", cpu->cpu_index,
cpu->halted ? "halted " : "running");
memtohex(buf, mem_buf, len);
@@ -1136,8 +1150,8 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
put_packet(s, "E01");
break;
}
hextomem(mem_buf, p + 5, len);
len = len / 2;
hextomem(mem_buf, p + 5, len);
mem_buf[len++] = 0;
qemu_chr_be_write(s->mon_chr, mem_buf, len);
put_packet(s, "OK");

View File

@@ -194,8 +194,8 @@ ETEXI
{
.name = "change",
.args_type = "device:B,target:F,arg:s?",
.params = "device filename [format]",
.args_type = "device:B,target:F,arg:s?,read-only-mode:s?",
.params = "device filename [format [read-only-mode]]",
.help = "change a removable medium, optional format",
.mhandler.cmd = hmp_change,
},
@@ -206,7 +206,7 @@ STEXI
Change the configuration of a device.
@table @option
@item change @var{diskdevice} @var{filename} [@var{format}]
@item change @var{diskdevice} @var{filename} [@var{format} [@var{read-only-mode}]]
Change the medium for a removable disk device to point to @var{filename}. eg
@example
@@ -215,6 +215,20 @@ Change the medium for a removable disk device to point to @var{filename}. eg
@var{format} is optional.
@var{read-only-mode} may be used to change the read-only status of the device.
It accepts the following values:
@table @var
@item retain
Retains the current status; this is the default.
@item read-only
Makes the device read-only.
@item read-write
Makes the device writable.
@end table
@item change vnc @var{display},@var{options}
Change the configuration of the VNC server. The valid syntax for @var{display}
and @var{options} are described at @ref{sec_invocation}. eg
@@ -1005,6 +1019,23 @@ STEXI
@item migrate_set_parameter @var{parameter} @var{value}
@findex migrate_set_parameter
Set the parameter @var{parameter} for migration.
ETEXI
{
.name = "migrate_start_postcopy",
.args_type = "",
.params = "",
.help = "Followup to a migration command to switch the migration"
" to postcopy mode. The x-postcopy-ram capability must "
"be set before the original migration command.",
.mhandler.cmd = hmp_migrate_start_postcopy,
},
STEXI
@item migrate_start_postcopy
@findex migrate_start_postcopy
Switch in-progress migration to postcopy mode. Ignored after the end of
migration (or once already in postcopy).
ETEXI
{

84
hmp.c
View File

@@ -27,6 +27,7 @@
#include "qapi/opts-visitor.h"
#include "qapi/qmp/qerror.h"
#include "qapi/string-output-visitor.h"
#include "qapi/util.h"
#include "qapi-visit.h"
#include "ui/console.h"
#include "block/qapi.h"
@@ -521,6 +522,7 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
" flush_total_time_ns=%" PRId64
" rd_merged=%" PRId64
" wr_merged=%" PRId64
" idle_time_ns=%" PRId64
"\n",
stats->value->stats->rd_bytes,
stats->value->stats->wr_bytes,
@@ -531,7 +533,8 @@ void hmp_info_blockstats(Monitor *mon, const QDict *qdict)
stats->value->stats->rd_total_time_ns,
stats->value->stats->flush_total_time_ns,
stats->value->stats->rd_merged,
stats->value->stats->wr_merged);
stats->value->stats->wr_merged,
stats->value->stats->idle_time_ns);
}
qapi_free_BlockStatsList(stats_list);
@@ -569,8 +572,8 @@ void hmp_info_vnc(Monitor *mon, const QDict *qdict)
for (client = info->clients; client; client = client->next) {
monitor_printf(mon, "Client:\n");
monitor_printf(mon, " address: %s:%s\n",
client->value->base->host,
client->value->base->service);
client->value->host,
client->value->service);
monitor_printf(mon, " x509_dname: %s\n",
client->value->x509_dname ?
client->value->x509_dname : "none");
@@ -638,7 +641,7 @@ void hmp_info_spice(Monitor *mon, const QDict *qdict)
for (chan = info->channels; chan; chan = chan->next) {
monitor_printf(mon, "Channel:\n");
monitor_printf(mon, " address: %s:%s%s\n",
chan->value->base->host, chan->value->base->port,
chan->value->host, chan->value->port,
chan->value->tls ? " [tls]" : "");
monitor_printf(mon, " session: %" PRId64 "\n",
chan->value->connection_id);
@@ -841,11 +844,11 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
c, TpmModel_lookup[ti->model]);
monitor_printf(mon, " \\ %s: type=%s",
ti->id, TpmTypeOptionsKind_lookup[ti->options->kind]);
ti->id, TpmTypeOptionsKind_lookup[ti->options->type]);
switch (ti->options->kind) {
switch (ti->options->type) {
case TPM_TYPE_OPTIONS_KIND_PASSTHROUGH:
tpo = ti->options->passthrough;
tpo = ti->options->u.passthrough;
monitor_printf(mon, "%s%s%s%s",
tpo->has_path ? ",path=" : "",
tpo->has_path ? tpo->path : "",
@@ -1293,6 +1296,13 @@ void hmp_client_migrate_info(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, &err);
}
void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict)
{
Error *err = NULL;
qmp_migrate_start_postcopy(&err);
hmp_handle_error(mon, &err);
}
void hmp_set_password(Monitor *mon, const QDict *qdict)
{
const char *protocol = qdict_get_str(qdict, "protocol");
@@ -1336,24 +1346,46 @@ void hmp_change(Monitor *mon, const QDict *qdict)
const char *device = qdict_get_str(qdict, "device");
const char *target = qdict_get_str(qdict, "target");
const char *arg = qdict_get_try_str(qdict, "arg");
const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
BlockdevChangeReadOnlyMode read_only_mode = 0;
Error *err = NULL;
if (strcmp(device, "vnc") == 0 &&
(strcmp(target, "passwd") == 0 ||
strcmp(target, "password") == 0)) {
if (!arg) {
monitor_read_password(mon, hmp_change_read_arg, NULL);
if (strcmp(device, "vnc") == 0) {
if (read_only) {
monitor_printf(mon,
"Parameter 'read-only-mode' is invalid for VNC\n");
return;
}
if (strcmp(target, "passwd") == 0 ||
strcmp(target, "password") == 0) {
if (!arg) {
monitor_read_password(mon, hmp_change_read_arg, NULL);
return;
}
}
qmp_change("vnc", target, !!arg, arg, &err);
} else {
if (read_only) {
read_only_mode =
qapi_enum_parse(BlockdevChangeReadOnlyMode_lookup,
read_only, BLOCKDEV_CHANGE_READ_ONLY_MODE_MAX,
BLOCKDEV_CHANGE_READ_ONLY_MODE_RETAIN, &err);
if (err) {
hmp_handle_error(mon, &err);
return;
}
}
qmp_blockdev_change_medium(device, target, !!arg, arg,
!!read_only, read_only_mode, &err);
if (err &&
error_get_class(err) == ERROR_CLASS_DEVICE_ENCRYPTED) {
error_free(err);
monitor_read_block_device_key(mon, device, NULL, NULL);
return;
}
}
qmp_change(device, target, !!arg, arg, &err);
if (err &&
error_get_class(err) == ERROR_CLASS_DEVICE_ENCRYPTED) {
error_free(err);
monitor_read_block_device_key(mon, device, NULL, NULL);
return;
}
hmp_handle_error(mon, &err);
}
@@ -1735,15 +1767,15 @@ void hmp_sendkey(Monitor *mon, const QDict *qdict)
if (*endp != '\0') {
goto err_out;
}
keylist->value->kind = KEY_VALUE_KIND_NUMBER;
keylist->value->number = value;
keylist->value->type = KEY_VALUE_KIND_NUMBER;
keylist->value->u.number = value;
} else {
int idx = index_from_key(keyname_buf);
if (idx == Q_KEY_CODE_MAX) {
goto err_out;
}
keylist->value->kind = KEY_VALUE_KIND_QCODE;
keylist->value->qcode = idx;
keylist->value->type = KEY_VALUE_KIND_QCODE;
keylist->value->u.qcode = idx;
}
if (!separator) {
@@ -1958,12 +1990,12 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
value = info->value;
if (value) {
switch (value->kind) {
switch (value->type) {
case MEMORY_DEVICE_INFO_KIND_DIMM:
di = value->dimm;
di = value->u.dimm;
monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
MemoryDeviceInfoKind_lookup[value->kind],
MemoryDeviceInfoKind_lookup[value->type],
di->id ? di->id : "");
monitor_printf(mon, " addr: 0x%" PRIx64 "\n", di->addr);
monitor_printf(mon, " slot: %" PRId64 "\n", di->slot);

1
hmp.h
View File

@@ -69,6 +69,7 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict);
void hmp_migrate_set_cache_size(Monitor *mon, const QDict *qdict);
void hmp_client_migrate_info(Monitor *mon, const QDict *qdict);
void hmp_migrate_start_postcopy(Monitor *mon, const QDict *qdict);
void hmp_set_password(Monitor *mon, const QDict *qdict);
void hmp_expire_password(Monitor *mon, const QDict *qdict);
void hmp_eject(Monitor *mon, const QDict *qdict);

View File

@@ -14,7 +14,7 @@
#include "fsdev/qemu-fsdev.h"
#include "qemu/thread.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "virtio-9p-coth.h"
int v9fs_co_readdir_r(V9fsPDU *pdu, V9fsFidState *fidp, struct dirent *dent,

View File

@@ -14,7 +14,7 @@
#include "fsdev/qemu-fsdev.h"
#include "qemu/thread.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "virtio-9p-coth.h"
int v9fs_co_st_gen(V9fsPDU *pdu, V9fsPath *path, mode_t st_mode,

View File

@@ -14,7 +14,7 @@
#include "fsdev/qemu-fsdev.h"
#include "qemu/thread.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "virtio-9p-coth.h"
static ssize_t __readlink(V9fsState *s, V9fsPath *path, V9fsString *buf)

View File

@@ -14,7 +14,7 @@
#include "fsdev/qemu-fsdev.h"
#include "qemu/thread.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "virtio-9p-coth.h"
int v9fs_co_llistxattr(V9fsPDU *pdu, V9fsPath *path, void *value, size_t size)

View File

@@ -12,71 +12,30 @@
*
*/
#include "fsdev/qemu-fsdev.h"
#include "qemu/thread.h"
#include "qemu/event_notifier.h"
#include "block/coroutine.h"
#include "qemu-common.h"
#include "block/thread-pool.h"
#include "qemu/coroutine.h"
#include "qemu/main-loop.h"
#include "virtio-9p-coth.h"
/* v9fs glib thread pool */
static V9fsThPool v9fs_pool;
/* Called from QEMU I/O thread. */
static void coroutine_enter_cb(void *opaque, int ret)
{
Coroutine *co = opaque;
qemu_coroutine_enter(co, NULL);
}
/* Called from worker thread. */
static int coroutine_enter_func(void *arg)
{
Coroutine *co = arg;
qemu_coroutine_enter(co, NULL);
return 0;
}
void co_run_in_worker_bh(void *opaque)
{
Coroutine *co = opaque;
g_thread_pool_push(v9fs_pool.pool, co, NULL);
}
static void v9fs_qemu_process_req_done(EventNotifier *e)
{
Coroutine *co;
event_notifier_test_and_clear(e);
while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
qemu_coroutine_enter(co, NULL);
}
}
static void v9fs_thread_routine(gpointer data, gpointer user_data)
{
Coroutine *co = data;
qemu_coroutine_enter(co, NULL);
g_async_queue_push(v9fs_pool.completed, co);
event_notifier_set(&v9fs_pool.e);
}
int v9fs_init_worker_threads(void)
{
int ret = 0;
V9fsThPool *p = &v9fs_pool;
sigset_t set, oldset;
sigfillset(&set);
/* Leave signal handling to the iothread. */
pthread_sigmask(SIG_SETMASK, &set, &oldset);
p->pool = g_thread_pool_new(v9fs_thread_routine, p, -1, FALSE, NULL);
if (!p->pool) {
ret = -1;
goto err_out;
}
p->completed = g_async_queue_new();
if (!p->completed) {
/*
* We are going to terminate.
* So don't worry about cleanup
*/
ret = -1;
goto err_out;
}
event_notifier_init(&p->e, 0);
event_notifier_set_handler(&p->e, v9fs_qemu_process_req_done);
err_out:
pthread_sigmask(SIG_SETMASK, &oldset, NULL);
return ret;
thread_pool_submit_aio(qemu_get_aio_context()->thread_pool,
coroutine_enter_func, co, coroutine_enter_cb, co);
}

View File

@@ -16,16 +16,8 @@
#define _QEMU_VIRTIO_9P_COTH_H
#include "qemu/thread.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
#include "virtio-9p.h"
#include <glib.h>
typedef struct V9fsThPool {
EventNotifier e;
GThreadPool *pool;
GAsyncQueue *completed;
} V9fsThPool;
/*
* we want to use bottom half because we want to make sure the below
@@ -45,7 +37,7 @@ typedef struct V9fsThPool {
qemu_bh_schedule(co_bh); \
/* \
* yield in qemu thread and re-enter back \
* in glib worker thread \
* in worker thread \
*/ \
qemu_coroutine_yield(); \
qemu_bh_delete(co_bh); \

View File

@@ -43,6 +43,16 @@ static void virtio_9p_get_config(VirtIODevice *vdev, uint8_t *config)
g_free(cfg);
}
static void virtio_9p_save(QEMUFile *f, void *opaque)
{
virtio_save(VIRTIO_DEVICE(opaque), f);
}
static int virtio_9p_load(QEMUFile *f, void *opaque, int version_id)
{
return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
}
static void virtio_9p_device_realize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -106,10 +116,6 @@ static void virtio_9p_device_realize(DeviceState *dev, Error **errp)
" and export path:%s", s->fsconf.fsdev_id, s->ctx.fs_root);
goto out;
}
if (v9fs_init_worker_threads() < 0) {
error_setg(errp, "worker thread initialization failed");
goto out;
}
/*
* Check details of export path, We need to use fs driver
@@ -130,6 +136,7 @@ static void virtio_9p_device_realize(DeviceState *dev, Error **errp)
}
v9fs_path_free(&path);
register_savevm(dev, "virtio-9p", -1, 1, virtio_9p_save, virtio_9p_load, s);
return;
out:
g_free(s->ctx.fs_root);
@@ -138,6 +145,17 @@ out:
v9fs_path_free(&path);
}
static void virtio_9p_device_unrealize(DeviceState *dev, Error **errp)
{
VirtIODevice *vdev = VIRTIO_DEVICE(dev);
V9fsState *s = VIRTIO_9P(dev);
virtio_cleanup(vdev);
unregister_savevm(dev, "virtio-9p", s);
g_free(s->ctx.fs_root);
g_free(s->tag);
}
/* virtio-9p device */
static Property virtio_9p_properties[] = {
@@ -154,6 +172,7 @@ static void virtio_9p_class_init(ObjectClass *klass, void *data)
dc->props = virtio_9p_properties;
set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
vdc->realize = virtio_9p_device_realize;
vdc->unrealize = virtio_9p_device_unrealize;
vdc->get_features = virtio_9p_get_features;
vdc->get_config = virtio_9p_get_config;
}

View File

@@ -13,7 +13,7 @@
#include "fsdev/file-op-9p.h"
#include "fsdev/virtio-9p-marshal.h"
#include "qemu/thread.h"
#include "block/coroutine.h"
#include "qemu/coroutine.h"
enum {
P9_TLERROR = 6,

View File

@@ -1163,9 +1163,7 @@ void *acpi_data_push(GArray *table_data, unsigned size)
unsigned acpi_data_len(GArray *table)
{
#if GLIB_CHECK_VERSION(2, 22, 0)
assert(g_array_get_element_size(table) == 1);
#endif
return table->len;
}

View File

@@ -625,8 +625,12 @@ void acpi_pm1_cnt_reset(ACPIREGS *ar)
void acpi_gpe_init(ACPIREGS *ar, uint8_t len)
{
ar->gpe.len = len;
ar->gpe.sts = g_malloc0(len / 2);
ar->gpe.en = g_malloc0(len / 2);
/* Only first len / 2 bytes are ever used,
* but the caller in ich9.c migrates full len bytes.
* TODO: fix ich9.c and drop the extra allocation.
*/
ar->gpe.sts = g_malloc0(len);
ar->gpe.en = g_malloc0(len);
}
void acpi_gpe_reset(ACPIREGS *ar)

View File

@@ -155,6 +155,7 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
qapi_event_send_mem_unplug_error(dev->id,
error_get_pretty(local_err),
&error_abort);
error_free(local_err);
break;
}
trace_mhp_acpi_pc_dimm_deleted(mem_st->selector);
@@ -238,10 +239,12 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
mdev->dimm = dev;
mdev->is_enabled = true;
mdev->is_inserting = true;
if (dev->hotplugged) {
mdev->is_inserting = true;
/* do ACPI magic */
acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
/* do ACPI magic */
acpi_send_gpe_event(ar, irq, ACPI_MEMORY_HOTPLUG_STATUS);
}
return;
}

View File

@@ -39,6 +39,9 @@ static void aw_a10_init(Object *obj)
qemu_check_nic_model(&nd_table[0], TYPE_AW_EMAC);
qdev_set_nic_properties(DEVICE(&s->emac), &nd_table[0]);
}
object_initialize(&s->sata, sizeof(s->sata), TYPE_ALLWINNER_AHCI);
qdev_set_parent_bus(DEVICE(&s->sata), sysbus_get_default());
}
static void aw_a10_realize(DeviceState *dev, Error **errp)
@@ -93,6 +96,14 @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
sysbus_mmio_map(sysbusdev, 0, AW_A10_EMAC_BASE);
sysbus_connect_irq(sysbusdev, 0, s->irq[55]);
object_property_set_bool(OBJECT(&s->sata), true, "realized", &err);
if (err) {
error_propagate(errp, err);
return;
}
sysbus_mmio_map(SYS_BUS_DEVICE(&s->sata), 0, AW_A10_SATA_BASE);
sysbus_connect_irq(SYS_BUS_DEVICE(&s->sata), 0, s->irq[56]);
/* FIXME use a qdev chardev prop instead of serial_hds[] */
serial_mm_init(get_system_memory(), AW_A10_UART0_REG_BASE, 2, s->irq[1],
115200, serial_hds[0], DEVICE_NATIVE_ENDIAN);

Some files were not shown because too many files have changed in this diff Show More