When driving QEMU from the outside, we have basically no chance to
determine how quickly the guest OS picks up key events, so we usually
have to limit ourselves to very slow keyboard presses to make sure
the guest always has enough chance to pick them up.
This patch adds a trace events when the keyboarde queue is drained.
An external driver can use that as hint that new keys can be pressed.
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1490883775-94658-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
qemu_input_event_send() discards key event when the guest is paused,
but not the delay.
The delay ends up in the input queue, and qemu_input_event_send_key()
will further fill the queue with upcoming events.
VNC uses qemu_input_event_send_key_delay(), not SPICE, which results
in a different input behaviour on pause: VNC will queue the events
(except the first that is discarded), SPICE will discard all events.
Don't queue delay if paused, and provide same behaviour on SPICE and
VNC clients on resume (and potentially avoid over-allocating the
buffer queue)
Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1444326
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20170425130520.31819-1-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Working up the stack, this replaces the slirp_socket_load/save
with VMState definitions.
A place holder for IPv6 support is added as a comment; it needs
testing once the rest of the IPv6 code is there.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
The socket structure has a pair of unions for lhost and fhost
addresses; the unions are identical so split them out into
a separate union declaration.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Convert the sbuf structure to a VMStateDescription.
Note this uses the VMSTATE_WITH_TMP mechanism to calculate
and reload the offsets based on the pointers.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Convert the migration of the struct tcpcb to use a VMStateDescription,
the rest of it will come later.
Mostly mechanical, except for conversion of some 'char' to uint8_t
to ensure portability.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
ASAN detects an "unknown-crash" when running pxe-test:
/ppc64/pxe/spapr-vlan: =================================================================
==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0
READ of size 128 at 0x7f6dcd298d30 thread T2
#0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73
#1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289
#2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446
#3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82
#4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67
Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame
#0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13
This frame has 3 object(s):
[32, 48) '<unknown>'
[96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable
[160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable
The sockaddr_storage pointer is the sockaddr_in6 lhost on the
stack. Copy only the source addr size.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
gcc 7 (on fedora 26) objects to many of the snprintf's
in the smb path and command creation because it can't
figure out that the smb_dir (i.e. the /tmp dir for the configuration)
is known to be short.
Replace all these fixed length buffers by g_str* functions that dynamically
allocate and use g_dir_make_tmp to make the directory.
(It's fairly new glib but we have a compat function for it).
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
The OS will allocate automatically a free port. This is useful if you
want to be sure to not get any port conflict. You still have to figure
out which port you got, for example with "lsof" (this could be exposed
in the monitor if needed).
Example of use:
$ qemu-system-x86_64 -net user,hostfwd=127.0.0.1:0-:22 ...
Then, get your port with:
$ lsof -np 1474 | grep LISTEN
qemu-syst 31777 bernat 12u IPv4 [...] TCP 127.0.0.1:35145 (LISTEN)
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Since commit "c53eeaf75a04 configure: eliminate Python dependency for
--help", configure --help fails to produce the list of available trace
backends if invoked out-of-tree. It also spits the following error:
grep: scripts/tracetool/backend/*.py: No such file or directory
This patch simply adds the missing $source_path to fix it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-id: 149321376763.7874.12797658801011614451.stgit@bahia
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu-ga patch queue
* new commands: guest-get-timezone, guest-get-users, guest-get-host-name
* fix hang on w32 when stopping qemu-ga service while fs frozen
* fix missing setting of can-offline in guest-get-vcpus
* make qemu-ga VSS w32 service on-demand rather than on-startup
* fix unecessary errors to EventLog on w32
* improvements to fsfreeze documentation
v2:
* document 'zone' field of guest-get-timezone as informational-only
(Daniel, Eric)
* fix build error for glib < 2.32 (Peter)
# gpg: Signature made Thu 27 Apr 2017 06:43:42 AM BST
# gpg: using RSA key 0x3353C9CEF108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg: aka "Michael Roth <mdroth@utexas.edu>"
# gpg: aka "Michael Roth <mdroth@linux.vnet.ibm.com>"
# Primary key fingerprint: CEAC C9E1 5534 EBAB B82D 3FA0 3353 C9CE F108 B584
* mdroth/tags/qga-pull-2017-04-25-v2-tag:
qga: Add `guest-get-timezone` command
qga: Add 'guest-get-users' command
qga: improve fsfreeze documentations
qga: Add 'guest-get-host-name' command
qga-win: Fix Event Viewer errors caused by qemu-ga
qga-win: Fix a bug where qemu-ga service is stuck during stop operation
qga-win: Enable 'can-offline' field in 'guest-get-vcpus' reply
qemu-ga: Make QGA VSS provider service run only when needed
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Adds a new command `guest-get-timezone` reporting the currently
configured timezone on the system. The information on what timezone is
currently is configured is useful in case of Windows VMs where the
offset of the hardware clock is required to have the same offset. This
can be used for management systems like `oVirt` to detect the timezone
difference and warn administrators of the misconfiguration.
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
Reviewed-by: Sameeh Jubran <sameeh@daynix.com>
Tested-by: Sameeh Jubran <sameeh@daynix.com>
* moved stub implementation to end of function for consistency
* document that timezone names are for informational use only.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
A command that will list all currently logged in users, and the time
since when they are logged in.
Examples:
virsh # qemu-agent-command F25 '{ "execute": "guest-get-users" }'
{"return":[{"login-time":1490622289.903835,"user":"root"}]}
virsh # qemu-agent-command Win2k12r2 '{ "execute": "guest-get-users" }'
{"return":[{"login-time":1490351044.670552,"domain":"LADIDA",
"user":"Administrator"}]}
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
* make g_hash_table_contains compat func inline to avoid
unused warnings
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Retrieving the guest host name is a very useful feature for virtual management
systems. This information can help to have more user friendly VM access
details, instead of an IP there would be the host name. Also the host name
reported can be used to have automated checks for valid SSL certificates.
virsh # qemu-agent-command F25 '{ "execute": "guest-get-host-name" }'
{"return":{"host-name":"F25.lab.evilissimo.net"}}
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
* minor whitespace fix-ups
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When the command "guest-fsfreeze-freeze" is executed it causes
the VSS service to log the error below in the Event Viewer. This
error is caused by an issue in the function "CommitSnapshots" in
provider.cpp:
* When VSS_TIMEOUT_MSEC expires the funtion returns E_ABORT. This causes
the error #12293.
|event id| error |
* 12293 : Volume Shadow Copy Service error: Error calling a routine on a
Shadow Copy Provider {00000000-0000-0000-0000-000000000000}.
Routine details CommitSnapshots [hr = 0x80004004, Operation
aborted.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
After triggering a freeze command without any following thaw command,
qemu-ga will not respond to stop operation. This behaviour is wanted on Linux
as there is no time limit for a freeze command and we want to prevent
quitting in the middle of freeze, on the other hand on Windows the time
limit for freeze is 10 seconds, so we should wait for the timeout, thaw
the file system and quit.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The QGA schema states:
@can-offline: Whether offlining the VCPU is possible. This member
is always filled in by the guest agent when the structure
is returned, and always ignored on input (hence it can be
omitted then).
Currently 'can-offline' is missing entirely from the reply. This causes
errors in libvirt which is expecting the reply to be compliant with the
schema docs.
BZ#1438735: https://bugzilla.redhat.com/show_bug.cgi?id=1438735
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Currently the service runs in background on boot even though it is not
needed and once it is running it never stops. The service needs to be
running only during freeze operation and it should be stopped after
executing thaw.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Users of tcg_gen_atomic_cmpxchg and do_atomic_op rightfully utilize
the output. Even though this code is dead, it gets translated, and
without the initialization we encounter a tcg_error.
Reported-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Tested-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
This reverts commit 0fc8aec7de.
In commit 2dfe5113b1 we split a trace event with a lot of arguments
in two, because the UST trace backend has a limit on the number
of arguments you can have in a single trace event. Unfortunately
we subsequently forgot about this, and in commit 0fc8aec7de
we merged the two trace events again, recreating the "UST backend
doesn't build" bug.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
HMP pull, with tcg fix
# gpg: Signature made Wed 26 Apr 2017 14:55:30 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-hmp-20170426:
tests: Add a tester for HMP commands
libqtest: Add a generic function to run a callback function for every machine
libqtest: Ignore QMP events when parsing the response for HMP commands
monitor: Check whether TCG is enabled before running the "info jit" code
hmp: gpa2hva and gpa2hpa hostaddr command
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
HMP commands do not get any automatic testing yet, so on certain
QEMU machines, some HMP commands were causing crashes in the past.
Thus we should test HMP commands in our test suite, too, to avoid
that such problems creep in again in the future.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1493097407-20482-1-git-send-email-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Some tests need to run single tests for every available machine of the
current QEMU binary. To avoid code duplication, let's extract this
code that deals with 'query-machines' into a separate function.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1490860207-8302-3-git-send-email-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
When running certain HMP commands (like "device_del") via QMP, we
can sometimes get a QMP event in the response first, so that the
"g_assert(ret)" statement in qtest_hmp() triggers and the test
fails. Fix this by ignoring such QMP events while looking for the
real return value from QMP.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1490860207-8302-2-git-send-email-thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Added note to qtest_hmp/qtest_hmpv's header description to say
it discards events
The "info jit" command currently aborts on Mac OS X with the message
"qemu_mutex_lock: Invalid argument" when running with "-M accel=qtest".
We should only call into the TCG code here if TCG has really been
enabled and initialized.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1493179907-22516-1-git-send-email-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
ppc patch queue 2017-04-26
Here's a respind of my first pull request for qemu-2.10, consisting of
assorted patches which have accumulated while qemu-2.9 stabilized.
Highlights are:
* Rework / cleanup of the XICS interrupt controller
* Substantial improvement to the 'powernv' machine type
- Includes an MMIO XICS version
* POWER9 support improvements
- POWER9 guests with KVM
- Partial support for POWER9 guests with TCG
* IOMMU and VFIO improvements
* Assorted minor changes
There are several IPMI patches here that aren't usually in my area of
maintenance, but there isn't a regular maintainer and these patches
are for the benefit of the powernv machine type.
This pull request supersedes my 2017-04-26 pull request. This new set
fixes a bug in one of the aforementioned IPMI patches which caused
clang sanitizer failures (and may have crashed on some libc / host
versions).
# gpg: Signature made Wed 26 Apr 2017 07:58:10 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.10-20170426: (48 commits)
MAINTAINERS: Remove myself from e500
target/ppc: Style fixes
e500,book3s: mfspr 259: Register mapped/aliased SPRG3 user read
target/ppc: Flush TLB on write to PIDR
spapr-cpu-core: Release ICPState object during CPU unrealization
ppc/pnv: generate an OEM SEL event on shutdown
ppc/pnv: add initial IPMI sensors for the BMC simulator
ppc/pnv: populate device tree for IPMI BT devices
ppc/pnv: populate device tree for serial devices
ppc/pnv: populate device tree for RTC devices
ppc/pnv: scan ISA bus to populate device tree
ppc/pnv: enable only one LPC bus
ppc/pnv: Add support for POWER8+ LPC Controller
spapr: remove the 'nr_servers' field from the machine
target/ppc: Fix size of struct PPCElfPrstatus
ipmi: introduce an ipmi_bmc_gen_event() API
ipmi: introduce an ipmi_bmc_sdr_find() API
ipmi: provide support for FRUs
ipmi: use a file to load SDRs
ppc: add IPMI support
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Xen 2017/04/21 + fix
# gpg: Signature made Tue 25 Apr 2017 19:10:37 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg: aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20170421-v2-tag: (21 commits)
move xen-mapcache.c to hw/i386/xen/
move xen-hvm.c to hw/i386/xen/
move xen-common.c to hw/xen/
add xen-9p-backend to MAINTAINERS under Xen
xen/9pfs: build and register Xen 9pfs backend
xen/9pfs: send responses back to the frontend
xen/9pfs: implement in/out_iov_from_pdu and vmarshal/vunmarshal
xen/9pfs: receive requests from the frontend
xen/9pfs: connect to the frontend
xen/9pfs: introduce Xen 9pfs backend
9p: introduce a type for the 9p header
xen: import ring.h from xen
configure: use pkg-config for obtaining xen version
xen: additionally restrict xenforeignmemory operations
xen: use libxendevice model to restrict operations
xen: use 5 digit xen versions
xen: use libxendevicemodel when available
configure: detect presence of libxendevicemodel
xen: create wrappers for all other uses of xc_hvm_XXX() functions
xen: rename xen_modified_memory() to xen_hvm_modified_memory()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
I recently left Freescale/NXP, and even before that it'd been a few years
since I was actively involved in KVM/QEMU work.
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This makes a small step fixing one of many style problems that exist in
the older ppc code. This removes spaces between function (or macro) name
and the following '('.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch registers mfspr 259 for Book3S and e500 family cores
following this research:
mfspr 259 provides read-only mapped user access to SPRG3(SPR 275) according to:
- PowerISA 2.02, Book III (documents implementation starting with POWER4+ @ p20)
- IBM PowerPC 970MP RISC Microprocessor User's Manual v2.1, page 48
- Amit Singh: "Mac OS X Internals: A Systems Approach" on 970 and 970FX cores:
He demonstrates mfspr 259 reading TLS data from Mac OS X on G5 on page 588
- NXP documents it in the Core Reference Manuals of: e500, e500mc and e5500
- getcpu() of the 32 & 64-bit Book3S Linux vDSOs use it to read the core number
mfspr 259 does not appear to be implemented in these cores according to:
- 74xx series: MPC7410/MPC7400 and MPC7450 RISC Microprocessor Reference Manuals
- 4xx series: PPC440 Processor User's Manual, Revision 1.09 by AMCC
- 750 series: IBM PowerPC 750CL RISC Microprocessor User's Manual
- e200 series: e200z4 Power Architectureâ Core Reference Manual
Implementation: gen_spr_usprg3() is called from init_proc_book3s_common()
(covers the 970 and POWER cores) and init_proc_e500() (covers the e500 family)
to register spr_read_ureg() in the same way which it already provides
the mapped SPR access for SPR_USPRG4-7 in gen_spr_usprgh() for cores
which have the same read-only mapped SPRG register access for SPRG4-7.
Verified using Linux by pinning a thread to a core and checking sched_getcpu()
using qemu-system-ppc64 -M pseries -cpu POWER8 using MTTCG on a x86_64 host.
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com>
Reviewed-by: Stefan Resch <stefan.resch@thalesgroup.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PIDR (process id register) is used to store the id of the currently
running process, which is used to select the process table entry used to
perform address translation. This means that when we write to this register
all the translations in the TLB become outdated as they are for a
previously running process. Thus when this register is written to we need
to invalidate the TLB entries to ensure stale entries aren't used to
to perform translation for the new process, which would result in at best
segfaults or alternatively just random memory being accessed.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Fixed compile error for 32-bit targets]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Recent commits that re-organized ICPState object missed to destroy
the object when CPU is unrealized. Fix this so that CPU unplug
doesn't abort QEMU.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenPOWER systems expect to be notified with such an event before a
shutdown or a reboot. An OEM SEL message is sent with specific
identifiers and a user data containing the request : OFF or REBOOT.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Skiboot, the firmware for the PowerNV platform, expects the BMC to
provide some specific IPMI sensors. These sensors are exposed in the
device tree and their values are updated by the firmware at boot time.
Sensors of interest are :
"FW Boot Progress"
"Boot Count"
As such a device is defined on the command line, we can only detect
its presence at reset time.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is an empty shell that we will use to include nodes in the device
tree for ISA devices. We expect RTC, UART and IPMI BT devices.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The default LPC bus of a multichip system is on chip 0. It's
recognized by the firmware (skiboot) using a "primary" property in the
device tree.
We introduce a pnv_chip_lpc_offset() routine to locate the LPC node of
a chip and set the property directly from the machine level.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It adds the Naples chip which supports proper LPC interrupts via the
LPC controller rather than via an external CPLD.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
- ported on latest PowerNV patchset
- moved the IRQ handler in pnv_lpc.c
- introduced pnv_lpc_isa_irq_create() to create the ISA IRQs ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
xics_system_init() does not need 'nr_servers' anymore as it is only
used to define the 'interrupt-controller' node in the device tree. So
let's just compute the value when calling spapr_dt_xics().
This also gives us an opportunity to simplify the xics_system_init()
routine and introduce a specific spapr_ics_create() helper to create
the sPAPR ICS object.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
gdb refuses to parse QEMU memory dumps because struct PPCElfPrstatus
is the wrong size. Fix it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Fixes: e62fbc54d4 ("target-ppc: dump-guest-memory support")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
It will be used to fill the message buffer with custom events expected
by some systems. Typically, an Open PowerNV platform guest is notified
with an OEM SEL message before a shutdown or a reboot.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch exposes a new IPMI routine to query a sdr entry from the
sdr table maintained by the IPMI BMC simulator. The API is very
similar to the internal sdr_find_entry() routine and should be used
the same way to query one or all sdrs.
A typical use would be to loop on the sdrs to build nodes of a device
tree.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch provides a simple FRU support for the BMC simulator. FRUs
are loaded from a file which name is specified in the object
properties, each entry having a fixed size, also specified in the
properties. If the file is unknown or not accessible for some reason,
a unique entry of 1024 bytes is created as a default. Just enough to
start some simulation.
These commands complies with the IPMI spec : "34. FRU Inventory Device
Commands".
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
[dwg: Folded in subsequent fix to handle NULL filename]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The IPMI BMC simulator populates the sdr/sensor tables with a minimal
set of entries (Watchdog). But some qemu platforms might want to use
extra entries for their custom needs.
This patch modifies slighty the initializing routine to take into
account a larger set read from a file. The name of the file to use is
defined through a new 'sdr' property of the simulator device.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
OpenPOWER systems use a BT device to communicate with the BMC.
Provide support for it.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The OCC is an on-chip microcontroller based on a ppc405 core used
for various power management tasks. It comes with a pile of additional
hardware sitting on the PIB (aka XSCOM bus). At this point we don't
emulate it (nor plan to do so). However there is one facility which
is provided by the surrounding hardware that we do need, which is the
interrupt generation facility. OPAL uses it to send itself interrupts
under some circumstances and there are other uses around the corner.
So this implement just enough to support this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: - updated for qemu-2.9
- changed the XSCOM interface to fit new model
- QOMified the model ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The Processor Service Interface (PSI) Controller is one of the engines
of the "Bridge" unit which connects the different interfaces to the
Power Processor.
This adds just enough of the PSI bridge to handle various on-chip and
the one external interrupt. The rest of PSI has to do with the link to
the IBM FSP service processor which we don't plan to emulate (not used
on OpenPower machines).
The ics_get() and ics_resend() handlers of the XICSFabric interface of
the PowerNV machine are now defined to handle the Interrupt Control
Source of PSI. The InterruptStatsProvider interface is also modified
to dump the new ICS.
Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This provides to a PowerNV chip (POWER8) access to the Interrupt
Management area, which contains the registers of the Interrupt Control
Presenters of each thread. These are used to accept, return, forward
interrupts in the system.
This area is modeled with a per-chip container memory region holding
all the ICP registers. Each thread of a chip is then associated with
its ICP registers using a memory subregion indexed by its PIR number
in the overall region.
The device tree is populated accordingly.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each thread of a core is linked to an ICP. This allocates a PnvICPState
object before the PowerPCCPU object is realized and lets the XICSFabric
do the store under the 'intc' backlink when xics_cpu_setup() is
called.
This modeling removes the need of maintaining an array of ICP objects
under the PowerNV machine and also simplifies the XICSFabric icp_get()
handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A XICSFabric QOM interface is used by the XICS layer to manipulate the
ICP and ICS objects. Let's define the associated handlers for the
PowerNV machine. All handlers should be defined even if there is no
ICS under the PowerNV machine yet.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This provides a new ICPState object for the PowerNV machine (POWER8).
Access to the Interrupt Management area is done though a memory
region. It contains the registers of the Interrupt Control Presenters
of each thread which are used to accept, return, forward interrupts in
the system.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, all the ICPs are created before the CPUs, stored in an array
under the sPAPR machine and linked to the CPU when the core threads
are realized. This modeling brings some complexity when a lookup in
the array is required and it can be simplified by allocating the ICPs
when the CPUs are.
This is the purpose of this proposal which introduces a new 'icp_type'
field under the machine and creates the ICP objects of the right type
(KVM or not) before the PowerPCCPU object are.
This change allows more cleanups : the removal of the icps array under
the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id()
helper.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is the second step to abstract the IRQ 'server' number of the
XICS layer. Now that the prereq cleanups have been done in the
previous patch, we can move down the 'cpu_dt_id' to 'cpu_index'
mapping in the sPAPR machine handler.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Today, the ICPState array of the sPAPR machine is indexed with
'cpu_index' of the CPUState. This numbering of CPUs is internal to
QEMU and the guest only knows about what is exposed in the device
tree, that is the 'cpu_dt_id'. This is why sPAPR uses the helper
xics_get_cpu_index_by_dt_id() to do the mapping in a couple of places.
To provide a more generic XICS layer, we need to abstract the IRQ
'server' number and remove any assumption made on its nature. It
should not be used as a 'cpu_index' for lookups like xics_cpu_setup()
and xics_cpu_destroy() do.
To reach that goal, we choose to introduce a generic 'intc' backlink
under PowerPCCPU, and let the machine core init routine do the
ICPState lookup. The resulting object is passed on to xics_cpu_setup()
which does the store under PowerPCCPU. The IRQ 'server' number in XICS
is now generic. sPAPR uses 'cpu_dt_id' and PowerNV will use 'PIR'
number.
This also has the benefit of simplifying the sPAPR hcall routines
which do not need to do any ICPState lookups anymore.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The ibm,processor-radix-AP-encodings device tree property of the cpu node
is used to specify the radix mode supported page sizes of the processor
to the guest os. Contained in the top 3 bits of the msb is the actual
page size (AP) encoding associated with the corresponding radix mode
supported page size. Add this property for a TCG guest, note the TCG code
is capable of translating any format so just add the 4 default page sizes.
The ibm,processor-radix-AP-encodings device tree property is defined as:
One to n cells in ascending order of radix mode supported page sizes
encoded as BE ints (32bit on ppc) in the form:
0bxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
- 0bxxx -> AP encoding
- 0byyyyyyyyyyyyyyyyyyyyyyyyyyyyy -> supported page size encoded as a shift
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
If a page size used by QEMU is not enabled in the PHB IOMMU page mask,
in-kernel acceleration of TCE handling won't be enabled and performance
might be slower than expected.
This prints a warning if system page size is not enabled. This should
print a warning if huge pages are enabled but sphb.pgsz still uses
the default value of 4K|64K.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This enables in-kernel handling of H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls. The host kernel support is there since v4.6,
in particular d3695aa4f452
("KVM: PPC: Add support for multiple-TCE hcalls").
H_PUT_TCE is already accelerated and does not need any special enablement.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For a little while around 4.9, Linux kernels that saw the radix bit in
ibm,pa-features would attempt to set up the MMU as if they were a
hypervisor, even if they were a guest, which would cause them to
crash.
Work around this by detecting pre-ISA 3.0 guests by their lack of that
bit in option vector 1, and then removing the radix bit from
ibm,pa-features. Note: This now requires regeneration of that node
after CAS negotiation.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add the new node, /chosen/ibm,arch-vec-5-platform-support to the
device tree. This allows the guest to determine which modes are
supported by the hypervisor.
Update the option vector processing in h_client_architecture_support()
to handle the new MMU bits. This allows guests to request hash or
radix mode and QEMU to create the guest's HPT at this time if it is
necessary but hasn't yet been done. QEMU will terminate the guest if
it requests an unavailable mode, as required by the architecture.
Extend the ibm,pa-features node with the new ISA 3.0 values
and set the radix bit if KVM supports radix mode. This probably won't
be used directly by guests to determine the availability of radix mode
(that is indicated by the new node added above) but the architecture
requires that it be set when the hardware supports it.
If QEMU is using KVM, and KVM is capable of running in radix mode,
guests can be run in real-mode without allocating a HPT (because KVM
will use a minimal RPT). So in this case, we avoid creating the HPT
at reset time and later (during CAS) create it if it is necessary.
ISA 3.0 guests will now begin to call h_register_process_table(),
which has been added previously.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Strip some unneeded prefix from error messages]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In the next patch, spapr_fixup_cpu_dt() will need to call
spapr_populate_pa_features() so move it's definition up without making
any other changes.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the
hypervisor where in memory its process table is and how translation should
be performed using this process table.
Provide the implementation of this H_CALL for a guest.
We first check for invalid flags, then parse the flags to determine the
operation, and then check the other parameters for valid values based on
the operation (register new table/deregister table/maintain registration).
The process table is then stored in the appropriate location and registered
with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits
are updated as required.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Correct missing prototype and uninitialized variable]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of the new in memory tables introduced in ISAv3.00 for translation,
also referred to as process tables, requires the introduction of 3 new
H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID.
Add shells for each of these and register them as the hypercall handlers.
Currently they all log an unimplemented hypercall and return H_FUNCTION.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
[dwg: Fix style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Query and cache the value of two new KVM capabilities that indicate
KVM's support for new radix and hash modes of the MMU.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use the new ioctl, KVM_PPC_GET_RMMU_INFO, to fetch radix MMU
information from KVM and present the page encodings in the device tree
under ibm,processor-radix-AP-encodings. This provides page size
information to the guest which is necessary for it to use radix mode.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[dwg: Compile fix for 32-bit targets, style nit fix]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
KVM_CAP_SPAPR_TCE capability allows creating TCE tables in KVM which
allows having in-kernel acceleration for H_PUT_TCE_xxx hypercalls.
However it only supports 32bit DMA windows at zero bus offset.
There is a new KVM_CAP_SPAPR_TCE_64 capability which supports 64bit
window size, variable page size and bus offset.
This makes use of the new capability. The kernel headers are already
updated as the kernel support went in to v4.6.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The devices that are derived from TYPE_PNV_CHIP currently show up
as "uncategorized" devices in the help text of "-device ?". Since
they obviously are related to the CPU, let's put them into the
CPU category instead.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Also use an 'sPAPRRTCState' attribute under the sPAPR machine to hold
the RTC object. Overall, these changes remove an unnecessary and
implicit dependency on SysBus.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On Power8 hosts it is currently theoretically possible for QEMU/KVM-HV guests
to receive a ibm,pa-features property indicating that HTM support is available
when it is not. The situation would occur if the platform firmware of
a Power8 host cleared the HTM bit of the ibm,pa-features property.
QEMU would query KVM for the availability of HTM, which will return no
support, but workaround code in kvm_arch_init_vcpu() would then
re-enable it because KVM_HV is in use and the processor is P8.
This patch adjusts the workaround in kvm_arch_init_vcpu() so that it does not
enable HTM (in the above case) unless the host kernel indicates to the QEMU
process, via the auxiliary vector, that userspace can use HTM (via the HWCAP2
bit KVM_FEATURE2_HTM).
The reason to use the value from the auxiliary vector is that it is
set based only on what the host kernel found in the ibm,pa-features
HTM bit at boot time.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Once a request is completed, xen_9pfs_push_and_notify gets called. In
xen_9pfs_push_and_notify, update the indexes (data has already been
copied to the sg by the common code) and send a notification to the
frontend.
Schedule the bottom-half to check if we already have any other requests
pending.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Upon receiving an event channel notification from the frontend, schedule
the bottom half. From the bottom half, read one request from the ring,
create a pdu and call pdu_submit to handle it.
For now, only handle one request per ring at a time.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Write the limits of the backend to xenstore. Connect to the frontend.
Upon connection, allocate the rings according to the protocol
specification.
Initialize a QEMUBH to schedule work upon receiving an event channel
notification from the frontend.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
# gpg: Signature made Tue 25 Apr 2017 12:22:03 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
COLO-compare: Optimize tcp compare trace event
COLO-compare: Optimize tcp compare for option field
slirp: add a fake NC-SI backend
aspeed: add a FTGMAC100 nic
net/ftgmac100: add a 'aspeed' property
net: add FTGMAC100 support
hw/net: add MII definitions
colo-compare: Fix old packet check bug.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
s390_virtio_hypercall can trigger IO events and interrupts, most notably
when using virtio-ccw devices.
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Fixes: 278f5e98c6 ("s390x/misc_helper.c: wrap IO instructions in BQL")
Signed-off-by: Alexander Graf <agraf@suse.de>
According to "CPU Signaling and Response", "Signal-Processor Orders",
the order field is bit position 56-63. Without this, the Linux
guest kernel is sometimes unable to stop emulation and enters
an infinite loop of "XXX unknown sigp: 0xffffffff00000005".
Signed-off-by: Philipp Kern <phil@philkern.de>
Reviewed-by: Thomas Huth <thuth@tuxfamily.org>
[agraf: add comment according to email]
Signed-off-by: Alexander Graf <agraf@suse.de>
Optimize two trace events as one, adjust print format make
it easy to read. rename trace_colo_compare_pkt_info_src/dst
to trace_colo_compare_tcp_info.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In this patch we support packet that have tcp options field.
Add tcp options field check, If the packet have options
field we just skip it and compare tcp payload,
Avoid unnecessary checkpoint, optimize performance.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
NC-SI (Network Controller Sideband Interface) enables a BMC to manage
a set of NICs on a system. This model takes the simplest approach and
reverses the NC-SI packets to pretend a NIC is present and exercise
the Linux driver.
The NCSI header file <ncsi-pkt.h> comes from mainline Linux and was
untabified.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
There is a second NIC but we do not use it for the moment. We use the
'aspeed' property to tune the definition of the end of ring buffer bit
for the Aspeed SoCs.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The Aspeed SoCs have a different definition of the end of the ring
buffer bit. Add a property to specify which set of bits should be used
by the NIC.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Exynos4210 has four SD/MMC controllers supporting:
- SD Standard Host Specification Version 2.0,
- MMC Specification Version 4.3,
- SDIO Card Specification Version 2.0,
- DMA and ADMA.
Add emulation of SDHCI devices which allows accessing storage through SD
cards. Differences from real hardware:
- Devices are shipped with eMMC memory, not SD card.
- The Exynos4210 SDHCI has few more registers, e.g. for
controlling the clocks, additional status (0x80, 0x84, 0x8c). These
are not implemented.
Testing on smdkc210 machine with "-drive file=FILE,if=sd,bus=0,index=2".
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-id: 20170422190709.8676-1-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Mon 24 Apr 2017 20:18:05 BST
# gpg: using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg: aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg: aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057
* remotes/cody/tags/block-pull-request:
qemu-iotests: _cleanup_qemu must be called on exit
block/rbd: Add support for reopen()
block/rbd - update variable names to more apt names
block: use bdrv_can_set_read_only() during reopen
block: introduce bdrv_can_set_read_only()
block: code movement
block: honor BDRV_O_ALLOW_RDWR when clearing bs->read_only
block: do not set BDS read_only if copy_on_read enabled
block: add bdrv_set_read_only() helper function
qemu-iotests: exclude vxhs from image creation via protocol
block/vxhs.c: Add qemu-iotests for new block device type "vxhs"
block/vxhs.c: Add support for a new block device type called "vxhs"
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For the tests that use the common.qemu functions for running a QEMU
process, _cleanup_qemu must be called in the exit function.
If it is not, if the qemu process aborts, then not all of the droppings
are cleaned up (e.g. pidfile, fifos).
This updates those tests that did not have a cleanup in qemu-iotests.
(I swapped spaces for tabs in test 102 as well)
Reported-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: d59c2f6ad6c1da8b9b3c7f357c94a7122ccfc55a.1492544096.git.jcody@redhat.com
Update 'clientname' to be 'user', which tracks better with both
the QAPI and rados variable naming.
Update 'name' to be 'image_name', as it indicates the rbd image.
Naming it 'image' would have been ideal, but we are using that for
the rados_image_t value returned by rbd_open().
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: b7ec1fb2e1cf36f9b6911631447a5b0422590b7d.1491597120.git.jcody@redhat.com
A few block drivers will set the BDS read_only flag from their
.bdrv_open() function. This means the bs->read_only flag could
be set after we enable copy_on_read, as the BDRV_O_COPY_ON_READ
flag check occurs prior to the call to bdrv->bdrv_open().
This adds an error return to bdrv_set_read_only(), and an error will be
return if we try to set the BDS to read_only while copy_on_read is
enabled.
This patch also changes the behavior of vvfat. Before, vvfat could
override the drive 'readonly' flag with its own, internal 'rw' flag.
For instance, this -drive parameter would result in a writable image:
"-drive format=vvfat,dir=/tmp/vvfat,rw,if=virtio,readonly=on"
This is not correct. Now, attempting to use the above -drive parameter
will result in an error (i.e., 'rw' is incompatible with 'readonly=on').
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 0c5b4c1cc2c651471b131f21376dfd5ea24d2196.1491597120.git.jcody@redhat.com
The protocol VXHS does not support image creation. Some tests expect
to be able to create images through the protocol. Exclude VXHS from
these tests.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Source code for the qnio library that this code loads can be downloaded from:
https://github.com/VeritasHyperScale/libqnio.git
Sample command line using JSON syntax:
./x86_64-softmmu/qemu-system-x86_64 -name instance-00000008 -S -vnc 0.0.0.0:0
-k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
-msg timestamp=on
'json:{"driver":"vxhs","vdisk-id":"c3e9095a-a5ee-4dce-afeb-2a59fb387410",
"server":{"host":"172.172.17.4","port":"9999"}}'
Sample command line using URI syntax:
qemu-img convert -f raw -O raw -n
/var/lib/nova/instances/_base/0c5eacd5ebea5ed914b6a3e7b18f1ce734c386ad
vxhs://192.168.0.1:9999/c6718f6b-0401-441d-a8c3-1f0064d75ee0
Sample command line using TLS credentials (run in secure mode):
./qemu-io --object
tls-creds-x509,id=tls0,dir=/etc/pki/qemu/vxhs,endpoint=client -c 'read
-v 66000 2.5k' 'json:{"server.host": "127.0.0.1", "server.port": "9999",
"vdisk-id": "/test.raw", "driver": "vxhs", "tls-creds":"tls0"}'
[Jeff: Modified trace-events with the correct string formatting]
Signed-off-by: Ashish Mittal <Ashish.Mittal@veritas.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Message-id: 1491277689-24949-2-git-send-email-Ashish.Mittal@veritas.com
Error reporting patches for 2017-04-24
# gpg: Signature made Mon 24 Apr 2017 08:16:34 BST
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-error-2017-04-24:
error: Apply error_propagate_null.cocci again
qga: Make errp the last parameter of qga_vss_fsfreeze
migration: Make errp the last parameter of local functions
scsi: Make errp the last parameter of virtio_scsi_common_realize
fdc: Make errp the last parameter of fdctrl_connect_drives
nfs: Make errp the last parameter of nfs_client_open
block: Make errp the last parameter of commit_active_start
mirror: Make errp the last parameter of mirror_start_job
crypto: Make errp the last parameter of functions
block: Make errp the last parameter of bdrv_img_create
socket: Make errp the last parameter of vsock_connect_saddr
socket: Make errp the last parameter of unix_connect_saddr
socket: Make errp the last parameter of inet_connect_saddr
socket: Make errp the last parameter of socket_connect
util/error: Fix leak in error_vprepend()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is to allow clients to initialise these without failing as long
as no 2D engine function is called that would use the written value.
Saved values are not used yet (may get used when more of 2D engine is
added sometimes) and clients normally only write to most of these
registers, nothing is known to ever read them but they are documented
as read/write so also implement read for these.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 80adf8e4d084ec6cc30d149f8e8215debb67314a.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Rework HWC handling to simplify it and fix cursor not updating on
screen as needed. Previously cursor was not updated because checking
for changes in a line overrode the update flag set for the cursor but
fixing this is not enough because the cursor should also be updated if
its shape or location changes. Introduce hwc_invalidate() function to
handle that similar to other display controller models.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 6970a5e9868b7246656c1d02038dc5d5fa369507.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We only emulate the sysbus device in its default LE mode and PCI is LE
as well so specify this for registers and framebuffer memory.
Note that though the Linux kernel driver has code which claims to
handle both big and little endian, it is obviously bogus for 16 bit
and cannot be trusted as a source of information on the framebuffer
pixel format. This is our best guess about device behaviour based on
the specs and testing with MorphOS that is known to work on real HW.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 8b9605a569f8bf54074e15903620b18cd9967c89.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu-sparc update
# gpg: Signature made Fri 21 Apr 2017 20:09:35 BST
# gpg: using RSA key 0x5BC2C56FAE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"
# Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F
* remotes/mcayland/tags/qemu-sparc-signed:
tcx: switch to load_image_mr() and remove prom_addr hack
tcx: use tcx_set_dirty() for accelerated ops
tcx: remove primitives for non-32-bit surfaces
tcx: remove TARGET_PAGE_SIZE from tcx24_update_display()
tcx: remove TARGET_PAGE_SIZE from tcx_update_display()
tcx: remove page24 and cpage from tcx24_update_display()
tcx: alter tcx24_reset_dirty() to accept address and length parameters
tcx: alter tcx24_check_dirty() to accept address and length parameters
tcx: ensure tcx_set_dirty() also invalidates the 24-bit plane and cplane
tcx: alter tcx_set_dirty() to accept address and length parameters
cg3: switch to load_image_mr() and remove prom-addr hack
cg3: fix up size parameter for memory_region_get_dirty()
cg3: remove TARGET_PAGE_SIZE rounding on dirty page detection
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add properties for the default display resolution, pass
on that information to the guest so the driver can use it.
Also move up qxl_crc32() function so we don't need a
forward declaration.
Additionally guest driver updates are needed so the
guest driver will actually pick this up, which will
probably land in linux kernel 4.12.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421092234.8368-1-kraxel@redhat.com
Fix standard vga mode check: Both s->config and s->enabled must be set
to enable vmware command fifo processing.
Drop dirty tracking code from the fifo rendering code path, it isn't
used anyway because vmsvga turns off dirty tracking when leaving
standard vga mode.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-9-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The vga code clears the dirty bits *after* reading the framebuffer
memory. So if the guest framebuffer updates hits the race window
between vga reading the framebuffer and vga clearing the dirty bits
vga will miss that update
Fix it by using the new memory_region_copy_and_clear_dirty()
memory_region_copy_get_dirty() functions. That way we clear the
dirty bitmap before reading the framebuffer. Any guest display
updates happening in parallel will be properly tracked in the
dirty bitmap then and the next display refresh will pick them up.
Problem triggers with mttcg only. Before mttcg was merged tcg
never ran in parallel to vga emulation. Using kvm will hide the
problem too, due to qemu operating on a userspace copy of the
kernel's dirty bitmap.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-5-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add vga_scanline_invalidated helper to check whenever a scanline was
invalidated. Add a sanity check to fix OOB read access for display
heights larger than 2048.
Only cirrus uses this, for hardware cursor rendering, so having this
work properly for the first 2048 scanlines only shouldn't be a problem
as the cirrus can't handle large resolutions anyway. Also changing the
invalidated_y_table size would break live migration.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-4-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This patch adds support for getting and using a local copy of the dirty
bitmap.
memory_region_snapshot_and_clear_dirty() will create a snapshot of the
dirty bitmap for the specified range, clear the dirty bitmap and return
the copy. The returned bitmap can be a bit larger than requested, the
range is expanded so the code can copy unsigned longs from the bitmap
and avoid atomic bit update operations.
memory_region_snapshot_get_dirty() will return the dirty status of
pages, pretty much like memory_region_get_dirty(), but using the copy
returned by memory_region_copy_and_clear_dirty().
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-3-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The FTGMAC100 device is an Ethernet controller with DMA function that
can be found on Aspeed SoCs (which include NCSI).
It is fully compliant with IEEE 802.3 specification for 10/100 Mbps
Ethernet and IEEE 802.3z specification for 1000 Mbps Ethernet and
includes Reduced Media Independent Interface (RMII) and Reduced
Gigabit Media Independent Interface (RGMII) interfaces. It adopts an
AHB bus interface and integrates a link list DMA engine with direct
M-Bus accesses for transmitting and receiving packets. It has
independent TX/RX fifos, supports half and full duplex (1000 Mbps mode
only supports full duplex), flow control for full duplex and
backpressure for half duplex.
The FTGMAC100 also implements IP, TCP, UDP checksum offloads and
supports IEEE 802.1Q VLAN tag insertion and removal. It offers
high-priority transmit queue for QoS and CoS applications
This model is backed with a RealTek 8211E PHY which is the chip found
on the AST2500 EVB. It is complete enough to satisfy two different
Linux drivers and a U-Boot driver. Not supported features are :
- IEEE 802.1Q VLAN
- High Priority Transmit Queue
- Wake-On-LAN functions
The code is based on the Coldfire Fast Ethernet Controller model.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This adds comments on the Basic mode control and status registers bit
definitions. It also adds a couple of bits for 1000BASE-T and the
RealTek 8211E PHY for the FTGMAC100 model to use.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If colo-compare find one old packet,we can notify colo-frame
do checkpoint, no need continue find more old packet here.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Do not use the ring.h header installed on the system. Instead, import
the header into the QEMU codebase. This avoids problems when QEMU is
built against a Xen version too old to provide all the ring macros.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
Instead of trying to guess the Xen version to use by compiling various
test programs first just ask the system via pkg-config. Only if it
can't return the version fall back to the test program scheme.
If configure is being called with dedicated flags for the Xen libraries
use those instead of the pkg-config output. This will avoid breaking
an in-tree Xen build of an old Xen version while a new Xen version is
installed on the build machine: pkg-config would pick up the installed
Xen config files as the Xen tree wouldn't contain any of them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Commit f0f272baf3a7 "xen: use libxendevice model to restrict operations"
added a command-line option (-xen-domid-restrict) to limit operations
using the libxendevicemodel API to a specified domid. The commit also
noted that the restriction would be extended to cover operations issued
via other xen libraries by subsequent patches.
My recent Xen patch [1] added a call to the xenforeignmemory API to allow
it to be restricted. This patch now makes use of that new call when the
-xen-domid-restrict option is passed.
[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=5823d6eb
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This patch adds a command-line option (-xen-domid-restrict) which will
use the new libxendevicemodel API to restrict devicemodel [1] operations
to the specified domid. (Such operations are not applicable to the xenpv
machine type).
This patch also adds a tracepoint to allow successful enabling of the
restriction to be monitored.
[1] I.e. operations issued by libxendevicemodel. Operation issued by other
xen libraries (e.g. libxenforeignmemory) are currently still unrestricted
but this will be rectified by subsequent patches.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Today qemu is using e.g. the value 480 for Xen version 4.8.0. As some
Xen version tests are using ">" relations this scheme will lead to
problems when Xen version 4.10.0 is being reached.
Instead of the 3 digit schem use a 5 digit scheme (e.g. 40800 for
version 4.8.0).
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This patch modifies the wrapper functions in xen_common.h to use the
new xendevicemodel interface if it is available along with compatibility
code to use the old libxenctrl interface if it is not.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This patch adds code in configure to set CONFIG_XEN_CTRL_INTERFACE_VERSION
to a new value of 490 if libxendevicemodel is present in the build
environment.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
migration/next for 20170421
# gpg: Signature made Fri 21 Apr 2017 11:28:13 BST
# gpg: using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg: aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723
* remotes/juanquintela/tags/migration/20170421: (65 commits)
hmp: info migrate_parameters format tunes
hmp: info migrate_capability format tunes
migration: rename max_size to threshold_size
migration: set current_active_state once
virtio-rng: stop virtqueue while the CPU is stopped
migration: don't close a file descriptor while it can be in use
ram: Remove migration_bitmap_extend()
migration: Disable hotplug/unplug during migration
qdev: Move qdev_unplug() to qdev-monitor.c
qdev: Export qdev_hot_removed
qdev: qdev_hotplug is really a bool
migration: Remove MigrationState parameter from migration_is_idle()
ram: Use RAMBitmap type for coherence
ram: rename last_ram_offset() last_ram_pages()
ram: Use ramblock and page offset instead of absolute offset
ram: Change offset field in PageSearchStatus to page
ram: Remember last_page instead of last_offset
ram: Use page number instead of an address for the bitmap operations
ram: reorganize last_sent_block
ram: ram_discard_range() don't use the mis parameter
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Fri 21 Apr 2017 10:43:04 BST
# gpg: using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha/tags/block-pull-request:
MAINTAINERS: update my email address
MAINTAINERS: update Wen's email address
migration/block: use blk_pwrite_zeroes for each zero cluster
throttle: make throttle_config(throttle_get_config()) symmetric
throttle: do not use invalid config in test
qemu-options: explain disk I/O throttling options
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The first batch of s390x changes for 2.10:
- the new compat machine
- several cleanups and optimizations
- introspection for css ids
# gpg: Signature made Fri 21 Apr 2017 08:36:25 BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20170421:
s390x: Drop useless casts
s390x: register I/O adapters per ISC during init
s390x/flic: cache flic in s390_get_flic
s390x: initialize flic before I/O subsystems
s390x: use enum for adapter type and standardize its naming
s390x/css: consolidate the devno property for ccw devices
s390x/css: provide introspection for virtual subchannel and device busid
s390x/css: introduce read-only property type for device ids
s390x/pci: make printf always compile in debug output
s390x/kvm: make printf always compile in debug output
s390x: introduce 2.10 compat machine
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Dump the info in a single line is hard to read. Do it one per line.
Also, the first "capabilities:" didn't help much. Let's remove it.
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In migration codes (especially in migration_thread()), max_size is used
in many place for the threshold value that we will start to do the final
flush and jump to the next stage to dump the whole rest things to
destination. However its name is confusing to first readers. Let's
rename it to "threshold_size" when proper and add a comment for it. No
functional change is made.
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If we modify the virtio-rng virqueue while the
vmstate is already migrated we can have some
inconsistencies between the virtqueue state and
the memory content.
To avoid this, stop the virtqueue while the CPU
is stopped.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Amit Shah <amit@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
If we close the QEMUFile descriptor in process_incoming_migration_co()
while it has been stopped by an error, the postcopy_ram_listen_thread()
can try to continue to use it. And as the memory has been freed
it is working with an invalid pointer and crashes.
Fix this by releasing the memory after having managed the error
case (which, in fact, calls exit())
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Amit Shah <amit@kernel.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Until we have reviewed what can/can't be hotplugged during migration,
disable it. We can enable it later for the things that we know that
work. For instance, memory hotplug during postcopy doesn't work
currently.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
--
- Fix typo. Thanks Thomas.
- Delay migration check after we have checked that we can hotplug that
device.
- more typos
Only user don't have a MigrationState handly.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
This removes the needto pass also the absolute offset.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We are moving everything to work on pages, not addresses.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We use an unsigned long for the page number. Notice that our bitmaps
already got that for the index, so we have that limit.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
rename page to page_abs everywhere.
fix trace types for pages
We were setting it far away of when we changed it. Now everything is
done inside save_page_header. Once there, reorganize code to pass
RAMState. We also set CONTINUE flag in a single place.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We change the meaning of start to be the offset from the beggining of
the block.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
The number of dirty pages is output in 'pages' in the command
'info migrate', so add page-size to calculate the number of dirty
pages in bytes.
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
It was used as a size in all cases except one.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
We need to call for the migrate_get_current() in more that half of the
uses, so call that inside.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We can calculate its value, so we don't create a variable for it.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
After Peter and Dave review, I dropped the variable and just inlined
the condition.
Fix typo
We receive the file from save_live operations and we don't use it
until 3 or 4 levels of calls down.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Treat it like the rest of ram stats counters. Export its value the
same way. As an added bonus, no more MigrationState used in
migration_bitmap_sync();
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Again, dave was the one reviewing it
It can be recalculated from dirty_pages_rate.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Dave was the one that reviewed it O:-)
This is a ram field that was inside MigrationState. Move it to
RAMState and make it the same that the other ram stats.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
This are the last postcopy fields still at MigrationState. Once there
Move MigrationSrcPageRequest to ram.c and remove MigrationState
parameters where appropiate.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
It was on MigrationState when it is only used inside ram.c for
postcopy. Problem is that we need to access it without being able to
pass it RAMState directly.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Just unfold it. Move ram_bytes_remaining() with the rest of exported
functions.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Its value can be calculated by other exported.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
For compatibility, we need to still send a value, but just specify it
and comment the fact.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We create a struct where to put all the ram state
Start with the following fields:
last_seen_block, last_sent_block, last_offset, last_version and
ram_bulk_stage are globals that are really related together.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix typo and warnings
So all places are consistent on the naming of a block name parameter.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Added doc comments for existing functions comment and rewrite them in
a common style.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Fix Peter Xu comments
Improve postcopy comments as per reviews.
Users can inherit from the simpletrace.Analyzer class and receive
callbacks when events of interest occur in a trace file. The method
signature is a little magic because the timestamp and pid arguments are
optional. Document this.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170411095654.18383-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently all trace.o are linked into qemu-system, qemu-img,
qemu-nbd, qemu-io etc., even the corresponding components
are not included.
Put all trace.o into libqemuutil.a that the linker would only pull in .o
files containing symbols that are actually referenced by the
program.
Signed-off -by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The ./configure script should produce --help output even if Python is
not installed.
Listing trace backends is simple: show the names of all Python modules
in scripts/tracetool/backend/ whose source code contains 'PUBLIC =
True'.
Perform the backend enumeration in shell instead of Python so that we
can move the Python check until after ./configure --help.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170328134418.3426-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default,
this may cause the qcow2 file size to be bigger after migration.
This patch checks each cluster, using blk_pwrite_zeroes for each
zero cluster.
[Initialize cluster_size to BLOCK_SIZE to prevent a gcc uninitialized
variable compiler warning. In reality we always initialize cluster_size
in a conditional but gcc doesn't know that.
--Stefan]
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Message-id: 1492050868-16200-1-git-send-email-lidongchen@tencent.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Throttling has a weird property that throttle_get_config() does not
always return the same throttling settings that were given with
throttle_config(). In other words, the set and get functions aren't
symmetric.
If .max is 0 then the throttling code assigns a default value of .avg /
10 in throttle_config(). This is an implementation detail of the
throttling algorithm. When throttle_get_config() is called the .max
value returned should still be 0.
Users are exposed to this quirk via "info block" or "query-block"
monitor commands. This has caused confusion because it looks like a bug
when an unexpected value is reported.
This patch hides the .max value adjustment in throttle_get_config() and
updates test-throttle.c appropriately.
Reported-by: Nini Gu <ngu@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170301115026.22621-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The (burst) max parameter cannot be smaller than the avg parameter.
There is a test case that uses avg = 56, max = 1 and gets away with it
because no input validation is performed by the test case.
This patch switches to valid test input parameters.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20170301115026.22621-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Machine queue for 2.10
# gpg: Signature made Thu 20 Apr 2017 19:44:27 BST
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/machine-pull-request:
qdev: Constify local variable returned by blk_bs
qdev: Constify value passed to qdev_prop_set_macaddr
hostmem: use host_memory_backend_mr_inited() where proper
hostmem: introduce host_memory_backend_mr_inited()
hw/core/null-machine: Print error message when using the -kernel parameter
qdev: Make "hotplugged" property read-only
intel_iommu: enable remote IOTLB
intel_iommu: allow dynamic switch of IOMMU region
intel_iommu: provide its own replay() callback
intel_iommu: use the correct memory region for device IOTLB notification
memory: add MemoryRegionIOMMUOps.replay() callback
memory: introduce memory_region_notify_one()
memory: provide iommu_replay_all()
memory: provide IOMMU_NOTIFIER_FOREACH macro
memory: add section range info for IOMMU notifier
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Previous to the existence of load_image_mr(), the only way to load in the
FCode ROM image was to pass in its physical address via qdev properties
and use load_image_targphys().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Rather than calling memory_region_set_dirty() directly, make sure that we call
tcx_set_dirty() instead. This ensures that the 24-bit plane and cplane are
also invalidated correctly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
As all surfaces in QEMU are now either shared or 32-bit ARGB regardless of
the guest depth, remove all non-32-bit primitives from tcx_update_display()
and consequence their implementation which are no longer required.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Now that page alignment is handled by the memory API, there is no need to
duplicate the code 4 times (4 * 1024 == 4096 == TARGET_PAGE_SIZE).
Finally we have now removed all traces of TARGET_PAGE_SIZE.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Now that page alignment is handled by the memory API, there is no need to
duplicate the code 4 times (4 * 1024 == 4096 == TARGET_PAGE_SIZE).
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Since all of the tcx_*_dirty() functions now calculate the 24-bit and
cplane offsets themselves from the base address, these variables are no
longer needed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
This can now be used by both the 8-bit and 24-bit display code, so rename
to tcx_check_dirty().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
This can now be used by both the 8-bit and 24-bit display code, so rename
to tcx_check_dirty().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Previous to the existence of load_image_mr(), the only way to load in the
FCode ROM image was to pass in its physical address via qdev properties
and use load_image_targphys().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
An upcoming Coccinelle cleanup script wanted to reformat the casts
present in this file - but on closer look, we don't need the casts
at all because C automatically converts void* to any other pointer.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170405194741.18956-4-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The I/O adapters should exist as soon as the bus/infrastructure
exists, and not only when the guest is actually trying to do something
with them. While the lazy allocation was not wrong, allocating at init
time is cleaner, both for the architecture and the code. Let's adjust
this by having each device type (currently for PCI and virtio-ccw)
register the adapters for each ISC (as now we don't know which ISC the
guest will use) as soon as it initializes.
Use a two-dimensional array io_adapters[type][isc] to store adapters
in ChannelSubSys, so that we can conveniently get the adapter id by
the helper function css_get_adapter_id(type, isc).
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
s390_get_flic() is called many times to obtain the flic. This wastes a
lot of time as it calls object_resolve_path() every time. Let's cache
S390FLICState by defining it as static.
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Let's use an enum for io adapter type, and standardize its naming to
CSS_IO_ADAPTER_* by changing S390_PCIPT_ADAPTER to CSS_IO_ADAPTER_PCI.
Signed-off-by: Fei Li <sherrylf@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
'devno' should rather be a property of the ccw device, instead of a
property of a specific virtio-ccw device. Let's consolidate it.
While we are at here, also rename CcwDevice.bus_id to CcwDevice.devno to
make things clearer.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Expose the busids of the virtual I/O subchannel and the virtual CCW
device to ease debugging. This is needed because:
1. subchannel id are assigned dynamically, and cannot be set from
outside.
2. device busid could possibly be auto generated.
An example of using HMP to retrieve the property values of a
virtio-balloon-ccw device looks like:
[root@localhost ~]# lscss -d 0.0.0004
Device Subchan. DevType CU Type Use PIM PAM POM CHPIDs
----------------------------------------------------------------------
0.0.0004 0.0.0003 0000/00 3832/05 yes 80 80 ff 00000000 00000000
(qemu) info qtree
... ...
dev: virtio-balloon-ccw, id "balloon0"
devno = "<unset>"
ioeventfd = true
max_revision = 2 (0x2)
dev_id = "fe.0.0004"
subch_id = "fe.0.0003"
... ...
After migration, if we have the same device that shows up on a
different subchannel, we must re-fill the subch_id of the ccw
device with the new schid, or the subch_id will have an old wrong
schid value. So this also re-fills the subch_id after migration.
While we are at it, also neaten the related error handling a bit.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Let's introduce a read-only property type that handles device ids of the
CssDevId type used for channel devices for future use. e.g. exposing the
busid of an I/O subchannel that is assigned to a ccw device.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The code was incorrectly calculating the end address rather than the size of
the required region.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
This was an artifact from very early versions of the code from before the
memory API and is no longer needed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
cannot_destroy_with_object_finalize_yet was added by 4c315c2
("qdev: Protect device-list-properties against broken devices")
because "realview_pci" and "versatile_pci" were hanging
during "device-list-properties" cleanup (an infinite loop in
bus_unparent()).
We have this problem because the child is not removed from
the list of the PCI bus children because it has no defined parent:
qdev_set_parent_bus() set the device parent_bus pointer to bus, and
adds the device in the bus children list, but doesn't update the
device parent pointer.
To fix the problem, move all the involved parts to the realize function.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170414083717.13641-4-lvivier@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This removes the assert(kvm_enabled()) from kvmppc_host_cpu_initfn()
This assert can never be triggered as the function is only registered
when KVM is available (see also 4c315c2
"qdev: Protect device-list-properties against broken devices").
So we can remove the cannot_destroy_with_object_finalize_yet from
kvmppc_host_cpu_class_init() without fear and beyond reproach.
(as it has already be done for i386 with 771a13e "i386: Unset
cannot_destroy_with_object_finalize_yet on "host" model" and
e435601 "target-i386: Remove assert(kvm_enabled()) from
host_x86_cpu_initfn()")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170414083717.13641-3-lvivier@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Inside qdev_prop_set_drive() the value returned by blk_bs() is passed
only as pointer to const to bdrv_get_node_name() and pointed values is
not modified in other places so this can be made const for code
safeness.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-Id: <20170310200550.13313-3-krzk@kernel.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
If the user currently tries to use the -kernel parameter, simply nothing
happens, and the user might get confused that there is nothing loaded
to memory, but also no error message has been issued. Since there is no
real generic way to load a kernel on all CPU types (but on some targets,
the generic loader can be used instead), issue an appropriate error
message here now to avoid the possible confusion.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1488271971-12624-1-git-send-email-thuth@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The "hotplugged" property is user visible, but it was never meant
to be set by the user. There are probably multiple ways to break
or crash device code by overriding the property. For example, we
recently fixed a crash in rtc_set_memory() related to the
property (commit 26ef65beab).
There has been some discussion about making management software
use "hotplugged=on" on migration, to indicate devices that were
hotplugged in the migration source. There were other suggestions
to address this, like including the "hotplugged" field in the
migration stream instead of requiring it to be set explicitly.
Whatever solution we choose in the future, this patch disables
setting "hotplugged" explicitly in the command-line by now,
because the ability to set the property is unused, untested, and
undocumented.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170222192647.19690-1-ehabkost@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch
upstream:
"IOMMU: enable intel_iommu map and unmap notifiers"
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html
However I removed/fixed some content, and added my own codes.
Instead of translate() every page for iotlb invalidations (which is
slower), we walk the pages when needed and notify in a hook function.
This patch enables vfio devices for VT-d emulation.
And, since we already have vhost DMAR support via device-iotlb, a
natural benefit that this patch brings is that vt-d enabled vhost can
live even without ATS capability now. Though more tests are needed.
Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This is preparation work to finally enabled dynamic switching ON/OFF for
VT-d protection. The old VT-d codes is using static IOMMU address space,
and that won't satisfy vfio-pci device listeners.
Let me explain.
vfio-pci devices depend on the memory region listener and IOMMU replay
mechanism to make sure the device mapping is coherent with the guest
even if there are domain switches. And there are two kinds of domain
switches:
(1) switch from domain A -> B
(2) switch from domain A -> no domain (e.g., turn DMAR off)
Case (1) is handled by the context entry invalidation handling by the
VT-d replay logic. What the replay function should do here is to replay
the existing page mappings in domain B.
However for case (2), we don't want to replay any domain mappings - we
just need the default GPA->HPA mappings (the address_space_memory
mapping). And this patch helps on case (2) to build up the mapping
automatically by leveraging the vfio-pci memory listeners.
Another important thing that this patch does is to seperate
IR (Interrupt Remapping) from DMAR (DMA Remapping). IR region should not
depend on the DMAR region (like before this patch). It should be a
standalone region, and it should be able to be activated without
DMAR (which is a common behavior of Linux kernel - by default it enables
IR while disabled DMAR).
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-9-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
The default replay() don't work for VT-d since vt-d will have a huge
default memory region which covers address range 0-(2^64-1). This will
normally consumes a lot of time (which looks like a dead loop).
The solution is simple - we don't walk over all the regions. Instead, we
jump over the regions when we found that the page directories are empty.
It'll greatly reduce the time to walk the whole region.
To achieve this, we provided a page walk helper to do that, invoking
corresponding hook function when we found an page we are interested in.
vtd_page_walk_level() is the core logic for the page walking. It's
interface is designed to suite further use case, e.g., to invalidate a
range of addresses.
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-8-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Originally we have one memory_region_iommu_replay() function, which is
the default behavior to replay the translations of the whole IOMMU
region. However, on some platform like x86, we may want our own replay
logic for IOMMU regions. This patch adds one more hook for IOMMUOps for
the callback, and it'll override the default if set.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: \"Michael S. Tsirkin\" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-6-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
In this patch, IOMMUNotifier.{start|end} are introduced to store section
information for a specific notifier. When notification occurs, we not
only check the notification type (MAP|UNMAP), but also check whether the
notified iova range overlaps with the range of specific IOMMU notifier,
and skip those notifiers if not in the listened range.
When removing an region, we need to make sure we removed the correct
VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well.
This patch is solving the problem that vfio-pci devices receive
duplicated UNMAP notification on x86 platform when vIOMMU is there. The
issue is that x86 IOMMU has a (0, 2^64-1) IOMMU region, which is
splitted by the (0xfee00000, 0xfeefffff) IRQ region. AFAIK
this (splitted IOMMU region) is only happening on x86.
This patch also helps vhost to leverage the new interface as well, so
that vhost won't get duplicated cache flushes. In that sense, it's an
slight performance improvement.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com>
[ehabkost: included extra vhost_iommu_region_del() change from Peter Xu]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We already require gcc 4.1 or newer (for the atomic
support), so the fallback codepaths for older gcc
versions than that are now dead code and we can
just delete them.
NB: clang reports itself as gcc 4.2 (regardless of
clang version), so clang won't be using the fallbacks
either.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
target-arm queue:
* implement M profile exception return properly
* cadence GEM: fix multiqueue handling bugs
* pxa2xx.c: QOMify a device
* arm/kvm: Remove trailing newlines from error_report()
* stellaris: Don't hw_error() on bad register accesses
* Add assertion about FSC format for syndrome registers
* Move excnames[] array into arm_log_exceptions()
* exynos: minor code cleanups
* hw/arm/boot: take Linux/arm64 TEXT_OFFSET header field into account
* Fix APSR writes via M profile MSR
# gpg: Signature made Thu 20 Apr 2017 17:39:35 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20170420: (24 commits)
arm: Remove workarounds for old M-profile exception return implementation
arm: Implement M profile exception return properly
arm: Track M profile handler mode state in TB flags
arm: Abstract out "are we singlestepping" test to utility function
arm: Move condition-failed codepath generation out of if()
arm: Move gen_set_condexec() and gen_set_pc_im() up in the file
arm: Factor out "generate right kind of step exception"
arm: Thumb shift operations should not permit interworking branches
arm: Don't implement BXJ on M-profile CPUs
xlnx-zynqmp: Set the Cadence GEM revision
cadence_gem: Make the revision a property
cadence_gem: Correct the interupt logic
cadence_gem: Correct the multi-queue can rx logic
cadence_gem: Read the correct queue descriptor
hw/arm: Qomify pxa2xx.c
arm/kvm: Remove trailing newlines from error_report()
stellaris: Don't hw_error() on bad register accesses
target/arm: Add assertion about FSC format for syndrome registers
arm: Move excnames[] array into arm_log_exceptions()
target/arm: Add missing entries to excnames[] for log strings
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On M profile, return from exceptions happen when code in Handler mode
executes one of the following function call return instructions:
* POP or LDM which loads the PC
* LDR to PC
* BX register
and the new PC value is 0xFFxxxxxx.
QEMU tries to implement this by not treating the instruction
specially but then catching the attempt to execute from the magic
address value. This is not ideal, because:
* there are guest visible differences from the architecturally
specified behaviour (for instance jumping to 0xFFxxxxxx via a
different instruction should not cause an exception return but it
will in the QEMU implementation)
* we have to account for it in various places (like refusing to take
an interrupt if the PC is at a magic value, and making sure that
the MPU doesn't deny execution at the magic value addresses)
Drop these hacks, and instead implement exception return the way the
architecture specifies -- by having the relevant instructions check
for the magic value and raise the 'do an exception return' QEMU
internal exception immediately.
The effect on the generated code is minor:
bx lr, old code (and new code for Thread mode):
TCG:
mov_i32 tmp5,r14
movi_i32 tmp6,$0xfffffffffffffffe
and_i32 pc,tmp5,tmp6
movi_i32 tmp6,$0x1
and_i32 tmp5,tmp5,tmp6
st_i32 tmp5,env,$0x218
exit_tb $0x0
set_label $L0
exit_tb $0x7f2aabd61993
x86_64 generated code:
0x7f2aabe87019: mov %ebx,%ebp
0x7f2aabe8701b: and $0xfffffffffffffffe,%ebp
0x7f2aabe8701e: mov %ebp,0x3c(%r14)
0x7f2aabe87022: and $0x1,%ebx
0x7f2aabe87025: mov %ebx,0x218(%r14)
0x7f2aabe8702c: xor %eax,%eax
0x7f2aabe8702e: jmpq 0x7f2aabe7c016
bx lr, new code when in Handler mode:
TCG:
mov_i32 tmp5,r14
movi_i32 tmp6,$0xfffffffffffffffe
and_i32 pc,tmp5,tmp6
movi_i32 tmp6,$0x1
and_i32 tmp5,tmp5,tmp6
st_i32 tmp5,env,$0x218
movi_i32 tmp5,$0xffffffffff000000
brcond_i32 pc,tmp5,geu,$L1
exit_tb $0x0
set_label $L1
movi_i32 tmp5,$0x8
call exception_internal,$0x0,$0,env,tmp5
x86_64 generated code:
0x7fe8fa1264e3: mov %ebp,%ebx
0x7fe8fa1264e5: and $0xfffffffffffffffe,%ebx
0x7fe8fa1264e8: mov %ebx,0x3c(%r14)
0x7fe8fa1264ec: and $0x1,%ebp
0x7fe8fa1264ef: mov %ebp,0x218(%r14)
0x7fe8fa1264f6: cmp $0xff000000,%ebx
0x7fe8fa1264fc: jae 0x7fe8fa126509
0x7fe8fa126502: xor %eax,%eax
0x7fe8fa126504: jmpq 0x7fe8fa122016
0x7fe8fa126509: mov %r14,%rdi
0x7fe8fa12650c: mov $0x8,%esi
0x7fe8fa126511: mov $0x56095dbeccf5,%r10
0x7fe8fa12651b: callq *%r10
which is a difference of one cmp/branch-not-taken. This will
be lost in the noise of having to exit generated code and
look up the next TB anyway.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-9-git-send-email-peter.maydell@linaro.org
For M profile exception-return handling we'd like to generate different
code for some instructions depending on whether we are in Handler
mode or Thread mode. This isn't the same as "are we privileged
or user", so we need an extra bit in the TB flags to distinguish.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-8-git-send-email-peter.maydell@linaro.org
We now test for "are we singlestepping" in several places and
it's not a trivial check because we need to care about both
architectural singlestep and QEMU gdbstub singlestep. We're
also about to add another place that needs to make this check,
so pull the condition out into a function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-7-git-send-email-peter.maydell@linaro.org
Move the code to generate the "condition failed" instruction
codepath out of the if (singlestepping) {} else {}. This
will allow adding support for handling a new is_jmp type
which can't be neatly split into "singlestepping case"
versus "not singlestepping case".
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1491844419-12485-6-git-send-email-peter.maydell@linaro.org
We currently have two places that do:
if (dc->ss_active) {
gen_step_complete_exception(dc);
} else {
gen_exception_internal(EXCP_DEBUG);
}
Factor this out into its own function, as we're about to add
a third place that needs the same logic.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1491844419-12485-4-git-send-email-peter.maydell@linaro.org
In Thumb mode, the only instructions which can cause an interworking
branch by writing the PC are BLX, BX, BXJ, LDR, POP and LDM. Unlike
ARM mode, data processing instructions which target the PC do not
cause interworking branches.
When we added support for doing interworking branches on writes to
PC from data processing instructions in commit 21aeb3430c, we
accidentally changed a Thumb instruction to have interworking
branch behaviour for writes to PC. (MOV, MOVS register-shifted
register, encoding T2; this is the standard encoding for
LSL/LSR/ASR/ROR (register).)
For this encoding, behaviour with Rd == R15 is specified as
UNPREDICTABLE, so allowing an interworking branch is within
spec, but it's confusing and differs from our handling of this
class of UNPREDICTABLE for other Thumb ALU operations. Make
it perform a simple (non-interworking) branch like the others.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491844419-12485-3-git-send-email-peter.maydell@linaro.org
This patch fixes two mistakes in the interrupt logic.
First we only trigger single-queue or multi-queue interrupts if the status
register is set. This logic was already used for non multi-queue interrupts
but it also applies to multi-queue interrupts.
Secondly we need to lower the interrupts if the ISR isn't set. As part
of this we can remove the other interrupt lowering logic and consolidate
it inside gem_update_int_status().
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 438bcc014f8f8a2f8f68f322cb6a53f4c04688c2.1491947224.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Current recommended style is to log a guest error on bad register
accesses, not kill the whole system with hw_error(). Change the
hw_error() calls to log as LOG_GUEST_ERROR or LOG_UNIMP or use
g_assert_not_reached() as appropriate.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491486314-25823-1-git-send-email-peter.maydell@linaro.org
In tlb_fill() we construct a syndrome register value from a
fault status register value which is filled in by arm_tlb_fill().
arm_tlb_fill() returns FSR values which might be in the format
used with short-format page descriptors, or the format used
with long-format (LPAE) descriptors. The syndrome register
always uses LPAE-format FSR status codes.
It isn't actually possible to end up delivering a syndrome
register value to the guest for a fault which is reported
with a short-format FSR (that kind of stage 1 fault will only
happen for an AArch32 translation regime which doesn't have
a syndrome register, and can never be redirected to an AArch64
or Hyp exception level). Add an assertion which checks this,
and adjust the code so that we construct a syndrome with
an invalid status code, rather than allowing set bits in
the FSR input to randomly corrupt other fields in the syndrome.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1491486152-24304-1-git-send-email-peter.maydell@linaro.org
The excnames[] array is defined in internals.h because we used
to use it from two different source files for handling logging
of AArch32 and AArch64 exception entry. Refactoring means that
it's now used only in arm_log_exception() in helper.c, so move
the array into that function.
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1491821097-5647-1-git-send-email-peter.maydell@linaro.org
Recent changes have added new EXCP_ values to ARM but forgot
to update the excnames[] array which is used to provide
human-readable strings when printing information about the
exception for debug logging. Add the missing entries, and
add a comment to the list of #defines to help avoid the mistake
being repeated in future.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1491486340-25988-1-git-send-email-peter.maydell@linaro.org
Short declaration of 'i' was in the middle of declarations with
assignments. Make it a little bit more readable. Additionally switch
from "unsigned" to "unsigned int" as this pattern is more widely used.
No functional change.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170313184750.429-4-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The static array exynos4210_uart_regs with register values is not
modified so it can be made const.
Few other functions accept driver or uart state as an argument but they
do not change it and do not cast it so this can be made const for code
safeness.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Message-id: 20170313184750.429-3-krzk@kernel.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu_log_mask() and error_report() are preferred over fprintf() for
logging errors. Also remove square brackets [] and additional new line
characters in printed messages.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170313184750.429-2-krzk@kernel.org
[PMM: wrapped long line]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The arm64 boot protocol stipulates that the kernel must be loaded
TEXT_OFFSET bytes beyond a 2 MB aligned base address, where TEXT_OFFSET
could be any 4 KB multiple between 0 and 2 MB, and whose value can be
found in the header of the Image file.
So after attempts to load the arm64 kernel image as an ELF file or as a
U-Boot image have failed (both of which have their own way of specifying
the load offset), try to determine the TEXT_OFFSET from the image after
loading it but before mapping it as a ROM mapping into the guest address
space.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1489414630-21609-1-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Tue 18 Apr 2017 15:58:32 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/block-pull-request:
block: Drain BH in bdrv_drained_begin
block: Walk bs->children carefully in bdrv_drain_recurse
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
During block job completion, nothing is preventing
block_job_defer_to_main_loop_bh from being called in a nested
aio_poll(), which is a trouble, such as in this code path:
qmp_block_commit
commit_active_start
bdrv_reopen
bdrv_reopen_multiple
bdrv_reopen_prepare
bdrv_flush
aio_poll
aio_bh_poll
aio_bh_call
block_job_defer_to_main_loop_bh
stream_complete
bdrv_reopen
block_job_defer_to_main_loop_bh is the last step of the stream job,
which should have been "paused" by the bdrv_drained_begin/end in
bdrv_reopen_multiple, but it is not done because it's in the form of a
main loop BH.
Similar to why block jobs should be paused between drained_begin and
drained_end, BHs they schedule must be excluded as well. To achieve
this, this patch forces draining the BH in BDRV_POLL_WHILE.
As a side effect this fixes a hang in block_job_detach_aio_context
during system_reset when a block job is ready:
#0 0x0000555555aa79f3 in bdrv_drain_recurse
#1 0x0000555555aa825d in bdrv_drained_begin
#2 0x0000555555aa8449 in bdrv_drain
#3 0x0000555555a9c356 in blk_drain
#4 0x0000555555aa3cfd in mirror_drain
#5 0x0000555555a66e11 in block_job_detach_aio_context
#6 0x0000555555a62f4d in bdrv_detach_aio_context
#7 0x0000555555a63116 in bdrv_set_aio_context
#8 0x0000555555a9d326 in blk_set_aio_context
#9 0x00005555557e38da in virtio_blk_data_plane_stop
#10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd
#11 0x00005555559fa49b in virtio_bus_stop_ioeventfd
#12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd
#13 0x00005555559f6a18 in virtio_pci_reset
#14 0x00005555559139a9 in qdev_reset_one
#15 0x0000555555916738 in qbus_walk_children
#16 0x0000555555913318 in qdev_walk_children
#17 0x0000555555916738 in qbus_walk_children
#18 0x00005555559168ca in qemu_devices_reset
#19 0x000055555581fcbb in pc_machine_reset
#20 0x00005555558a4d96 in qemu_system_reset
#21 0x000055555577157a in main_loop_should_exit
#22 0x000055555577157a in main_loop
#23 0x000055555577157a in main
The rationale is that the loop in block_job_detach_aio_context cannot
make any progress in pausing/completing the job, because bs->in_flight
is 0, so bdrv_drain doesn't process the block_job_defer_to_main_loop
BH. With this patch, it does.
Reported-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170418143044.12187-3-famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The recursive bdrv_drain_recurse may run a block job completion BH that
drops nodes. The coming changes will make that more likely and use-after-free
would happen without this patch
Stash the bs pointer and use bdrv_ref/bdrv_unref in addition to
QLIST_FOREACH_SAFE to prevent such a case from happening.
Since bdrv_unref accesses global state that is not protected by the AioContext
lock, we cannot use bdrv_ref/bdrv_unref unconditionally. Fortunately the
protection is not needed in IOThread because only main loop can modify a graph
with the AioContext lock held.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170418143044.12187-2-famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The local backend was recently converted to using "at*()" syscalls in order
to ensure all accesses happen below the shared directory. This requires that
we only pass relative paths, otherwise the dirfd argument to the "at*()"
syscalls is ignored and the path is treated as an absolute path in the host.
This is actually the case for paths in all fids, with the notable exception
of the root fid, whose path is "/". This causes the following backend ops to
act on the "/" directory of the host instead of the virtfs shared directory
when the export root is involved:
- lstat
- chmod
- chown
- utimensat
ie, chmod /9p_mount_point in the guest will be converted to chmod / in the
host for example. This could cause security issues with a privileged QEMU.
All "*at()" syscalls are being passed an open file descriptor. In the case
of the export root, this file descriptor points to the path in the host that
was passed to -fsdev.
The fix is thus as simple as changing the path of the export root fid to be
"." instead of "/".
This is CVE-2017-7471.
Cc: qemu-stable@nongnu.org
Reported-by: Léo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This fixes a regression introduced in commit 9d456654.
aio_co_wake() can only be used to reenter a coroutine that was already
previously entered, otherwise co->ctx is uninitialised and we access
garbage. Using it immediately after qemu_coroutine_create() like in
co_read_response() is wrong and causes segfaults.
Replace the call with aio_co_enter(), which gets an explicit AioContext
parameter and works even for new coroutines.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1491919733-21065-1-git-send-email-kwolf@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since d5895fcb (iscsi: Split URL into individual options), creating
qcow2 image on an iscsi LUN fails:
qemu-img create -f qcow2 iscsi://$SERVER/$IQN/0 1G
qemu-img: iscsi://$SERVER/$IQN/0: Could not create image: Invalid
argument
The problem is iscsi_open now expects that transport_name, portal and
target are already parsed into structured options by
iscsi_parse_filename, but it is not called in iscsi_create.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20170410075451.21329-1-famz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
[mreitz: Dropped now superfluous
qdict_put(bs_options, "filename", ...)]
Signed-off-by: Max Reitz <mreitz@redhat.com>
When a block device that is part of a throttle group is hot-unplugged,
we forgot to remove it from the throttle group. This leaves stale
memory around, and causes an easily reproducible crash:
$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \
-device virtio-scsi-pci,bus=pci.0 -drive \
id=drive_image2,if=none,format=raw,file=file2,bps=512000,iops=100,group=foo \
-device scsi-hd,id=image2,drive=drive_image2 -drive \
id=drive_image3,if=none,format=raw,file=file3,bps=512000,iops=100,group=foo \
-device scsi-hd,id=image3,drive=drive_image3
{'execute':'qmp_capabilities'}
{'execute':'device_del','arguments':{'id':'image3'}}
{'execute':'system_reset'}
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1428810
Suggested-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170406190847.29347-1-eblake@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
raw_open() expects the caller always passing in the right actual
@options parameter. But when trying to applying snapshot on a RBD
image, bdrv_snapshot_goto() calls raw_open() (by calling the
bdrv_open callback on the BlockDriver) with a NULL @options, and
that will result in a Segmentation fault.
For the other non-raw format drivers, it also makes sense to passing
in the actual options, althought they don't trigger the problem so
far.
Let's prepare a @options by adding the "file" key-value pair to a
copy of the actual options that were given for the node (i.e.
bs->options), and pass it to the callback.
BlockDriver.bdrv_open() expects bs->file to be NULL and just
overwrites it with the result from bdrv_open_child(). That means we
should actually make sure it's NULL because otherwise the child BDS
will have a reference count that is 1 too high. So we unconditionally
invoke bdrv_unref_child() before calling BlockDriver.bdrv_open(), and
we wrap everything in bdrv_ref()/bdrv_unref() so the BDS isn't
deleted in the meantime.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-id: 20170405091909.36357-2-bjsdjshi@linux.vnet.ibm.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
# gpg: Signature made Tue 11 Apr 2017 13:10:55 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/block-pull-request:
sheepdog: Use bdrv_coroutine_enter before BDRV_POLL_WHILE
block: Fix bdrv_co_flush early return
block: Use bdrv_coroutine_enter to start I/O coroutines
qemu-io-cmds: Use bdrv_coroutine_enter
blockjob: Use bdrv_coroutine_enter to start coroutine
block: Introduce bdrv_coroutine_enter
async: Introduce aio_co_enter
coroutine: Extract qemu_aio_coroutine_enter
tests/block-job-txn: Don't start block job before adding to txn
block: Quiesce old aio context during bdrv_set_aio_context
block: Make bdrv_parent_drained_begin/end public
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When called from main thread, the coroutine should run in the context of
bs. Use bdrv_coroutine_enter to ensure that.
Signed-off-by: Fam Zheng <famz@redhat.com>
bdrv_inc_in_flight and bdrv_dec_in_flight are mandatory for
BDRV_POLL_WHILE to work, even for the shortcut case where flush is
unnecessary. Move the if block to below bdrv_dec_in_flight, and BTW fix
the variable declaration position.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
BDRV_POLL_WHILE waits for the started I/O by releasing bs's ctx then polling
the main context, which relies on the yielded coroutine continuing on bs->ctx
before notifying qemu_aio_context with bdrv_wakeup().
Thus, using qemu_coroutine_enter to start I/O is wrong because if the coroutine
is entered from main loop, co->ctx will be qemu_aio_context, as a result of the
"release, poll, acquire" loop of BDRV_POLL_WHILE, race conditions happen when
both main thread and the iothread access the same BDS:
main loop iothread
-----------------------------------------------------------------------
blockdev_snapshot
aio_context_acquire(bs->ctx)
virtio_scsi_data_plane_handle_cmd
bdrv_drained_begin(bs->ctx)
bdrv_flush(bs)
bdrv_co_flush(bs) aio_context_acquire(bs->ctx).enter
...
qemu_coroutine_yield(co)
BDRV_POLL_WHILE()
aio_context_release(bs->ctx)
aio_context_acquire(bs->ctx).return
...
aio_co_wake(co)
aio_poll(qemu_aio_context) ...
co_schedule_bh_cb() ...
qemu_coroutine_enter(co) ...
/* (A) bdrv_co_flush(bs) /* (B) I/O on bs */
continues... */
aio_context_release(bs->ctx)
aio_context_acquire(bs->ctx)
Note that in above case, bdrv_drained_begin() doesn't do the "release,
poll, acquire" in BDRV_POLL_WHILE, because bs->in_flight == 0.
Fix this by using bdrv_coroutine_enter and enter coroutine in the right
context.
iotests 109 output is updated because the coroutine reenter flow during
mirror job complete is different (now through co_queue_wakeup, instead
of the unconditional qemu_coroutine_switch before), making the end job
len different.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
qemu_coroutine_create associates @co to qemu_aio_context but we poll
blk's context below. If the coroutine yields, it may never get resumed
again.
Use bdrv_coroutine_enter to make sure we are starting the I/O on the
right context.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Resuming and especially starting of the block job coroutine, could be issued in
the main thread. However the coroutine's "home" ctx should be set to the same
context as job->blk. Use bdrv_coroutine_enter to ensure that.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
It's a variant of qemu_coroutine_enter with an explicit AioContext
parameter.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Previously, before test_block_job_start returns, the job can already
complete, as a result, the transactional state of other jobs added to
the same txn later cannot be handled correctly.
Move the block_job_start() calls to callers after
block_job_txn_add_job() calls.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
The fact that the bs->aio_context is changing can confuse the dataplane
iothread, because of the now fine granularity aio context lock.
bdrv_drain should rather be a bdrv_drained_begin/end pair, but since
bs->aio_context is changing, we can just use aio_disable_external and
bdrv_parent_drained_begin.
Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Fixes a memory leak.
# gpg: Signature made Mon 10 Apr 2017 13:20:39 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: xattr: fix memory leak in v9fs_list_xattr
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Final icount and misc MTTCG fixes for 2.9
Minor differences from:
Message-Id: <20170405132503.32125-1-alex.bennee@linaro.org>
- dropped new feature patches
- last minute typo fix from Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
# gpg: Signature made Mon 10 Apr 2017 11:38:10 BST
# gpg: using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-mttcg-fixups-for-rc2-100417-1:
replay: assert time only goes forward
cpus: call cpu_update_icount on read
cpu-exec: update icount after each TB_EXIT
cpus: introduce cpu_update_icount helper
cpus: don't credit executed instructions before they have run
cpus: move icount preparation out of tcg_exec_cpu
cpus: check cpu->running in cpu_get_icount_raw()
cpus: remove icount handling from qemu_tcg_cpu_thread_fn
target/i386/misc_helper: wrap BQL around another IRQ generator
cpus: fix wrong define name
scripts/qemugdb/mtree.py: fix up mtree dump
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the 2.7 release we stated in the ChangeLog that the
minimum glib version for Windows hosts was 2.30, but we
didn't update configure to enforce this because we were
very close to the release at the point where we noticed
the issue, and it only affected building the test suite.
We then forgot that we needed to do it. Fix the omission.
(The reason for the 2.30 requirement is use of
g_dir_make_tmp() -- our fallback implementation uses
mkdtemp(), which isn't available on Windows.)
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1491224655-5776-1-git-send-email-peter.maydell@linaro.org
If we find ourselves trying to add an event to the log where time has
gone backwards it is because a vCPU event has occurred and the
main-loop is not yet aware of time moving forward. This should not
happen and if it does its better to fail early than generate a log
that will have weird behaviour.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
This ensures each time the vCPU thread reads the icount we update the
master timer_state.qemu_icount field. This way as long as updates are
in BQL protected sections (which they should be) the main-loop can
never come to update the log and find time has gone backwards.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
There is no particular reason we shouldn't update the global system
icount time as we exit each TranslationBlock run. This ensures the
main-loop doesn't have to wait until we exit to the outer loop for
executed instructions to be credited to timer_state.
The prepare_icount_for_run function is slightly tweaked to match the
logic we run in cpu_loop_exec_tb.
Based on Paolo's original suggestion.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
By holding off updates to timer_state.qemu_icount we can run into
trouble when the non-vCPU thread needs to know the time. This helper
ensures we atomically update timers_state.qemu_icount based on what
has been currently executed.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Outside of the vCPU thread icount time will only be tracked against
timers_state.qemu_icount. We no longer credit cycles until they have
completed the run. Inside the vCPU thread we adjust for passage of
time by looking at how many have run so far. This is only valid inside
the vCPU thread while it is running.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
As icount is only supported for single-threaded execution due to the
requirement for determinism let's remove it from the common
tcg_exec_cpu path.
Also remove the additional fiddling which shouldn't be required as the
icount counters should all be rectified as you enter the loop.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
The lifetime of current_cpu is now the lifetime of the vCPU thread.
However get_icount_raw() can apply a fudge factor if called while code
is running to take into account the current executed instruction
count.
To ensure this is always the case we also check cpu->running.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
We should never be running in multi-threaded mode with icount enabled.
There is no point calling handle_icount_deadline here so remove it and
assert !use_icount.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
While the configure script generates TARGET_SUPPORTS_MTTCG define, one
of the define is cpus.c is checking wrong name: TARGET_SUPPORT_MTTCG
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Since QEMU has been able to build with native Int128 support this was
broken as it attempts to fish values out of the non-existent
structure. Also the alias print was trying to make a %x out of
gdb.ValueType directly which didn't seem to work.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
bdrv_replace_child_noperm tries to hand over the quiesce_counter state
from old bs to the new one, but if they are not on the same aio context
this causes unbalance.
Fix this by setting the correct aio context before calling
bdrv_append().
Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The assertion is currently failing. We can't require callers to have
write permissions when all they are doing is a read, so comment it out.
Add a FIXME comment in the code so that the check is re-enabled when
copy on read is refactored into its own filter driver.
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
The documentation and help for qemu-img claims that 'qemu-img create'
will take the '--image-opts' argument. This is not true, so this
patch removes those claims.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If @bs does not have any parents, the only reference to @mirror_top_bs
will be held by the BlockJob object after the bdrv_unref() following
block_job_create(). However, if block_job_create() fails, this reference
will not exist and @mirror_top_bs will have been deleted when we
goto fail.
The issue comes back at all later entries to the fail label: We delete
the BlockJob object before rolling back our changes to the node graph.
This means that we will delete @mirror_top_bs in the process.
All in all, whenever @bs does not have any parents and we go down the
fail path we will dereference @mirror_top_bs after it has been deleted.
Fix this by invoking bdrv_unref() only when block_job_create() was
successful and by bdrv_ref()'ing @mirror_top_bs in the fail path before
deleting the BlockJob object. Finally, bdrv_unref() it at the end of the
fail path after we actually no longer need it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Like in the mirror filter driver, we also need to set the image size for
the commit filter driver. This is less likely to be a problem in
practice than for the mirror because we're not at the active layer here,
but attaching new parents to a node in the middle of the chain is
possible, so the size needs to be correct anyway.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
The filter driver that is inserted by the commit job needs to use the
same AioContext as its parent and child nodes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Usually guest devices don't like other writers to the same image, so
they use blk_set_perm() to prevent this from happening. In the migration
phase before the VM is actually running, though, they don't have a
problem with writes to the image. On the other hand, storage migration
needs to be able to write to the image in this phase, so the restrictive
blk_set_perm() call of qdev devices breaks it.
This patch flags all BlockBackends with a qdev device as
blk->disable_perm during incoming migration, which means that the
requested permissions are stored in the BlockBackend, but not actually
applied to its root node yet.
Once migration has finished and the VM should be resumed, the
permissions are applied. If they cannot be applied (e.g. because the NBD
server used for block migration hasn't been shut down), resuming the VM
fails.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Since commit cd958edb1f, same size console resize is skipped. This
change broke QXL incoming migration in VGA mode,
qemu_spice_display_switch() is no longer called during qxl_post_load(),
because default message surface is of the same size, and during
displaychangelistener registration, PCIQXLDevice.mode is
QXL_MODE_UNDEFINED. This triggers a later crash on refresh:
==2634== Invalid read of size 4
==3516== at 0x65F3050: pixman_image_get_data (in /usr/lib64/libpixman-1.so.0.34.0)
==3516== by 0x6F0CEB: qemu_spice_create_update (spice-display.c:215)
==3516== by 0x6F1CC7: qemu_spice_display_refresh (spice-display.c:502)
==3516== by 0x58CF77: display_refresh (qxl.c:1948)
==3516== by 0x6E8084: do_safe_dpy_refresh (console.c:1591)
==3516== by 0x6E80D5: dpy_refresh (console.c:1604)
==3516== by 0x6E4508: gui_update (console.c:201)
==3516== by 0x81898E: timerlist_run_timers (qemu-timer.c:536)
==3516== by 0x8189D6: qemu_clock_run_timers (qemu-timer.c:547)
==3516== by 0x818D98: qemu_clock_run_all_timers (qemu-timer.c:662)
==3516== by 0x81952A: main_loop_wait (main-loop.c:514)
==3516== by 0x4ADD29: main_loop (vl.c:1898)
One way to solve this is to explicitely call qemu_spice_display_switch()
on entering VGA mode, which is called during qxl_post_load().
Fixes:
"null pointer access on migration resume of systemrescuecd boot menu with qxl-vga"
https://bugs.launchpad.net/qemu/+bug/1679126https://bugzilla.redhat.com/show_bug.cgi?id=1438566
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20170406120513.638-4-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The NVIDIA BAR5 quirk is targeting an ioport BAR. Some older devices
have a BAR5 which is not ioport and can induce a segfault here. Test
the BAR type to skip these devices.
Link: https://bugs.launchpad.net/qemu/+bug/1678466
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This behavior is not indicated in the datasheet and can confuse the OS.
The TCO can trap NMIs from SERR# or IOCHK# and convert them to SMIs; but
any other TCO event is either delivered as an SMI or completely disabled.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some 9pfs bugs fixes: potential hang at reset, migration blocker leak.
# gpg: Signature made Tue 04 Apr 2017 17:07:55 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: clear migration blocker at session reset
9pfs: fix multiple flush for same request
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The migration blocker survives a device reset: if the guest mounts a 9p
share and then gets rebooted with system_reset, it will be unmigratable
until it remounts and umounts the 9p share again.
This happens because the migration blocker is supposed to be cleared when
we put the last reference on the root fid, but virtfs_reset() wrongly calls
free_fid() instead of put_fid().
This patch fixes virtfs_reset() so that it honor the way fids are supposed
to be manipulated: first get a reference and later put it back when you're
done.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Li Qiang <liqiang6-s@360.cn>
If a client tries to flush the same outstanding request several times, only
the first flush completes. Subsequent ones keep waiting for the request
completion in v9fs_flush() and, therefore, leak a PDU. This will cause QEMU
to hang when draining active PDUs the next time the device is reset.
Let have each flush request wake up the next one if any. The last waiter
frees the cancelled PDU.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Normally pci_init_bus_master() would be called either via
bus->machine_done.notify or directly from do_pci_register_device().
However if a device's realize() failed, pci_init_bus_master() is not
called, and do_pci_unregister_device() fails on
memory_region_del_subregion() as it was not mapped.
This adds a check that subregion was mapped before unmapping it.
Fixes: c53598ed18 ("pci: Add missing drop of bus master AS reference")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>
The qio_dns_resolver_lookup_sync() method is required to be a no-op
for socket kinds that don't require name resolution. Thus the KIND_FD
handling should not return an error.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The channel socket was initialized manually, but forgot to set
QIO_CHANNEL_FEATURE_SHUTDOWN. Thus, the colo_process_incoming_thread
would hang at recvmsg. This patch just call qio_channel_socket_new to
get channel, Which set QIO_CHANNEL_FEATURE_SHUTDOWN already.
Signed-off-by: Wang Guang<wang.guang55@zte.com.cn>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Occasionally if a test crashes or is interrupted by the user
at the wrong moment it could leave behind a stale UNIX
socket in /tmp/. This will then cause a subsequent test
run to fail spuriously with
tests/libqtest.c:70:init_socket: assertion failed (ret != -1): (-1 != -1)
if it happens to reuse the same PID.
Defend against this by deleting any stray stale socket before
trying to open the new ones for this test.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1490963801-27870-1-git-send-email-peter.maydell@linaro.org
When running virt-rescue the serial console hangs from time to time.
Virt-rescue runs an ordinary Linux kernel "appliance", but there is
only a single idle process running inside, so the qemu main loop is
largely idle. With virt-rescue >= 1.37 you may be able to observe the
hang by doing:
$ virt-rescue -e ^] --scratch
><rescue> while true; do ls -l /usr/bin; done
The hang in virt-rescue can be resolved by pressing a key on the
serial console.
Possibly with the same root cause, we also observed hangs during very
early boot of regular Linux VMs with a serial console. Those hangs
are extremely rare, but you may be able to observe them by running
this command on baremetal for a sufficiently long time:
$ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done
(Check in /tmp/log that the failure was caused by a hang during early
boot, and not some other reason)
During investigation of this bug, Paolo Bonzini wrote:
> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext. However QEMU is not doing that, instead it is taking its
> own mutex. So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.
This patch exactly implements Paolo's suggestion in that paragraph.
This fixes the serial console hang in my testing, across 3 different
physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
of automated testing. I wasn't able to reproduce the early boot hangs
(but as noted above, these are extremely rare in any case).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170331205133.23906-1-rjones@redhat.com>
[Paolo: this is actually a glib bug: recent glib versions are also
expecting g_main_context_acquire around g_poll---but that is not
documented and probably not even intended].
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Change the types of variables in allocate_clusters() to int64_t so we do
not have to worry about potential overflows.
Add an assertion that our accesses to s->bat[] do not result in a buffer
overflow and that the implicit conversion performed when invoking
bat_entry_off() does not result in an integer overflow.
Coverity-id: 1307776
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170331170512.10381-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Tweak 097 and 176 to operate on an image that is not cluster-aligned,
to give further coverage of clearing out an entire image, including
the recent fix to eliminate the difference between fast path (97) and
slow (176) for qcow2. Also tested on qcow (97 only, since qcow lacks
snapshots).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170331185356.2479-4-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
There is a subtle difference between the fast (qcow2v3 with no
extra data) and slow path (qcow2v2 format [aka 0.10], or when a
snapshot is present) of qcow2_make_empty(). The slow path fails
to discard the final (partial) cluster of an unaligned image.
The problem stems from the fact that qcow2_discard_clusters() was
silently ignoring sub-cluster head and tail on unaligned requests.
A quick audit of all callers shows that qcow2_snapshot_create() has
always passed a cluster-aligned request since the call was added
in commit 1ebf561; qcow2_co_pdiscard() has passed a cluster-aligned
request since commit ecdbead taught the block layer about preferred
discard alignment; and qcow2_make_empty() was fixed to pass an
aligned start (but not necessarily end) in commit a3e1505.
Asserting that the start is always aligned also points out that we
now have a dead check: rounding the end offset down can never result
in a value less than the aligned start offset (the check was rendered
dead with commit ecdbead). Meanwhile, we do not want to round the
end cluster down in the one case of the end offset matching the
(unaligned) file size - that final partial cluster should still be
discarded.
With those fixes in place, the fast and slow paths are back in sync
at discarding an entire image; the next patch will update
qemu-iotests to ensure we don't regress.
Note that bdrv_co_pdiscard ignores ALL partial cluster requests,
including the partial cluster at the end of an image; it can be
argued that the partial cluster at the end should be special-cased
so that a guest issuing discard requests at proper alignments
everywhere else can likewise empty the entire image. But that
optimization is left for another day.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170331185356.2479-3-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The previous commit:
commit a3e1505dae
Author: Eric Blake <eblake@redhat.com>
Date: Mon Dec 5 09:49:34 2016 -0600
qcow2: Don't strand clusters near 2G intervals during commit
extended the 097 test case so that it did two passes, once
with an internal snapshot, once without.
qcow (v1) does not support internal snapshots, so this change
broke test 097 when run against qcow.
This splits 097 in two, creating a new 176 that tests the
internal snapshot codepath, effectively putting 097 back
to its content before the above commit.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170221115512.21918-8-berrange@redhat.com>
[eblake: test collisions: s/173/176/g]
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170331185356.2479-2-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
It would be a bug for a command with the CMD_NOFILE_OK or
CMD_FLAG_GLOBAL flags set to also set the ct->perms field,
because the former says "OK for a file not to be open"
but the latter is a check on a file.
Add an assertion in qemuio_add_command() so we can catch that
sort of buggy command definition immediately rather than it
being a bug that only manifests when a particular set of
command line options is used.
(Coverity gets confused about this (CID 1371723) and reports
that we might dereference a NULL blk pointer in this case,
because it can't tell that that code path never happens with
the cmdinfo_t that we have. This commit won't help unconfuse
it, but it does fix the underlying issue.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1490967529-4767-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Commit 831acdc "sheepdog: Implement bdrv_parse_filename()" and commit
d282f34 "sheepdog: Support blockdev-add" have different ideas on how
the QemuOpts parameters for the server address are named. Fix that.
While there, rename BlockdevOptionsSheepdog member addr to server, for
consistency with BlockdevOptionsSsh, BlockdevOptionsGluster,
BlockdevOptionsNbd.
Commit 831acdc's example becomes
--drive driver=sheepdog,server.type=inet,server.host=fido,server.port=7000,vdi=dolly
instead of
--drive driver=sheepdog,host=fido,vdi=dolly
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Message-id: 1490895797-29094-10-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces, and convert all internal interfaces to
SocketAddressFlat.
BlockdevOptionsNbd is an external interface using SocketAddress. We
already use SocketAddressFlat elsewhere in blockdev-add. Replace it
by SocketAddressFlat while we can (it's new in 2.9) for simplicity and
consistency. For example,
{ "execute": "blockdev-add",
"arguments": { "node-name": "foo", "driver": "nbd",
"server": { "type": "inet",
"data": { "host": "localhost",
"port": "12345" } } } }
becomes
{ "execute": "blockdev-add",
"arguments": { "node-name": "foo", "driver": "nbd",
"server": { "type": "inet",
"host": "localhost", "port": "12345" } } }
Since the internal interfaces still take SocketAddress, this requires
conversion function socket_address_crumple(). It'll go away when I
update the interfaces.
Unfortunately, SocketAddress is also visible in -drive since 2.8:
-drive if=none,driver=nbd,server.type=inet,server.data.host=127.0.0.1,server.data.port=12345
Nobody should be using it, as it's fairly new and has never been
documented, so adding still more compatibility gunk to keep it working
isn't worth the trouble. You now have to use
-drive if=none,driver=nbd,server.type=inet,server.host=127.0.0.1,server.port=12345
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-9-git-send-email-armbru@redhat.com
[mreitz: Change iotest 147 accordingly]
Because of this interface change, iotest 147 has to be adapted.
Unfortunately, we cannot just flatten all of the addresses because
nbd-server-start still takes a plain SocketAddress. Therefore, we need
both and this is most easily achieved by writing the SocketAddress into
the code and flattening it where necessary.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170330221243.17333-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
SocketAddress is a simple union, and simple unions are awkward: they
have their variant members wrapped in a "data" object on the wire, and
require additional indirections in C. I intend to limit its use to
existing external interfaces. New ones should use SocketAddressFlat.
I further intend to convert all internal interfaces to
SocketAddressFlat. This helper should go away then.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-8-git-send-email-armbru@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
qemu_gluster_glfs_init() and qemu_gluster_parse_json() rely on the
fact that SocketAddressFlatType has only two members
SOCKET_ADDRESS_FLAT_TYPE_INET and SOCKET_ADDRESS_FLAT_TYPE_UNIX.
Correct, but won't stay correct. Make them more robust.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490895797-29094-6-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
-blockdev and blockdev_add convert their arguments via QObject to
BlockdevOptions for qmp_blockdev_add(), which converts them back to
QObject, then to a flattened QDict. The QDict's members are typed
according to the QAPI schema.
-drive converts its argument via QemuOpts to a (flat) QDict. This
QDict's members are all QString.
Thus, the QType of a flat QDict member depends on whether it comes
from -drive or -blockdev/blockdev_add, except when the QAPI type maps
to QString, which is the case for 'str' and enumeration types.
The block layer core extracts generic configuration from the flat
QDict, and the block driver extracts driver-specific configuration.
Both commonly do so by converting (parts of) the flat QDict to
QemuOpts, which turns all values into strings. Not exactly elegant,
but correct.
However, A few places access the flat QDict directly:
* Most of them access members that are always QString. Correct.
* bdrv_open_inherit() accesses a boolean, carefully. Correct.
* nfs_config() uses a QObject input visitor. Correct only because the
visited type contains nothing but QStrings.
* nbd_config() and ssh_config() use a QObject input visitor, and the
visited types contain non-QStrings: InetSocketAddress members
@numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try
to use them (they're all optional). @to is ignored anyway.
Reproducer:
-drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p
-drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4
both fail with "Invalid parameter type for 'data.ipv4', expected: boolean"
Add suitable comments to all these places. Mark the buggy ones FIXME.
"Fortunately", -drive's driver-specific options are entirely
undocumented.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com
[mreitz: Fixed two typos]
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
We have quite a few switches over SocketAddressKind. Some have case
labels for all enumeration values, others rely on a default label.
Some abort when the value isn't a valid SocketAddressKind, others
report an error then.
Unify as follows. Always provide case labels for all enumeration
values, to clarify intent. Abort when the value isn't a valid
SocketAddressKind, because the program state is messed up then.
Improve a few error messages while there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1490895797-29094-4-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Certain features make sense only with certain address families. For
instance, passing file descriptors requires AF_UNIX. Testing
SocketAddress's saddr->type == SOCKET_ADDRESS_KIND_UNIX is obvious,
but problematic: it can't recognize AF_UNIX when type ==
SOCKET_ADDRESS_KIND_FD.
Mark such tests of saddr->type TODO. We may want to check the address
family with getsockname() there.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1490895797-29094-2-git-send-email-armbru@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Recently we expirience hang with iothreads enabled with the following
call trace:
Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
0 ppoll () from /lib64/libc.so.6
2 qemu_poll_ns () at qemu-timer.c:313
3 aio_poll () at aio-posix.c:457
4 bdrv_flush () at block/io.c:2641
5 bdrv_close () at block.c:2143
6 bdrv_delete () at block.c:2352
7 bdrv_unref () at block.c:3429
8 blk_remove_bs () at block/block-backend.c:427
9 blk_delete () at block/block-backend.c:178
10 blk_unref () at block/block-backend.c:226
11 object_property_del_all () at qom/object.c:399
12 object_finalize () at qom/object.c:461
13 object_unref () at qom/object.c:898
14 object_property_del_child () at qom/object.c:422
15 qmp_marshal_device_del () at qmp-marshal.c:1145
16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929
Technically bdrv_flush() stucks in
while (rwco.ret == NOT_DONE) {
aio_poll(aio_context, true);
}
but rwco.ret is equal to 0 thus we have missed wakeup. Code investigation
reveals that we do not have performed aio_context_acquire() on this call
stack.
This patch adds missed lock.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Message-id: 1490717566-25516-1-git-send-email-den@openvz.org
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
libusbx doesn't exist any more, the fork got merged back to libusb. So
stop using LIBUSBX_API_VERSION and use LIBUSB_API_VERSION instead. For
backward compatibility alias LIBUSB_API_VERSION to LIBUSBX_API_VERSION
in case we figure LIBUSB_API_VERSION isn't defined.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170403105238.23262-1-kraxel@redhat.com
The C store helper functions take the address argument as a
target_ulong type; if this is 32 bit but the host is 64 bit
then the SPARC calling convention requires that the caller
must zero extend the value. We weren't doing this, which
meant we could pass values to the caller with high bits set
and QEMU would crash if it was compiled with optimizations.
In particular, the i386 BIOS would not start.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1490871151-29029-3-git-send-email-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
The C store helper functions take the data argument as a uint8_t,
uint16_t, etc depending on the store size. The SPARC calling
convention requires that data types smaller than the register
size must be extended by the caller. We weren't doing this,
which meant that if QEMU was compiled with optimizations enabled
we could end up storing incorrect values to guest memory.
(In particular the i386 guest BIOS would crash on startup.)
Add code to the trampolines that call the store helpers to
do the zero extension as required.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1490871151-29029-2-git-send-email-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
MemoryRegionCache did not know about virtio support for IOMMUs (because the
two features were developed at the same time). Revert MemoryRegionCache
to "normal" address_space_* operations for 2.9, as it is simpler than
undoing the virtio patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
ppc patch queue 2017-04-03
A single bugfix in this pull request, for an ugly assert() failure, if
the user ignores the information in query-hotpluggable-cpus and tries
to hot add CPUs to pseries with bad parameters.
# gpg: Signature made Mon 03 Apr 2017 11:06:58 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170403:
pseries: Enforce homogeneous threads-per-core
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The evdev devices in input-linux.c are read in blocks of one whole
event. If there are not enough bytes available, they are discarded,
instead of being kept for the next read operation. This results in
lost events, of even non-working devices.
This patch keeps track of the number of bytes to be read to fill up
a whole event, and then handle it.
Changes from v1 to v2:
- Fix: Calculate offset on each iteration
Changes from v2 to v3:
- Fix coding style
- Store offset instead of bytes to be read
Signed-off-by: Javier Celaya <jcelaya@gmail.com>
Message-id: 20170327182624.2914-1-jcelaya@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
When done processing a endpoint ring we must update the dequeue pointer
in the endpoint context in guest memory. This is needed to make sure
the guest has a correct view of things and also to make live migration
work properly, because xhci post_load restores alot of the state from
xhci data structures in guest memory.
Add xhci_set_ep_state() call to do that.
The recursive calls stopped by commit
ddb603ab6c had the (unintentional) side
effect to hiding this bug. xhci_set_ep_state() was called before
processing, to set the state to running, which updated the dequeue
pointer too.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20170331102521.29253-1-kraxel@redhat.com
For reasons that may be useful in future, CPU core objects, as used on the
pseries machine type have their own nr-threads property, potentially
allowing cores with different numbers of threads in the same system.
If the user/management uses the values specified in query-hotpluggable-cpus
as they're expected to do, this will never matter in pratice. But that's
not actually enforced - it's possible to manually specify a core with
a different number of threads from that in -smp. That will confuse the
platform - most immediately, this can be used to create a CPU thread with
index above max_cpus which leads to an assertion failure in
spapr_cpu_core_realize().
For now, enforce that all cores must have the same, standard, number of
threads.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
If the user has explicitly specified a block driver and thus a protocol,
we have to make sure the URL's protocol prefix matches. Otherwise the
latter will silently override the former which might catch some users by
surprise.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20170331120431.1767-3-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Commit c7cacb3 accidentally broke legacy key-value parsing through
pseudo-filename parsing of -drive file=rbd://..., for any key that
contains an escaped ':'. Such a key is surprisingly common, thanks
to mon_host specifying a 'host:port' string. The break happens
because passing things from QDict through QemuOpts back to another
QDict requires that we pack our parsed key/value pairs into a string,
and then reparse that string, but the intermediate string that we
created ("key1=value1:key2=value2") lost the \: escaping that was
present in the original, so that we could no longer see which : were
used as separators vs. those used as part of the original input.
Fix it by collecting the key/value pairs through a QList, and
sending that list on a round trip through a JSON QString (as in
'["key1","value1","key2","value2"]') on its way through QemuOpts,
rather than hand-rolling our own string. Since the string is only
handled internally, this was faster than creating a full-blown
struct of '[{"key1":"value1"},{"key2":"value2"}]', and safer at
guaranteeing order compared to '{"key1":"value1","key2":"value2"}'.
It would be nicer if we didn't have to round-trip through QemuOpts
in the first place, but that's a much bigger task for later.
Reproducer:
./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \
-drive 'file=rbd:volumes/volume-ea141b5c-cdb3-4765-910d-e7008b209a70'\
':id=compute:key=AQAVkvxXAAAAABAA9ZxWFYdRmV+DSwKr7BKKXg=='\
':auth_supported=cephx\;none:mon_host=192.168.1.2\:6789'\
',format=raw,if=none,id=drive-virtio-disk0,'\
'serial=ea141b5c-cdb3-4765-910d-e7008b209a70,cache=writeback'
Even without an RBD setup, this serves a test of whether we get
the incorrect parser error of:
qemu-system-x86_64: -drive file=rbd:...cache=writeback: conf option 6789 has no value
or the correct behavior of hanging while trying to connect to
the requested mon_host of 192.168.1.2:6789.
Reported-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170331152730.12514-1-eblake@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This reverts commit c2b2e158cc.
The original patch intend to prevent linux i915 driver from using
stolen meory. But this patch breaks windows IGD driver loading on
Gen9+, as IGD HW will use stolen memory on Gen9+, once windows IGD
driver see zero size stolen memory, it will unload.
Meanwhile stolen memory will be disabled in 915 when i915 run as
a guest.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[aw: Gen9+ is SkyLake and newer]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
HMP pull (one bugfix)
# gpg: Signature made Fri 31 Mar 2017 11:57:17 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-hmp-20170331:
hmp: fix "dump-quest-memory" segfault
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
qemu-ga patch queue for 2.9
* fix make check failure of guest-get-fsinfo when nested virtual block
device partitions are mounted in the test environment
* fix static compilation for mingw builds
# gpg: Signature made Fri 31 Mar 2017 04:52:40 BST
# gpg: using RSA key 0x3353C9CEF108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg: aka "Michael Roth <mdroth@utexas.edu>"
# gpg: aka "Michael Roth <mdroth@linux.vnet.ibm.com>"
# Primary key fingerprint: CEAC C9E1 5534 EBAB B82D 3FA0 3353 C9CE F108 B584
* remotes/mdroth/tags/qga-pull-2017-03-30-tag:
qga: Make qemu-ga compile statically for Windows
qga: don't fail if mount doesn't have slave devices
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Fri 31 Mar 2017 01:50:55 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
e1000: disable debug by default
virtio-net: avoid call tap_enable when there's only one queue
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Attempting to compile qemu-ga statically as follows for Windows causes
the following error:
Compilation:
./configure --disable-docs --target-list=x86_64-softmmu \
--cross-prefix=x86_64-w64-mingw32- --static \
--enable-guest-agent-msi --with-vss-sdk=/path/to/VSSSDK72
make -j8 qemu-ga
Error:
path/to/qemu/stubs/error-printf.c:7: undefined reference to `__imp_g_test_config_vars'
collect2: error: ld returned 1 exit status
Makefile:444: recipe for target 'qemu-ga.exe' failed
make: *** [qemu-ga.exe] Error 1
This is caused by a bug in the pkg-config file for glib as it doesn't define
GLIB_STATIC_COMPILATION for pkg-config --static.
Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We call tap_enable() even if for multiqueue is not enabled. This is
wrong since it should be used for multiqueue codes to enable a
disabled queue. Fixing this by only calling this when multiqueue is
used.
Fixes: 16dbaf905b ("tap: support enabling or disabling a queue")
Reported-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Tested-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
In some cases the slave devices of a virtual block device are tracked
by the parent in the corresponding sysfs node. For instance, if we
have a loop-back mount of the form:
/dev/loop3p1 on /home/mdroth/mnt type ext4 (rw,relatime,data=ordered)
this will be reflected in sysfs as:
/sys/devices/virtual/block/loop3/
...
/sys/devices/virtual/block/loop3/slaves
/sys/devices/virtual/block/loop3/loop3p1
The current code however assumes the mounted virtual block device,
loop3p1 in this case, contains the slaves directory, and reports an
error otherwise. This breaks 'make check' in certain environments.
Fix this by simply skipping attempts to generate disk topology
information in these cases. Since this information is documented
in QAPI as optionally-reported, this should be ok from an API
perspective.
In the future, this can possibly be improved upon by collecting
topology information from the parent in these cases.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
vhost, pc: fixes
More fixes for 2.9. Region caching is still causing
issues around reset, but we seem to be getting there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Thu 30 Mar 2017 17:14:45 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
tests/acpi: don't pack a structure
vhost: generalize iommu memory region
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
There's no reason to pack structures where we don't care about size or
padding, this applies to AcpiStdTable in tests/acpi-utils.h.
OTOH bios-tables-test happens to be passing the address of a field in
this struct to a function that expects a pointer to normally aligned
data which results in a SIGBUS on architectures like SPARC that have
strict alignment requirements.
Fixes: 9e8458c02 ("acpi unit-test: compare DSDT and SSDT tables against expected values")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
We assumes the iommu_ops were attached to the root region of address
space. This may not be true for all kinds of IOMMU implementation and
especially after commit 3716d5902d ("pci: introduce a bus master
container"). So fix this by not assuming as->root has iommu_ops,
instead depending on the regions reported by memory listener through:
- register a memory listener to dma_as
- during region_add, if it's a region of IOMMU, register a specific
IOMMU notifier, and store all notifiers in a list.
- during region_del, compare and delete the IOMMU notifier from the list
This is also a must for making vhost device IOTLB works for all types
of IOMMUs. Note, since we register one notifier during each
.region_add, the IOTLB may be flushed more than one times, this is
suboptimal and could be optimized in the future.
Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Fixes: 3716d5902d ("pci: introduce a bus master container")
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
slirp updates
# gpg: Signature made Tue 28 Mar 2017 23:51:51 BST
# gpg: using RSA key 0xB0A51BF58C9179C5
# gpg: Good signature from "Samuel Thibault <samuel.thibault@aquilenet.fr>"
# gpg: aka "Samuel Thibault <sthibault@debian.org>"
# gpg: aka "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg: aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg: aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg: aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: aka "Samuel Thibault <samuel.thibault@u-bordeaux.fr>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82 304B D017 8C76 7D06 9EE6
# Subkey fingerprint: AEBF 7448 FAB9 453A 4552 390E B0A5 1BF5 8C91 79C5
* remotes/thibault/tags/samuel-thibault:
slirp: Send RDNSS in RA only if host has an IPv6 DNS server
slirp: Make RA build more flexible
slirp: fix compilation errors with DEBUG set
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
virtio, pci: fixes
More fixes for 2.9.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed 29 Mar 2017 00:35:49 BST
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio: fix vring_align() on 64-bit windows
pci: Add missing drop of bus master AS reference
event_notifier: prevent accidental use after close
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The change in commit 898be3e041 which made completely
unrecognized OSes cause an error_exit "Unsupported host OS"
has some unfortunate unintended effects:
* if you run 'configure --help' on an unsupported host OS
(eg if intending to use it as a build machine for a
cross compile to a supported host) then the message
is printed instead of --help
* if the C compiler doesn't work or is missing (eg if
you passed an incorrect --cross-prefix by mistake)
the message is printed instead of the more useful
'compiler does not exist or does not work' message
Fix this by postponing the error_exit in this situation
until later, when we have already identified the more
useful cases for this.
The long term fix for this would be to move handling
of --help much further up in the configure script,
and make its output not dependent on checks that configure
runs. However for 2.9 this would be too invasive.
Reported-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Tested-by: Stefan Weil <sw@weilnetz.de>
If, once the kernel has booted, we try to remove a memory
hotplugged while the kernel was not started, QEMU crashes on
an assert:
qemu-system-ppc64: hw/virtio/vhost.c:651:
vhost_commit: Assertion `r >= 0' failed.
...
#4 in vhost_commit
#5 in memory_region_transaction_commit
#6 in pc_dimm_memory_unplug
#7 in spapr_memory_unplug
#8 spapr_machine_device_unplug
#9 in hotplug_handler_unplug
#10 in spapr_lmb_release
#11 in detach
#12 in set_allocation_state
#13 in rtas_set_indicator
...
If we take a closer look to the guest kernel log, we can see when
we try to unplug the memory:
pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
What happens:
1- The kernel has ignored the memory hotplug event because
it was not started when it was generated.
2- When we hot-unplug the memory,
QEMU starts to remove the memory,
generates an hot-unplug event,
and signals the kernel of the incoming new event
3- as the kernel is started, on the QEMU signal, it reads
the event list, decodes the hotplug event and tries to
finish the hotplugging.
4- QEMU receive the the hotplug notification while it
is trying to hot-unplug the memory. This moves the memory
DRC to an invalid state
This patch prevents this by not allowing to set the allocation
state to USABLE while the DRC is awaiting release.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Running postcopy-test with ASAN produces the following error:
QTEST_QEMU_BINARY=ppc64-softmmu/qemu-system-ppc64 tests/postcopy-test
...
=================================================================
==23641==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1556600000 at pc 0x55b8e9d28208 bp 0x7f1555f4d3c0 sp 0x7f1555f4d3b0
READ of size 8 at 0x7f1556600000 thread T6
#0 0x55b8e9d28207 in htab_save_first_pass /home/elmarco/src/qq/hw/ppc/spapr.c:1528
#1 0x55b8e9d2939c in htab_save_iterate /home/elmarco/src/qq/hw/ppc/spapr.c:1665
#2 0x55b8e9beae3a in qemu_savevm_state_iterate /home/elmarco/src/qq/migration/savevm.c:1044
#3 0x55b8ea677733 in migration_thread /home/elmarco/src/qq/migration/migration.c:1976
#4 0x7f15845f46c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
#5 0x7f157d9d0f7e in clone (/lib64/libc.so.6+0x107f7e)
0x7f1556600000 is located 0 bytes to the right of 2097152-byte region [0x7f1556400000,0x7f1556600000)
allocated by thread T0 here:
#0 0x7f159bb76980 in posix_memalign (/lib64/libasan.so.3+0xc7980)
#1 0x55b8eab185b2 in qemu_try_memalign /home/elmarco/src/qq/util/oslib-posix.c:106
#2 0x55b8eab186c8 in qemu_memalign /home/elmarco/src/qq/util/oslib-posix.c:122
#3 0x55b8e9d268a8 in spapr_reallocate_hpt /home/elmarco/src/qq/hw/ppc/spapr.c:1214
#4 0x55b8e9d26e04 in ppc_spapr_reset /home/elmarco/src/qq/hw/ppc/spapr.c:1261
#5 0x55b8ea12e913 in qemu_system_reset /home/elmarco/src/qq/vl.c:1697
#6 0x55b8ea13fa40 in main /home/elmarco/src/qq/vl.c:4679
#7 0x7f157d8e9400 in __libc_start_main (/lib64/libc.so.6+0x20400)
Thread T6 created by T0 here:
#0 0x7f159bae0488 in __interceptor_pthread_create (/lib64/libasan.so.3+0x31488)
#1 0x55b8eab1d9cb in qemu_thread_create /home/elmarco/src/qq/util/qemu-thread-posix.c:465
#2 0x55b8ea67874c in migrate_fd_connect /home/elmarco/src/qq/migration/migration.c:2096
#3 0x55b8ea66cbb0 in migration_channel_connect /home/elmarco/src/qq/migration/migration.c:500
#4 0x55b8ea678f38 in socket_outgoing_migration /home/elmarco/src/qq/migration/socket.c:87
#5 0x55b8eaa5a03a in qio_task_complete /home/elmarco/src/qq/io/task.c:142
#6 0x55b8eaa599cc in gio_task_thread_result /home/elmarco/src/qq/io/task.c:88
#7 0x7f15823e38e6 (/lib64/libglib-2.0.so.0+0x468e6)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/elmarco/src/qq/hw/ppc/spapr.c:1528 in htab_save_first_pass
index seems to be wrongly incremented, unless I miss something that
would be worth a comment.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
long is 32-bits on 64-bit windows, which caused the top half of the
address to be truncated; this patch changes it to use the
QEMU_ALIGN_UP macro which does not suffer the same problem
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The recent introduction of a bus master container added
memory_region_add_subregion() into the PCI device registering path but
missed memory_region_del_subregion() in the unregistering path leaving
a reference to the root memory region of the new container.
This adds missing memory_region_del_subregion().
Fixes: 3716d5902d ("pci: introduce a bus master container")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Let's set the handles to the underlying facilities to their extremal
value so no accidental misuse can happen, and to make it obvious that the
notifier is dysfunctional. E.g. if we just close an fd but do not touch
the int holding the fd eventually a read/write could succeed again when
the fd gets reused, and corrupt the file addressed by the fd.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Previously we would always send an RDNSS option in the RA, making the guest
try to resolve DNS through IPv6, even if the host does not actually have
and IPv6 DNS server available.
This makes the RDNSS option enabled only when an IPv6 DNS server is
available.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Do not hardcode the RA size at all, use a pl_size variable which
accounts the accumulated size, and fill rip->ip_pl at the end.
This will allow to make some blocks optional.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The existing code for "host" and "max" CPU models overrides every
single feature in the CPU object at realize time, even the ones
that were explicitly enabled or disabled by the user using
"feat=on" or "feat=off", while features set using +feat/-feat are
kept.
This means "-cpu host,+invtsc" works as expected, while
"-cpu host,invtsc=on" doesn't.
This was a known bug, already documented in a comment inside
x86_cpu_expand_features(). What makes this bug worse now is that
libvirt 3.0.0 and newer now use "feat=on|off" instead of
+feat/-feat when it detects a QEMU version that supports it (see
libvirt commit d47db7b16dd5422c7e487c8c8ee5b181a2f9cd66).
Change the feature property getter/setter to set a
env->user_features field, to keep track of features that were
explicitly changed using QOM properties. Then make the
max_features code not override user features when handling "-cpu
host" and "-cpu max".
This will also allow us to remove the plus_features/minus_features
hack in the future, but I plan to do that after 2.9.0 is
released.
Reported-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170327144815.8043-3-ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Instead of passing a pointer to the feature property getter and
setter functions, pass a FeatureWord enum so they can perform
other actions related to the feature flag.
This will be used to add a new "user_features" field to keep
track of features that were explicitly set by the user.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170327144815.8043-2-ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
We first snprintf() to a fixed buffer, then g_strdup() the result
*boggle*.
Worse, the size of the fixed buffer INET6_ADDRSTRLEN + 5 + 4 is bogus:
the 4 correctly accounts for '[', ']', ':' and '\0', but
INET6_ADDRSTRLEN is not a suitable limit for inet->host, and 5 is not
one for inet->port! They are for host and port in *numeric* form
(exploiting that INET6_ADDRSTRLEN > INET_ADDRSTRLEN), but inet->host
can also be a hostname, and inet->port can be a service name, to be
resolved with getaddrinfo().
Fortunately, the only user so far is the "socket" network backend's
net_socket_connected(), which uses it to initialize a NetSocketState's
info_str[]. info_str[] has considerable more space: 256 instead of
55. So the bug's impact appears to be limited to truncated "info
networks" with the "socket" network backend.
The fix is obvious: use g_strdup_printf().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490268208-23368-1-git-send-email-armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
qemu_rbd_open() takes option parameters as a flattened QDict, with
keys of the form server.%d.host, server.%d.port, where %d counts up
from zero.
qemu_rbd_array_opts() extracts these values as follows. First, it
calls qdict_array_entries() to find the list's length. For each list
element, it formats the list's key prefix (e.g. "server.0."), then
creates a new QDict holding the options with that key prefix, then
converts that to a QemuOpts, so it can finally get the member values
from there.
If there's one surefire way to make code using QDict more awkward,
it's creating more of them and mixing in QemuOpts for good measure.
The extraction of keys starting with server.%d into another QDict
makes us ignore parameters like server.0.neither-host-nor-port
silently.
The conversion to QemuOpts abuses runtime_opts, as described a few
commits ago.
Rewrite to simply get the values straight from the options QDict.
Fixes -drive not to crash when server.*.* are present, but
server.*.host is absent.
Fixes -drive to reject invalid server.*.*.
Permits cleaning up runtime_opts. Do that, and fix -drive to reject
bogus parameters host and port instead of silently ignoring them.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-11-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This reverts a part of commit 8a47e8e. We're having second thoughts
on the QAPI schema (and thus the external interface), and haven't
reached consensus, yet. Issues include:
* BlockdevOptionsRbd member @password-secret isn't actually a
password, it's a key generated by Ceph.
* We're not sure where member @password-secret belongs (see the
previous commit).
* How @password-secret interacts with settings from a configuration
file specified with @conf is undocumented.
Let's avoid painting ourselves into a corner now, and revert the
feature for 2.9.
Note that users can still configure an authentication key with a
configuration file. They probably do that anyway if they use Ceph
outside QEMU as well.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-10-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
This reverts half of commit 0a55679. We're having second thoughts on
the QAPI schema (and thus the external interface), and haven't reached
consensus, yet. Issues include:
* The implementation uses deprecated rados_conf_set() key
"auth_supported". No biggie.
* The implementation makes -drive silently ignore invalid parameters
"auth" and "auth-supported.*.X" where X isn't "auth". Fixable (in
fact I'm going to fix similar bugs around parameter server), so
again no biggie.
* BlockdevOptionsRbd member @password-secret applies only to
authentication method cephx. Should it be a variant member of
RbdAuthMethod?
* BlockdevOptionsRbd member @user could apply to both methods cephx
and none, but I'm not sure it's actually used with none. If it
isn't, should it be a variant member of RbdAuthMethod?
* The client offers a *set* of authentication methods, not a list.
Should the methods be optional members of BlockdevOptionsRbd instead
of members of list @auth-supported? The latter begs the question
what multiple entries for the same method mean. Trivial question
now that RbdAuthMethod contains nothing but @type, but less so when
RbdAuthMethod acquires other members, such the ones discussed above.
* How BlockdevOptionsRbd member @auth-supported interacts with
settings from a configuration file specified with @conf is
undocumented. I suspect it's untested, too.
Let's avoid painting ourselves into a corner now, and revert the
feature for 2.9.
Note that users can still configure authentication methods with a
configuration file. They probably do that anyway if they use Ceph
outside QEMU as well.
Further note that this doesn't affect use of key "auth-supported" in
-drive file=rbd:...:key=value.
qemu_rbd_array_opts()'s parameter @type now must be RBD_MON_HOST,
which is silly. This will be cleaned up shortly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-9-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
runtime_opts is used for three different purposes:
* qemu_rbd_open() uses it to accept options it recognizes, such as
"pool" and "image". Other .bdrv_open() methods do it similarly.
* qemu_rbd_open() accepts additional list-valued options
auth-supported and server, with the help of qemu_rbd_array_opts().
The list elements are again dictionaries. qemu_rbd_array_opts()
uses runtime_opts to accept their members. Thus, runtime_opts
contains recognized sub-sub-options "auth", "host", "port" in
addition to recognized options. No other block driver does that.
* qemu_rbd_create() uses it to convert the QDict produced by
qemu_rbd_parse_filename() to QemuOpts. No other block driver does
that. The keys produced by qemu_rbd_parse_filename() are "pool",
"image", "snapshot", "conf", "user" and "keyvalue-pairs".
qemu_rbd_open() accepts these, so no additional ones here.
This is a confusing mess. Dates back to commit 0f9d252. First step
to clean it up is documenting runtime_opts.desc[]:
* Reorder entries to match the QAPI schema, like we do in other block
drivers.
* Document why the schema's "server" and "auth-supported" aren't in
.desc[].
* Document why "keyvalue-pairs", "host", "port" and "auth" are in
.desc[], but not the schema.
* Delete "filename", because none of the three users actually uses it.
This fixes -drive to reject parameter filename instead of silently
ignoring it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-7-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The way we communicate extra key-value pairs from
qemu_rbd_parse_filename() to qemu_rbd_open() exposes option parameter
"keyvalue-pairs" on the command line. It's not wanted there. Hack:
rename the parameter to "=keyvalue-pairs" to make it inaccessible.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-6-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
We laboriously enforce that parameter values are between one and some
arbitrary limit in length. Only RBD_MAX_IMAGE_NAME_SIZE comes from
librbd.h, and I'm not sure it applies. Where the other limits come
from is unclear.
Drop the length checking. The limits librbd actually imposes must be
checked by librbd anyway.
There's one minor complication: BDRVRBDState member name is a
fixed-size array. Depends on the length limit. Make it a pointer to
a dynamically allocated string.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-4-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
qemu_rbd_open() neglects to check pool and image are present. Missing
image is caught by rbd_open(), but missing pool crashes. Reproducer:
$ qemu-system-x86_64 -nodefaults -drive driver=rbd,id=rbd,image=i,...
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_M_construct null not valid
Aborted (core dumped)
where ... is a working server.0.{host,port} configuration.
Doesn't affect -drive with file=..., because qemu_rbd_parse_filename()
always sets both pool and image.
Doesn't affect -blockdev, because pool and image are mandatory in the
QAPI schema.
Fix by adding the missing checks.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-3-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
We use InetSocketAddress in the QAPI schema. However, the code
doesn't use inet_connect_saddr(), but formats "host" and "port" into a
configuration string for rados_conf_set(). Thus, members "numeric",
"to", "ipv4" and "ipv6" are silently ignored. Not nice. Example:
-blockdev rbd,node-name=nn,pool=p,image=i,server.0.host=h0,server.0.port=12345,server.0.ipv4=off
Factor a suitable InetSocketAddressBase out of InetSocketAddress, and
use that. "numeric", "to", "ipv4" and "ipv6" are now rejected.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490691368-32099-2-git-send-email-armbru@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
It's been a long journey, but here we are.
The supported blockdev-add is not compatible to its experimental
predecessors; bump all Since: tags to 2.9.
x-blockdev-remove-medium, x-blockdev-insert-medium and
x-blockdev-change need a bit more work, so leave them alone for now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
MTTCG regression fixes for rc2
# gpg: Signature made Tue 28 Mar 2017 10:54:38 BST
# gpg: using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-mttcg-fixups-for-rc2-280317-1:
replay/replay.c: bump REPLAY_VERSION
tcg: Add a new line after incompatibility warning
ui/console: use exclusive mechanism directly
ui/console: ensure do_safe_dpy_refresh holds BQL
bsd-user: align use of mmap_lock to that of linux-user
user-exec: handle synchronous signals from QEMU gracefully
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 0ab8ed18a6 ("trace: switch to
modular code generation for sub-directories") forgot to convert "tcg"
trace events to the modular code generation approach where each
sub-directory has its own trace-events file.
This patch fixes compilation for "tcg" trace events. Currently they are
only used in the root ./trace-events file.
"tcg" trace events can only be used in the root ./trace-events file for
the time being.
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170327131718.18268-1-stefanha@redhat.com
Suggested-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Parallels driver should not call bdrv_truncate if the image was opened
in the read-only mode. Without the patch
qemu-img check harddisk.hds
asserts with
bdrv_truncate: Assertion `child->perm & BLK_PERM_RESIZE' failed.
Parameters used on the write path are not needed if the image is opened
in the read-only mode.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reported-by: Edgar Kaziahmedov <edos@virtuozzo.mipt.ru>
Message-id: 1490625488-7980-1-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A previous commit (3d4d16f4) added support for audio record/playback.
However this breaks the logfile ABI due to the re-ordering of the
ReplayEvents enum. The REPLAY_VERSION check is meant to prevent you
from using old log files in newer QEMUs but this is currently broken.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
The previous commit (8bb93c6f99) using async_safe_run_on_cpu() doesn't
work on graphics sub-system which restrict which threads can do GUI
updates. Rather the special casing MacOS we just directly call the
helper and move all the exclusive handling into do_dafe_dpy_refresh().
The unfortunate bouncing of the BQL is to ensure there is no deadlock
as vCPUs waiting on the BQL are kicked into their quiescent state.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
I missed the fact that when an exclusive work item runs it drops the
BQL to ensure all no vCPUs are stuck waiting for it, hence causing a
deadlock. However the actual helper needs to take the BQL especially
as we'll be messing with device emulation bits during the update which
all assume BQL is held.
We make a minor cpu_reloading_memory_map which must try and unlock the
RCU if we are actually outside the running context.
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
The introduction of stricter mmap_lock checking in translate-all broke
the BSD user build. The working mmap_lock functions were hidden behind
CONFIG_USE_NPTL which is never defined. This patch brings them inline
with linux-user.
Despite the disapearence of the comment "We aren't threadsafe to start
with..." this doesn't make bsd-user so. It will still need the rest of
the fixes that have been done in linux-user ported over.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
When "tcg: enable thread-per-vCPU" (commit 3725794) was merged the
lifetime of current_cpu was changed. Previously a broken linux-user
call might abort() which can eventually escalate into a SIGSEGV which
would then crash qemu as it attempted to deref a NULL current_cpu.
After commit 3725794 it would attempt to fixup state and re-start the
run-loop and much hilarity (i.e. a looping lockup) would ensue from
jumping into a stale jmp_env.
As we can actually tell if we are in the run-loop from looking at the
cpu->running flag we should catch this badness first and abort()
cleanly rather than try to soldier on. There is a theoretical race
between the flag being set and sigsetjmp refreshing the jump buffer
but we can try really hard to not introduce crashes into that code.
[LV: setgroups03 fails on powerpc LTP]
Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This series fixes potential memory/fd leaks in 9pfs and a crash when
running tests/virtio-9p-test on SPARC hosts.
# gpg: Signature made Tue 28 Mar 2017 09:44:05 BST
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
tests/virtio-9p-test: Don't call le*_to_cpus on fields of packed struct
9pfs: fix file descriptor leak
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For a packed struct like 'P9Hdr' the fields within it may not be
aligned as much as the natural alignment for their types. This means
it is not valid to pass the address of such a field to a function
like le32_to_cpus() which operate on uint32_t* and assume alignment.
Doing this results in a SIGBUS on hosts like SPARC which have strict
alignment requirements.
Use ldl_le_p() instead, which is specified to correctly handle
unaligned pointers.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
The v9fs_create() and v9fs_lcreate() functions are used to create a file
on the backend and to associate it to a fid. The fid shouldn't be already
in-use, otherwise both functions may silently leak a file descriptor or
allocated memory. The current code doesn't check that.
This patch ensures that the fid isn't already associated to anything
before using it.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
(reworded the changelog, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
* MTTCG fix for win32
* virtio-scsi assertion failure
* mem-prealloc coverity fix
* x86 migration revert which requires more thought
* x86 instruction limit (avoids >2 page translation blocks)
* nbd dead code cleanup
* small memory.c logic fix
# gpg: Signature made Mon 27 Mar 2017 17:03:04 BST
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream:
scsi-generic: Fill in opt_xfer_len in INQUIRY reply if it is zero
Revert "apic: save apic_delivered flag"
nbd: drop unused NBDClientSession.is_unix field
win32: replace custom mutex and condition variable with native primitives
mem-prealloc: fix sysconf(_SC_NPROCESSORS_ONLN) failure case.
tcg/i386: Check the size of instruction being translated
virtio-scsi: Fix acquire/release in dataplane handlers
virtio-scsi: Make virtio_scsi_acquire/release public
clear pending status before calling memory commit
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block patches for 2.9-rc2.
# gpg: Signature made Mon 27 Mar 2017 16:47:54 BST
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* remotes/maxreitz/tags/pull-block-2017-03-27:
block/file-posix.c: Fix unused variable warning on OpenBSD
file-posix: Make bdrv_flush() failure permanent without O_DIRECT
nbd-client: fix handling of hungup connections
qemu-img: print short help on getopt failure
qemu-img: fix switch indentation in img_amend()
qemu-img: show help for invalid global options
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When opt_xfer_len is zero, Linux ignores max_xfer_len erroneously.
While that obviously should be fixed, we do older guests a favor to
always filling in a value.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170327142625.1249-1-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Success for bdrv_flush() means that all previously written data is safe
on disk. For fdatasync(), the best semantics we can hope for on Linux
(without O_DIRECT) is that all data that was written since the last call
was successfully written back. Therefore, and because we can't redo all
writes after a flush failure, we have to give up after a single
fdatasync() failure. After this failure, we would never be able to make
the promise that a successful bdrv_flush() makes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20170322210005.16533-1-kwolf@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
After the switch to reading replies in a coroutine, nothing is
reentering pending receive coroutines if the connection hangs.
Move nbd_recv_coroutines_enter_all to the reply read coroutine,
which is the place where hangups are detected. nbd_teardown_connection
can simply wait for the reply read coroutine to detect the hangup
and clean up after itself.
This wouldn't be enough though because nbd_receive_reply returns 0
(rather than -EPIPE or similar) when reading from a hung connection.
Fix the return value check in nbd_read_reply_entry.
This fixes qemu-iotests 083.
Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170314111157.14464-1-pbonzini@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Printing the full help output obscures the error message for an invalid
command-line option or missing argument.
Before this patch:
$ ./qemu-img --foo
...pages of output...
After this patch:
$ ./qemu-img --foo
qemu-img: unrecognized option '--foo'
Try 'qemu-img --help' for more information
This patch adds the getopt ':' character so that it can distinguish
between missing arguments and unrecognized options. This helps provide
more detailed error messages.
Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170317104541.28979-4-stefanha@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
QEMU coding style indents 'case' to the same level as the 'switch'
statement:
switch (foo) {
case 1:
Fix this coding style violation so checkpatch.pl doesn't complain about
the next patch.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170317104541.28979-3-stefanha@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The qemu-img sub-command executes regardless of invalid global options:
$ qemu-img --foo info test.img
qemu-img: unrecognized option '--foo'
image: test.img
...
The unrecognized option warning may be missed by the user. This can
hide incorrect command-lines in scripts and confuse users.
This patch prints the help information and terminates instead of
executing the sub-command.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170317104541.28979-2-stefanha@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
This reverts commit 07bfa35477.
The global variable is only read as part of a
apic_reset_irq_delivered();
qemu_irq_raise(s->irq);
if (!apic_get_irq_delivered()) {
sequence, so the value never matters at migration time.
Reported-by: Dr. David Alan Gilbert <dglibert@redhat.com>
Cc: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The multithreaded TCG implementation exposed deadlocks in the win32
condition variables: as implemented, qemu_cond_broadcast waited on
receivers, whereas the pthreads API it was intended to emulate does
not. This was causing a deadlock because broadcast was called while
holding the IO lock, as well as all possible waiters blocked on the
same lock.
This patch replaces all the custom synchronisation code for mutexes
and condition variables with native Windows primitives (SRWlocks and
condition variables) with the same semantics as their POSIX
equivalents. To enable that, it requires a Windows Vista or newer host
OS.
Signed-off-by: Andrey Shedel <ashedel@microsoft.com>
[AB: edited commit message]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-Id: <20170324220141.10104-1-Andrew.Baumann@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vnc server in reverse mode (qemu -vnc localhost:$nr,reverse) interprets
$nr as display number (i.e. with 5900 offset) in recent qemu versions.
Historical and documented behavior is interpreting $nr as port number
though. So we should bring code and documentation in line.
Given that default listening port for viewers is 5500 the 5900 offset is
pretty inconvinient, because it is simply impossible to connect to port
5500. So, lets fix the code not the docs.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1489480018-11443-1-git-send-email-kraxel@redhat.com
virtio_input_send buffers input events until it sees a SYNC. Then it
either sends or drops the entire batch, depending on whether eventq
has enough space available. The case to avoid here is partial sends
where only part of the batch would get to the guest.
Using virtqueue_get_avail_bytes to check the state of eventq was not
correct. The queue may have a smaller number of larger buffers
available so bytes may be enough but the batch would still not be
possible to send, leading to the "Huh? No vq elem available" error.
Instead of checking available bytes, this patch optimistically pops
buffers from the queue and puts them back in case it runs out of
space and the batch needs to be dropped.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 1490365490-4854-3-git-send-email-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
This was spotted by Coverity, in case where sysconf(_SC_NPROCESSORS_ONLN)
fails and returns -1. This results in memset_num_threads getting set to -1.
Which we then pass to g_new0().
The patch replaces MAX_MEM_PREALLOC_THREAD_COUNT macro with a function call
get_memset_num_threads() to handle sysconf() failure gracefully. In case
sysconf() fails, we fall back to single threaded.
(Spotted by Coverity, CID 1372465.)
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1490079006-32495-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
After the AioContext lock push down, there is a race between
virtio_scsi_dataplane_start and those "assert(s->ctx &&
s->dataplane_started)", because the latter doesn't isn't wrapped in
aio_context_acquire.
Reproducer is simply booting a Fedora guest with an empty
virtio-scsi-dataplane controller:
qemu-system-x86_64 \
-drive if=none,id=root,format=raw,file=Fedora-Cloud-Base-25-1.3.x86_64.raw \
-device virtio-scsi \
-device scsi-disk,drive=root,bootindex=1 \
-object iothread,id=io \
-device virtio-scsi-pci,iothread=io \
-net user,hostfwd=tcp::10022-:22 -net nic,model=virtio -m 2048 \
--enable-kvm
Fix this by moving acquire/release pairs from virtio_scsi_handle_*_vq to
their callers - and wrap the broken assertions in.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170317061447.16243-3-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The REG_PC define in disas/microblaze.c clashes with a define in
the Linux SPARC system headers:
/home/pm215/qemu/disas/microblaze.c:162:0: error: "REG_PC" redefined [-Werror]
#define REG_PC 32 /* PC */
In file included from /usr/include/signal.h:326:0,
from /home/pm215/qemu/include/qemu/osdep.h:86,
from /home/pm215/qemu/disas/microblaze.c:36:
/usr/include/sparc64-linux-gnu/sys/ucontext.h:96:0: note: this is the location of the previous definition
#define REG_PC (1)
Since the code doesn't actually use the REG_PC define
anywhere, the simplest fix is just to remove it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1490272961-1128-1-git-send-email-peter.maydell@linaro.org
hw/i386/trace-events has an amdvi_mmio_read trace that is used for
both normal reads (listing the register name, address, size, and
offset) and for an error case (abusing the register name to show
an error message, the address to show the maximum value supported,
then shoehorning address and size into the size and offset
parameters). The change from a wide address to a narrower size
parameter could truncate a (rather-large) bogus read attempt, so
it's better to create a separate dedicated trace with correct types,
rather than abusing the trace mechanism. Broken since its
introduction in commit d29a09c.
[Change trace event argument type from hwaddr to uint64_t since
user-defined types should not be used for trace events. This fixes a
build failure with LTTng UST.
--Stefan]
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
hw/scsi/trace-events lists cmd as the first parameter for both
megasas_iovec_overflow and megasas_iovec_underflow, but the caller
was mistakenly passing cmd->iov_size twice instead of the command
index. Also, trace_megasas_abort_invalid is called with parameters
in the wrong order. Broken since its introduction in commit
e8f943c3.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
block/trace-events lists the parameters for mirror_yield
consistently with other mirror events (cnt just after s, like in
mirror_before_sleep; in_flight last, like in mirror_yield_in_flight).
But the callers were passing parameters in the wrong order, leading
to poor trace messages, including type truncation when there are
more than 4G dirty sectors involved. Broken since its introduction
in commit bd48bde.
While touching this, ensure that all callers use the same type
(uint64_t) for cnt, as a later patch will enable the compiler to do
stricter type-checking.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit 9a6d1ac assumed that 'qom-type' could be removed from QemuOpts
with no ill effects. However, this command line proves otherwise:
$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0
qemu-system-x86_64: -object rng-random,filename=/dev/urandom,id=rng0: Parameter 'qom-type' is missing
Fix the regression by restoring qom-type in opts after its temporary
removal that was needed for the duration of user_creatable_add_opts().
Reported-by: Richard W. M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-id: 20170323160315.19696-1-eblake@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ppc patch queue for 2017-03-23
Just a single bugfix in this batch. It's not strictly in ppc code,
though it's for the pseries machine's benefit. Eduardo suggested it
go through my tree however.
# gpg: Signature made Thu 23 Mar 2017 10:09:17 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170323:
numa,spapr: align default numa node memory size to 256MB
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
cryptodev fixes
# gpg: Signature made Thu 23 Mar 2017 09:22:44 GMT
# gpg: using RSA key 0x2ED7FDE9063C864D
# gpg: Good signature from "Gonglei <arei.gonglei@huawei.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 3EF1 8E53 3459 E6D1 963A 3C05 2ED7 FDE9 063C 864D
* remotes/gonglei/tags/cryptodev-next-20170323:
cryptodev: fix asserting single queue
cryptodev: setiv only when really need
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We already check for queues == 1 in cryptodev_builtin_init and when that
is not true raise an error. But before that error is reported the
assertion in cryptodev_builtin_cleanup kicks in (because object is being
finalized and freed).
Let's remove assert(queues == 1) form cryptodev_builtin_cleanup as it
does only harm and no good.
Reported-by: Boris Fiuczynski <fiuczy@linux.vnet.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
ECB mode cipher doesn't need IV, if we setiv for it then qemu
crypto API would report "Expected IV size 0 not **", so we should
setiv only when the cipher algos really need.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
This patch creates inline wrapper functions in xen_common.h for all open
coded calls to xc_hvm_XXX() functions outside of xen_common.h so that use
of xen_xc can be made implicit. This again is in preparation for the move
to using libxendevicemodel.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Doing this will make the transition to using the new libxendevicemodel
interface less intrusive on the callers of these functions, since using
the new library will require a change of handle.
NOTE: The patch also moves the 'externs' for xen_xc and xen_fmem from
xen_backend.h to xen_common.h, and the declarations from
xen_backend.c to xen-common.c, which is where they belong.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
An off-by-one in commit 15c2f669e meant that we were failing to
check for unparsed input in all QemuOpts visitors. Recent testsuite
additions show that fixing the obvious bug with bogus fields will
also fix the case of an incomplete list visit; update the tests to
match the new behavior.
Simple testcase:
./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio -numa node,size=1g
failed to diagnose that 'size' is not a valid argument to -numa, and
now once again reports:
qemu-system-x86_64: -numa node,size=1g: Invalid parameter 'size'
See also https://bugzilla.redhat.com/show_bug.cgi?id=1434666
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170322144525.18964-4-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
A regression in commit 15c2f669e caused us to silently ignore
excess input to the QemuOpts visitor. Later, commit ea4641
accidentally abused that situation, by removing "qom-type" and
"id" from the corresponding QDict but leaving them defined in
the QemuOpts, when using the pair of containers to create a
user-defined object. Note that since we are already traversing
two separate items (a QDict and a QemuOpts), we are already
able to flag bogus arguments, as in:
$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults -nographic -qmp stdio -object memory-backend-ram,id=mem1,size=4k,bogus=huh
qemu-system-x86_64: -object memory-backend-ram,id=mem1,size=4k,bogus=huh: Property '.bogus' not found
So the only real concern is that when we re-enable strict checking
in the QemuOpts visitor, we do not want to start flagging the two
leftover keys as unvisited. Rearrange the code to clean out the
QemuOpts listing in advance, rather than removing items from the
QDict. Since "qom-type" is usually an automatic implicit default,
we don't have to restore it (this does mean that once instantiated,
QemuOpts is not necessarily an accurate representation of the
original command line - but this is not the first place to do that);
however "id" has to be put back (requiring us to cast away a const).
[As a side note, hmp_object_add() turns a QDict into a QemuOpts,
then calls user_creatable_add_opts() which converts QemuOpts into
a new QDict. There are probably a lot of wasteful conversions like
this, but cleaning them up is a much bigger task than the immediate
regression fix.]
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170322144525.18964-3-eblake@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This lets us hook into drained_begin and drained_end requests from the
backend level, which is particularly useful for making sure that all
jobs associated with a particular node (whether the source or the target)
receive a drain request.
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170316212351.13797-4-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Allow block backends to forward drain requests to their devices/users.
The initial intended purpose for this patch is to allow BBs to forward
requests along to BlockJobs, which will want to pause if their associated
BB has entered a drained region.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170316212351.13797-3-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
The purpose of this shim is to allow us to pause pre-started jobs.
The purpose of *that* is to allow us to buffer a pause request that
will be able to take effect before the job ever does any work, allowing
us to create jobs during a quiescent state (under which they will be
automatically paused), then resuming the jobs after the critical section
in any order, either:
(1) -block_job_start
-block_job_resume (via e.g. drained_end)
(2) -block_job_resume (via e.g. drained_end)
-block_job_start
The problem that requires a startup wrapper is the idea that a job must
start in the busy=true state only its first time-- all subsequent entries
require busy to be false, and the toggling of this state is otherwise
handled during existing pause and yield points.
The wrapper simply allows us to mandate that a job can "start," set busy
to true, then immediately pause only if necessary. We could avoid
requiring a wrapper, but all jobs would need to do it, so it's been
factored out here.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 20170316212351.13797-2-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Streaming or any other block job hangs when performed on a block device
that has a non-default iothread. This happens because the AioContext
is acquired twice by block_job_defer_to_main_loop_bh and then released
only once by BDRV_POLL_WHILE. (Insert rants on recursive mutexes, which
unfortunately are a temporary but necessary evil for iothreads at the
moment).
Luckily, the reason for the double acquisition is simple; the function
acquires the AioContext for both the job iothread and the BDS iothread,
in case the BDS iothread was changed while the job was running. It
is therefore enough to skip the second acquisition when the two
AioContexts are one and the same.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1490118490-5597-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
Commit ad07cd6 ("virtio-scsi: always use dataplane path if ioeventfd is
active", 2016-10-30) and 9ffe337 ("virtio-blk: always use dataplane
path if ioeventfd is active", 2016-10-30) broke the virtio 1.0
indirect access registers.
The indirect access registers bypass the ioeventfd, so that virtio-blk
and virtio-scsi now repeatedly try to initialize dataplane instead of
triggering the guest->host EventNotifier. Detect the situation by
checking vq->handle_aio_output; if it is not NULL, trigger the
EventNotifier, which is how the device expects to get notifications
and in fact the only thread-safe manner to deliver them.
Fixes: ad07cd6
Fixes: 9ffe337
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Commit 15c2f669e broke the ability of the QemuOpts visitor to
flag extra input parameters, but the regression went unnoticed
because of missing testsuite coverage. Add a test to cover this;
take the approach already used in 9cb8ef3 of adding a test that
passes (to avoid breaking bisection) but marks with BUG the
behavior that we don't like, so that the actual impact of the
fix in a later patch is easier to see.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20170322144525.18964-2-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
For one thing we shouldn't continue if an error happened, for the other
two steps failing can cause an abort() in error_setg because we reuse
the same errp blindly.
Add error handling checks to fix both issues.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Since commit 224245b ("spapr: Add LMB DR connectors"), NUMA node
memory size must be aligned to 256MB (SPAPR_MEMORY_BLOCK_SIZE).
But when "-numa" option is provided without "mem" parameter,
the memory is equally divided between nodes, but 8MB aligned.
This can be not valid for pseries.
In that case we can have:
$ ./ppc64-softmmu/qemu-system-ppc64 -m 4G -numa node -numa node -numa node
qemu-system-ppc64: Node 0 memory size 0x55000000 is not aligned to 256 MiB
With this patch, we have:
(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 1280 MB
node 1 cpus:
node 1 size: 1280 MB
node 2 cpus:
node 2 size: 1536 MB
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We plan to drop support in a future QEMU release for host OSes
and host architectures for which we have no test machine where
we can build and run tests. For the 2.9 release, make configure
print a warning if it is run on such a host, so that the user
has some warning of the plans and can volunteer to help us
maintain the port if they need it to continue to function.
This commit flags up as deprecated the CPU architectures:
* ia64
* sparc
* anything which we don't have a TCG port for
(and which was presumably using TCI)
and the OSes:
* GNU/kFreeBSD
* DragonFly BSD
* NetBSD
* OpenBSD
* Solaris
* AIX
* Haiku
It also makes entirely unrecognized host OS strings be
rejected rather than treated as if they were Linux (which
likely never worked).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1490106717-9542-1-git-send-email-peter.maydell@linaro.org
This pull request fixes a potential QEMU hang in 9pfs and two issues
reported by Coverity.
# gpg: Signature made Tue 21 Mar 2017 09:57:58 GMT
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: proxy: assert if unmarshal fails
9pfs: don't try to flush self and avoid QEMU hang on reset
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
parallels block driver is completely broken since commit
commit 75cdcd1553
Author: Markus Armbruster <armbru@redhat.com>
Date: Tue Feb 21 21:14:08 2017 +0100
option: Fix checking of sizes for overflow and trailing crap
Right now even simple
qemu-io -c "read 512 64k" 1.hds
ends up with
Unexpected error in parse_option_size() at util/qemu-option.c:188:
Parameter 'prealloc-size' expects a non-negative number below 2^64
Aborted (core dumped)
The cure is simple - we should use 'M' as a suffix in default option value
instead of 'MiB'.
Signed-off-by: Edgar Kaziahmedov <edos@virtuozzo.mipt.ru>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1490002022-22653-1-git-send-email-den@openvz.org
CC: Markus Armbruster <armbru@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This reverts commit 1454d33f05.
The string input visitor regression fixed in the previous commit made
visit_type_uint16List() fail on empty input. query_memdev() calls it
via object_property_get_uint16List(). Because it doesn't expect it to
fail, it passes &error_abort, and duly crashes.
Commit 1454d33 "fixes" this crash by making
host_memory_backend_get_host_nodes() return a list containing just
MAX_NODES instead of the empty list. Papers over the regression, and
leads to bogus "info memdev" output, as shown below; revert.
I suspect that if we had bisected the crash back then, we would have
found and fixed the actual bug instead of papering over it.
To reproduce, run HMP command "info memdev" with
$ qemu-system-x86_64 --nodefaults -S -display none -monitor stdio -object memory-backend-ram,id=mem1,size=4k
With this commit, "info memdev" prints
memory backend: mem1
size: 4096
merge: true
dump: true
prealloc: false
policy: default
host nodes:
exactly like before commit 74f24cb.
Between commit 1454d33 and this commit, it prints
memory backend: mem1
size: 4096
merge: true
dump: true
prealloc: false
policy: default
host nodes: 128
The last line is bogus.
Between commit 74f24cb and 1454d33, it crashes like this:
Unexpected error in parse_str() at /work/armbru/tmp/qemu/qapi/string-input-visitor.c:126:
Parameter 'null' expects an int64 value or range
Aborted (core dumped)
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490026424-11330-3-git-send-email-armbru@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Visiting a list when input is the empty string should result in an
empty list, not an error. Noticed when commit 3d089ce belatedly added
tests, but simply accepted as weird then. It's actually a regression:
broken in commit 74f24cb, v2.7.0. Fix it, and throw in another test
case for empty string.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490026424-11330-2-git-send-email-armbru@redhat.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We have a negative test case for a list index with leading zero. Add
positive ones.
Tweak the test case for list index greater or equal the number of
elements: test "equal" instead of "greater" to guard against
off-by-one mistakes.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1490014548-15083-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Replies from the virtfs proxy are made up of a fixed-size header (8 bytes)
and a payload of variable size (maximum 64kb). When receiving a reply,
the proxy backend first reads the whole header and then unmarshals it.
If the header is okay, it then does the same operation with the payload.
Since the proxy backend uses a pre-allocated buffer which has enough room
for a header and the maximum payload size, marshalling should never fail
with fixed size arguments. Any error here is likely to result from a more
serious corruption in QEMU and we'd better dump core right away.
This patch adds error checks where they are missing and converts the
associated error paths into assertions.
This should also address Coverity's complaints CID 1348519 and CID 1348520,
about not always checking the return value of proxy_unmarshal().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
According to the 9P spec [*], when a client wants to cancel a pending I/O
request identified by a given tag (uint16), it must send a Tflush message
and wait for the server to respond with a Rflush message before reusing this
tag for another I/O. The server may still send a completion message for the
I/O if it wasn't actually cancelled but the Rflush message must arrive after
that.
QEMU hence waits for the flushed PDU to complete before sending the Rflush
message back to the client.
If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then
allocate a PDU identified by tag, find it in the PDU list and wait for
this same PDU to complete... i.e. wait for a completion that will never
happen. This causes a tag and ring slot leak in the guest, and a PDU
leak in QEMU, all of them limited by the maximal number of PDUs (128).
But, worse, this causes QEMU to hang on device reset since v9fs_reset()
wants to drain all pending I/O.
This insane behavior is likely to denote a bug in the client, and it would
deserve an Rerror message to be sent back. Unfortunately, the protocol
allows it and requires all flush requests to suceed (only a Tflush response
is expected).
The only option is to detect when we have to handle a self-referencing
flush request and report success to the client right away.
[*] http://man.cat-v.org/plan_9/5/flush
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
The Cygwin target is really compiling for native Win32 with -mno-cygwin.
Except, GCC 4.7.0 has finally removed the long deprecated -mno-cygwin
option, and that happened about five years ago.
Let it rest in peace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Message-id: 20170317160811.28370-1-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
MIPS patches 2017-03-20
Changes:
* Fix clang warnings
* Fix delay slot detection in gen_msa_branch()
* Fix rc4030 interval timer
* Fix rc4030 to tranlate memory accesses only when they occur
* Fix 4c4030 a mixed declarations and code warning
* Update MAINTAINERS file
# gpg: Signature made Mon 20 Mar 2017 12:46:01 GMT
# gpg: using RSA key 0x2238EB86D5F797C2
# gpg: Good signature from "Yongbok Kim <yongbok.kim@imgtec.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 8600 4CF5 3415 A5D9 4CFA 2B5C 2238 EB86 D5F7 97C2
* remotes/yongbok/tags/mips-20170320:
MAINTAINERS: update for MIPS devices
dma/rc4030: fix a mixed declarations and code warning
dma/rc4030: translate memory accesses only when they occur
dma: rc4030: limit interval timer reload value
target/mips: fix delay slot detection in gen_msa_branch()
target-mips: replace few LOG_DISAS() with trace points
target-mips: replace break by goto cp0_unimplemented
target-mips: log bad coprocessor0 register accesses with LOG_UNIMP
target-mips: remove old & unuseful comments
target-mips: fix compiler warnings (clang 5)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Our implementation of writes to the APSR for M-profile via the MSR
instruction was badly broken.
First and worst, we had the sense wrong on the test of bit 2 of the
SYSm field -- this is supposed to request an APSR write if bit 2 is 0
but we were doing it if bit 2 was 1. This bug was introduced in
commit 58117c9bb4, so hasn't been in a QEMU release.
Secondly, the choice of exactly which parts of APSR should be written
is defined by bits in the 'mask' field. We were not passing these
through from instruction decode, making it impossible to check them
in the helper.
Pass the mask bits through from the instruction decode to the helper
function and process them appropriately; fix the wrong sense of the
SYSm bit 2 check.
Invalid mask values and invalid combinations of mask and register
number are UNPREDICTABLE; we choose to treat them as if the mask
values were valid.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1487616072-9226-5-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The MRS instruction requires that bits [19..16] are all 1s, and for
A/R profile also that bits [7..0] are all 0s. At this point in the
decode tree we have checked all of the rest of the instruction but
were allowing these to be any value. If these bits are not set then
the result is architecturally UNPREDICTABLE, but choosing to UNDEF is
more helpful to the user and avoids unexpected odd behaviour if the
encodings are used for some purpose in future architecture versions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487616072-9226-4-git-send-email-peter.maydell@linaro.org
M profile doesn't have the MSR(banked) and MRS(banked) instructions
and uses the encodings for different kinds of M-profile MRS/MSR.
Guard the relevant bits of the decode logic to make sure we don't
accidentally fall into them by accident on M-profile.
(The bit being checked for this (bit 5) is part of the SYSm field on
M-profile, but since no currently allocated system registers have
encodings with bit 5 of SYSm set, this hasn't been a problem in
practice.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1487616072-9226-3-git-send-email-peter.maydell@linaro.org
use qemu_mutex_lock_iothread consistently in qemu_hax_cpu_thread_fn() as
done in other _thread_fn functions, instead of grabbing directly the
BQL. This way we ensure that iothread_locked is properly set.
On v2.9.0-rc0, QEMU was dying in an assertion in the mutex code when
running with '--enable-hax' either on OSX or Windows. This bug was triggered
since the code modification for multithreading added new usages of
qemu_mutex_iothread_locked.
This fixes the breakage on both platforms, I can now run again a full
Chromium OS image with HAX kernel acceleration.
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Message-Id: <20170320101549.150076-1-vpalatin@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This simplifies the code a lot, and this fixes big memory leaks
introduced in a3d586f704
Windows NT is now able to boot without using gigabytes of ram on the host.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
The JAZZ RC4030 chipset emulator has a periodic timer and
associated interval reload register. The reload value is used
as divider when computing timer's next tick value. If reload
value is large, it could lead to divide by zero error. Limit
the interval reload value to avoid it.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
this fixes many warnings like:
target/mips/translate.c:6253:13: warning: Value stored to 'rn' is never read
rn = "invalid sel";
^ ~~~~~~~~~~~~~
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
static code analyzer complain:
target/mips/helper.c:453:5: warning: Function call argument is an uninitialized value
qemu_log_mask(CPU_LOG_MMU,
^~~~~~~~~~~~~~~~~~~~~~~~~~
'physical' and 'prot' are uninitialized if 'ret' is not TLBRET_MATCH.
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com>
Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
The subchannel is a means to access a device. While the device number is
assigned by the administrator, the subchannel number is assigned by
the channel subsystem in an ascending order on cold and hot plug.
When doing unplug and replug operations, the same device may end up on
a different subchannel; for example
- We start with a device fe.1.2222, which ends up at subchannel
fe.1.0000.
- Now we detach the device, attach a device fe.1.3333 (which would get
the now-free subchannel fe.1.0000), re-attach fe.1.2222 (which ends
up at subchannel fe.1.0001) and detach fe.1.3333.
- We now have the same device (fe.1.2222) available to the guest; it
just shows up on a different subchannel.
In such a case, the subchannel numbers are different from what a
QEMU would create during cold plug when parsing the command line.
As this would cause a guest visible change on migration, we do restore
the source system's value of the subchannel number on load.
So we are now fine from the guest perspective. From the host
perspective this will cause an inconsistent state in our internal data
structures, though.
For example, the subchannel 0 might not be at array position 0. This will
lead to problems when we continue doing hot (un/re) plug operations.
Let's fix this by cleaning up our internal data structures.
Reported-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The Cygwin target is really compiling for native Win32 with -mno-cygwin.
Except, GCC 4.7.0 has finally removed the long deprecated -mno-cygwin
option, and that happened about five years ago.
Let it rest in peace.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu-ga's socket activation support was not obeying the LISTEN_PID
environment variable, which avoids that a process uses a socket-activation
file descriptor meant for its parent.
Mess can for example ensue if a process forks a children before consuming
the socket-activation file descriptor and therefore setting O_CLOEXEC
on it.
Luckily, qemu-nbd also got socket activation code, and its copy does
support LISTEN_PID. Some extra fixups are needed to ensure that the
code can be used for both, but that's what this patch does. The
main change is to replace get_listen_fds's "consume" argument with
the FIRST_SOCKET_ACTIVATION_FD macro from the qemu-nbd code.
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Block patches for 2.9-rc1
# gpg: Signature made Fri Mar 17 12:59:20 2017 CET
# gpg: using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-03-17:
block: quiesce AioContext when detaching from it
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While it is true that bdrv_set_aio_context only works on a single
BlockDriverState subtree (see commit message for 53ec73e, "block: Use
bdrv_drain to replace uncessary bdrv_drain_all", 2015-07-07), it works
at the AioContext level rather than the BlockDriverState level.
Therefore, it is also necessary to trigger pending bottom halves too,
even if no requests are pending.
For NBD this ensures that the aio_co_schedule of a previous call to
nbd_attach_aio_context is completed before detaching from the old
AioContext; it fixes qemu-iotest 094. Another similar bug happens
when the VM is stopped and the virtio-blk dataplane irqfd is torn down.
In this case it's possible that guest I/O gets stuck if notify_guest_bh
was scheduled but doesn't run.
Calling aio_poll from another AioContext is safe if non-blocking; races
such as the one mentioned in the commit message for c9d1a56 ("block:
only call aio_poll on the current thread's AioContext", 2016-10-28)
are a concern for blocking calls.
I considered other options, including:
- moving the bs->wakeup mechanism to AioContext, and letting the caller
check. This might work for virtio which has a clear place to wakeup
(notify_place_bh) and check the condition (virtio_blk_data_plane_stop).
For aio_co_schedule I couldn't find a clear place to check the condition.
- adding a dummy oneshot bottom half and waiting for it to trigger.
This has the complication that bottom half list is LIFO for historical
reasons. There were performance issues caused by bottom half ordering
in the past, so I decided against it for 2.9.
Fixes: 9972354856
Reported-by: Max Reitz <mreitz@redhat.com>
Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Tested-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170314111157.14464-2-pbonzini@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
commit 3c80ca15 fixed a deadlock scenarion with nested aio_poll invocations.
However, the rescheduling of the completion BH introcuded unnecessary spinning
in the main-loop. On very fast file backends this can even lead to the
"WARNING: I/O thread spun for 1000 iterations" message popping up.
Callgrind reports about 3-4% less instructions with this patch running
qemu-img bench on a ramdisk based VMDK file.
Fixes: 3c80ca158c
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
bdrv_child_set_perm alone is not very usable because the caller must
call bdrv_child_check_perm first. This is already encapsulated
conveniently in bdrv_child_try_set_perm, so remove the other prototypes
from the header and fix the one wrong caller, block/mirror.c.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Even if hidden_disk, secondary_disk are backing files, they all need
write permissions in replication scenario. Otherwise we will encouter
below exceptions on secondary side during adding nbd server:
{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk', 'writable': true } }
{"error": {"class": "GenericError", "desc": "Conflicts with use by hidden-qcow2-driver as 'backing', which does not allow 'write' on sec-qcow2-driver-for-nbd"}}
CC: Zhang Hailiang <zhang.zhanghailiang@huawei.com>
CC: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
CC: Wen Congyang <wencongyang2@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The following pattern is unsafe:
char buf[32];
ret = read(fd, buf, sizeof(buf));
...
buf[ret] = 0;
If read(2) returns 32 then a byte beyond the end of the buffer is
zeroed.
In practice this buffer overflow does not occur because the sysfs
max_segments file only contains an unsigned short + '\n'. The string is
always shorter than 32 bytes.
Regardless, avoid this pattern because static analysis tools might
complain and it could lead to real buffer overflows if copy-pasted
elsewhere in the codebase.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 8d04fb55..
tcg: drop global lock during TCG code execution
..broke the assumption that updates to the GUI couldn't happen at the
same time as TCG vCPUs where running. As a result the TCG vCPU could
still be updating a directly mapped frame-buffer while the display
side was updating. This would cause artefacts to appear when the
update code assumed that memory block hadn't changed.
The simplest solution is to ensure the two things can't happen at the
same time like the old BQL locking scheme. Here we use the solution
introduced for MTTCG and schedule the update as async_safe_work when
we know no vCPUs can be running.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170315144825.3108-1-alex.bennee@linaro.org
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[ kraxel: updated comment clarifying the display adapters are buggy
and this is a temporary workaround ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Commit c2cabb3422 inadvertently downgraded the 'dtc' submodule,
undoing the increments added in earlier commits. Revert this,
returning the submodule state to where we should be.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
QAPI patches for 2017-03-16
# gpg: Signature made Thu 16 Mar 2017 06:18:38 GMT
# gpg: using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg: aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653
* remotes/armbru/tags/pull-qapi-2017-03-16: (49 commits)
qapi: Fix a misleading parser error message
qapi: Make pylint a bit happier
qapi: Drop unused .check_clash() parameter schema
qapi: union_types is a list used like a dict, make it one
qapi: struct_types is a list used like a dict, make it one
qapi: enum_types is a list used like a dict, make it one
qapi: Factor add_name() calls out of the meta conditional
qapi: Simplify what gets stored in enum_types
qapi: Drop unused variable events
qapi: Eliminate check_docs() and drop QAPIDoc.expr
qapi: Fix detection of bogus member documentation
tests/qapi-schema: Improve coverage of bogus member docs
tests/qapi-schema: Rename doc-bad-args to doc-bad-command-arg
qapi: Move empty doc section checking to doc parser
qapi: Improve error message on @NAME: in free-form doc
qapi: Move detection of doc / expression name mismatch
qapi: Fix detection of doc / expression mismatch
tests/qapi-schema: Improve doc / expression mismatch coverage
qapi2texi: Use category "Object" for all object types
qapi2texi: Generate descriptions for simple union tags
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
virtio, pci: fixes
More fixes missed in the previous pull request.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Thu 16 Mar 2017 02:29:49 GMT
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio-serial-bus: Delete timer from list before free it
hw/virtio: fix Power Management Control Register for PCI Express virtio devices
hw/virtio: fix Link Control Register for PCI Express virtio devices
hw/virtio: fix error enabling flags in Device Control register
hw/pcie: fix Extended Configuration Space for devices with no Extended Capabilities
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Postcopy doesn't support migration of RAM shared with another process
yet (we've got a bunch of things to understand).
Check for the case and don't allow postcopy to be enabled.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Provide a helper to say whether a RAMBlock was created as a
shared mapping.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
This problem affects s390x only if we are running without KVM.
Basically, S390CPU.irqstate is unused if we do not use KVM,
and thus no buffer is allocated.
This causes size=0, first_elem=NULL and n_elems=1 in
vmstate_load_state and vmstate_save_state. And the assert fails.
With this fix we can go back to the old behavior and support
VMS_VBUFFER with size 0 and nullptr.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Increase bmds->cur_dirty after submit io, so reduce the frequency
involve into blk_drain, and improve the performance obviously
when block migration.
The performance test result of this patch:
During the block dirty save phase, this patch improve guest os IOPS
from 4.0K to 9.5K. and improve the migration speed from
505856 rsec/s to 855756 rsec/s.
Signed-off-by: Lidong Chen <jemmy858585@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Does basically the same as "cirrus: stop passing around dst pointers in
the blitter", just for the src pointer instead of the dst pointer.
For the src we have to care about cputovideo blits though and fetch the
data from s->cirrus_bltbuf instead of vga memory. The cirrus_src*()
helper functions handle that.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489584487-3489-1-git-send-email-kraxel@redhat.com
Instead pass around the address (aka offset into vga memory). Calculate
the pointer in the rop_* functions, after applying the mask to the
address, to make sure the address stays within the valid range.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489574872-8679-1-git-send-email-kraxel@redhat.com
off_cur_end is exclusive, so off_cur_end == cirrus_addr_mask is valid.
Fix calculation to make sure to allow that, otherwise the assert added
by commit f153b563f8 can trigger for valid
blits.
Test case: boot windows nt 4.0
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489579606-26020-1-git-send-email-kraxel@redhat.com
Ok, we have this beast in the cirrus code which is not used at all by
modern guests, except when you try to find security holes in qemu. So,
add an option to disable blitter altogether. Guests released within
the last ten years should not show any rendering issues if you turn off
blitter support.
There are no known bugs in the cirrus blitter code. But in the past we
hoped a few times already that we've finally nailed the last issue. So
having some easy way to mitigate in case yet another blitter issue shows
up certainly makes me sleep a bit better at night.
For completeness: The by far better way to mitigate is to switch away
from cirrus and use stdvga instead. Or something more modern like
virtio-vga in case your guest has support for it.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489494540-15745-1-git-send-email-kraxel@redhat.com
Quoting cirrus source code:
Follow real hardware, cirrus card emulated has 4 MB video memory.
Also accept 8 MB/16 MB for backward compatibility.
So just use 4MB by default. We decided to leave that at 8MB by default
a while ago, for live migration compatibility reasons. But we have
compat properties to handle that, so that isn't a compeling reason.
This also removes some sanity check inconsistencies in the cirrus code.
Some places check against the allocated video memory, some places check
against the 4MB physical hardware has. Guest code can trigger asserts
because of that.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489494514-15606-1-git-send-email-kraxel@redhat.com
There is a special code path (dpy_gfx_copy) to allow graphic emulation
notify user interface code about bitblit operations carryed out by
guests. It is supported by cirrus and vnc server. The intended purpose
is to optimize display scrolls and just send over the scroll op instead
of a full display update.
This is rarely used these days though because modern guests simply don't
use the cirrus blitter any more. Any linux guest using the cirrus drm
driver doesn't. Any windows guest newer than winxp doesn't ship with a
cirrus driver any more and thus uses the cirrus as simple framebuffer.
So this code tends to bitrot and bugs can go unnoticed for a long time.
See for example commit "3e10c3e vnc: fix qemu crash because of SIGSEGV"
which fixes a bug lingering in the code for almost a year, added by
commit "c7628bf vnc: only alloc server surface with clients connected".
Also the vnc server will throttle the frame rate in case it figures the
network can't keep up (send buffers are full). This doesn't work with
dpy_gfx_copy, for any copy operation sent to the vnc client we have to
send all outstanding updates beforehand, otherwise the vnc client might
run the client side blit on outdated data and thereby corrupt the
display. So this dpy_gfx_copy "optimization" might even make things
worse on slow network links.
Lets kill it once for all.
Oh, and one more reason: Turns out (after writing the patch) we have a
security bug in that code path ...
Fixes: CVE-2016-9603
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489494419-14340-1-git-send-email-kraxel@redhat.com
check the validity of parameters in cirrus_bitblt_rop_fwd_transp_xxx
and cirrus_bitblt_rop_fwd_xxx to avoid the OOB read which causes qemu Segmentation fault.
After the fix, we will touch the assert in
cirrus_invalidate_region:
assert(off_cur_end >= off_cur);
Signed-off-by: fangying <fangying1@huawei.com>
Signed-off-by: hangaohuai <hangaohuai@huawei.com>
Message-id: 20170314063919.16200-1-hangaohuai@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The tls-creds parameter has a default value of NULL indicating
that TLS should not be used. Setting it to non-NULL enables
use of TLS. Once tls-creds are set to a non-NULL value via the
monitor, it isn't possible to set them back to NULL again, due
to current implementation limitations. The empty string is not
a valid QObject identifier, so this switches to use "" as the
default, indicating that TLS will not be used
The tls-hostname parameter has a default value of NULL indicating
the the hostname from the migrate connection URI should be used.
Again, once tls-hostname is set non-NULL, to override the default
hostname for x509 cert validation, it isn't possible to reset it
back to NULL via the monitor. The empty string is not a valid
hostname, so this switches to use "" as the default, indicating
that the migrate URI hostname should be used.
Using "" as the default for both, also means that the monitor
commands "info migrate_parameters" / "query-migrate-parameters"
will report existance of tls-creds/tls-parameters even when set
to their default values.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
In function cpu_physical_memory_sync_dirty_bitmap, file
include/exec/ram_addr.h:
if (src[idx][offset]) {
unsigned long bits = atomic_xchg(&src[idx][offset], 0);
unsigned long new_dirty;
new_dirty = ~dest[k];
dest[k] |= bits;
new_dirty &= bits;
num_dirty += ctpopl(new_dirty);
}
After these codes executed, only the pages not dirtied in bitmap(dest),
but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated.
For example:
When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b00001111,
and atomic_rcu_read(&migration_bitmap_rcu)->bmap = 0b00000011,
the new_dirty will be 0b00001100, and this function will return 2 but not
4 which is expected.
the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new,
so these should be calculated also.
Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
check_definition_doc() checks for member documentation without a
matching member. It laboriously second-guesses what members
QAPISchema._def_exprs() will create. That's a stupid game.
Move the check into QAPISchema.check(), where the members are known.
Delegate the actual checking to new QAPIDoc.check().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-38-git-send-email-armbru@redhat.com>
Move the check whether the doc matches the expression name from
check_definition_doc() to check_exprs(). This changes the error
location from the comment to the expression. Makes sense as the
message talks about the expression: "Definition of '%s' follows
documentation for '%s'". It's also a step towards getting rid of
check_docs().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-33-git-send-email-armbru@redhat.com>
At the protocol level, the distinction between struct, flat union and
simple union is meaningless, they are all JSON objects. Document them
that way.
Example change (qemu-qmp-ref.txt):
- -- Simple Union: InputEvent
+ -- Object: InputEvent
Input event union.
This also fixes the completely broken headings for flat and simple
unions in qemu-qmp-ref.7 and qemu-ga-ref.7, by sidestepping a bug in
texi2pod.pl. For instance, it mistranslates "@deftp {Simple Union}
InputEvent" to "B<Union> (Simple)", but translates "@deftp Object
InputEvent" to "B<SocketAddress> (Object)".
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-30-git-send-email-armbru@redhat.com>
Simple union tags carry no type information, because their type is
implicit. Their description should make up for it, but many have
none. Generate one automatically then.
Example change (qemu-qmp-ref.txt):
-- Simple Union: ImageInfoSpecific
A discriminated record of image format specific information
structures.
Members:
'type'
- Not documented
+ One of "qcow2", "vmdk", "luks"
'data: ImageInfoSpecificQCow2' when 'type' is "qcow2"
'data: ImageInfoSpecificVmdk' when 'type' is "vmdk"
'data: QCryptoBlockInfoLUKS' when 'type' is "luks"
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-29-git-send-email-armbru@redhat.com>
A flat union's branch brings in the members of another type. Generate
a suitable reference to that type.
Example change (qemu-qmp-ref.txt):
-- Flat Union: QCryptoBlockOpenOptions
The options that are available for all encryption formats when
opening an existing volume
Members:
The members of 'QCryptoBlockOptionsBase'
+ The members of 'QCryptoBlockOptionsQCow' when 'format' is "qcow"
+ The members of 'QCryptoBlockOptionsLUKS' when 'format' is "luks"
Since: 2.6
A simple union's branch adds a member 'data' of some other type.
Generate documentation for that member.
Example change (qemu-qmp-ref.txt):
-- Simple Union: SocketAddress
Captures the address of a socket, which could also be a named file
descriptor
Members:
'type'
Not documented
+ 'data: InetSocketAddress' when 'type' is "inet"
+ 'data: UnixSocketAddress' when 'type' is "unix"
+ 'data: VsockSocketAddress' when 'type' is "vsock"
+ 'data: String' when 'type' is "fd"
Since: 1.3
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-28-git-send-email-armbru@redhat.com>
The generated documentation doesn't mention object type members
inherited from a base type. Fix that.
Example change (qemu-qmp-ref.txt):
-- Struct: VncServerInfo
The network connection information for server
Members:
'auth' (optional)
authentication method used for the plain (non-websocket) VNC
server
+ The members of 'VncBasicInfo'
Since: 2.1
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-27-git-send-email-armbru@redhat.com>
The recent merge of docs/qmp-commands.txt and docs/qmp-events.txt into
the schema lost type information. Fix this documentation regression.
Example change (qemu-qmp-ref.txt):
-- Struct: InputKeyEvent
Keyboard input event.
Members:
- 'button'
+ 'button: InputButton'
Which button this event is for.
- 'down'
+ 'down: boolean'
True for key-down and false for key-up events.
Since: 2.0
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-26-git-send-email-armbru@redhat.com>
Show undocumented object, alternate type members and command, event
arguments exactly like undocumented enumeration type values.
Example change (qemu-qmp-ref.txt):
-- Command: query-rocker
Return rocker switch information.
+ Arguments:
+ 'name'
+ Not documented
+
Returns: 'Rocker' information
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-24-git-send-email-armbru@redhat.com>
Instead of not saying anything when we have no documentation, say "Not
documented".
Example change (qemu-qmp-ref.txt):
-- Enum: GuestPanicAction
An enumeration of the actions taken when guest OS panic is detected
Values:
'pause'
system pauses
'poweroff'
+ Not documented
Since: 2.1 (poweroff since 2.8)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-23-git-send-email-armbru@redhat.com>
The table of members follows the main descriptive text immediately.
Makes it hard to see what it is about. Start a new paragraph, and
lead with a line "Members:" for object and alternate types, "Values:"
for enumeration types, and "Arguments:" for commands and events.
Example change (qemu-qmp-ref.txt):
-- Command: set_link
Sets the link status of a virtual network adapter.
+
+ Arguments:
'name'
the device name of the virtual network adapter
'up'
true to set the link status to be up
Returns: Nothing on success If 'name' is not a valid network
device, DeviceNotFound
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-22-git-send-email-armbru@redhat.com>
PEP 8 advises:
In Python, single-quoted strings and double-quoted strings are the
same. This PEP does not make a recommendation for this. Pick a
rule and stick to it. When a string contains single or double
quote characters, however, use the other one to avoid backslashes
in the string. It improves readability.
The QAPI generators succeed at picking a rule, but fail at sticking to
it. Convert a bunch of double-quoted strings to single-quoted ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-20-git-send-email-armbru@redhat.com>
We traditionally mark optional members #optional in the doc comment.
Before commit 3313b61, this was entirely manual.
Commit 3313b61 added some automation because its qapi2texi.py relied
on #optional to determine whether a member is optional. This is no
longer the case since the previous commit: the only thing qapi2texi.py
still does with #optional is stripping it out. We still reject bogus
qapi-schema.json and six places for qga/qapi-schema.json.
Thus, you can't actually rely on #optional to see whether something is
optional. Yet we still make people add it manually. That's just
busy-work.
Drop the code to check, fix up and strip out #optional, along with all
instances of #optional. To keep it out, add code to reject it, to be
dropped again once the dust settles.
No change to generated documentation.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-18-git-send-email-armbru@redhat.com>
qapi2texi works with schema expression trees. Such a tight coupling
to schema language syntax is not a good idea. Convert it to the visitor
interface the other generators use.
No change to generated documentation.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-17-git-send-email-armbru@redhat.com>
qapi2texi.py already conjures up ArgSections for undocumented
enumeration values, in texi_enum. Drop that, and conjure them up for
all kinds of "arguments" (enumeration values, object and alternate
type members) in qapi.py instead.
Take care to keep generated documentation exactly the same for now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-16-git-send-email-armbru@redhat.com>
We currently neglect to check all enumeration values, common members
of object types and members of alternate types are documented.
Unsurprisingly, many aren't.
Add the necessary plumbing to find undocumented ones, except for
variant members of object types. Don't enforce anything just yet, but
connect each QAPIDoc.ArgSection to its QAPISchemaMember.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-15-git-send-email-armbru@redhat.com>
Talking about #optional like this
# Note: fields are marked #optional to indicate that they may or may
# not appear ...
doesn't work so well in generated documentation, because the #optional
tag is not visible there. Replace by
# Note: optional members may or may not appear ...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1489582656-31133-13-git-send-email-armbru@redhat.com>
We silently fix missing #optional tags for QAPIDoc by appending a line
"#optional" to the section's .content. However, this interferes with
.__repr__ stripping trailing blank lines from .content.
Use new ArgSection instance variable .optional instead, and leave
.content alone.
To permit testing .optional in texi_body(), clean up texi_enum()'s
hack to add empty documentation for undocumented enum values: add an
ArgSection instead of ''.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1489582656-31133-12-git-send-email-armbru@redhat.com>
We use tag #optional to mark optional members, like this:
# @name: #optional The name of the guest
texi_body() strips #optional, but not whitespace around it. For the
above, we get in qemu-qmp-qapi.texi
@item @code{'name'} (optional)
The name of the guest
@end table
The extra space can lead to artifacts in output, e.g in
qemu-qmp-ref.7.pod
=item C<'name'> (optional)
The name of the guest
and then in qemu-qmp-ref.7
.IX Item "name (optional)"
.Vb 1
\& The name of the guest
.Ve
instead of intended plain
.IX Item "name (optional)"
The name of the guest
Get rid of these artifacts by removing whitespace around #optional
along with it.
This turns three minus signs in qapi-schema.json into markup, because
they're now at the beginning of the line. Drop them, they're unwanted
there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1489582656-31133-11-git-send-email-armbru@redhat.com>
Rename intermediate qemu-qapi.texi to qemu-qmp-qapi.texi to match its
user qemu-qmp-ref.texi, just like qemu-ga-qapi.texi matches
qemu-ga-ref.texi.
Build the intermediate .texi next to the sources and the final output
in docs/ instead of dumping them into the build root.
Fix version.texi dependencies so that only the targets that actually
need it depend on it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-8-git-send-email-armbru@redhat.com>
qapi.py has a hardcoded white-list of type names that may violate the
rule on use of upper and lower case. Add a new pragma directive
'name-case-whitelist', and use it to replace the hard-coded
white-list.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1489582656-31133-7-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qapi.py has a hardcoded white-list of command names that may violate
the rules on permitted return types. Add a new pragma directive
'returns-whitelist', and use it to replace the hard-coded white-list.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1489582656-31133-6-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This reverts commit 3313b61's changes to tests/qapi-schema/, except
for tests/qapi-schema/doc-*.
We could keep some of these doc comments to serve as positive test
cases. However, they don't actually add to what we get from doc
comment use in actual schemas, as we we don't test output matches
expectations, and don't systematically cover doc comment features.
Proper positive test coverage would be nice.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1489582656-31133-4-git-send-email-armbru@redhat.com>
Since we added the documentation generator in commit 3313b61, doc
comments are mandatory. That's a very good idea for a schema that
needs to be documented, but has proven to be annoying for testing.
Make doc comments optional again, but add a new directive
{ 'pragma': { 'doc-required': true } }
to let a QAPI schema require them.
Add test cases for the new pragma directive. While there, plug a
minor hole in includ directive test coverage.
Require documentation in the schemas we actually want documented:
qapi-schema.json and qga/qapi-schema.json.
We could probably make qapi2texi.py cope with incomplete
documentation, but for now, simply make it refuse to run unless the
schema has 'doc-required': true.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1489582656-31133-3-git-send-email-armbru@redhat.com>
[qapi-code-gen.txt wording tweaked]
Reviewed-by: Eric Blake <eblake@redhat.com>
The qmp-shell property parser currently rejects attempts to
set string properties to the empty string eg
(QEMU) migrate-set-parameters tls-hostname=
Error while parsing command line: Expected a key=value pair, got 'tls-hostname='
command format: <command-name> [arg-name1=arg1] ... [arg-nameN=argN]
This is caused by checking the wrong condition after splitting
the parameter on '='. The "partition" method will return "" for
the separator field, if the seperator was not present, so that
is the correct thing to check for malformed syntax.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20170302122429.7737-1-berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The build rules for trace files have a dependancy on $(tracetool-y).
This variable populated in the trace/Makefile.objs file and thus its
definition gets pulled into the top level makefile. This happens too
late in the process though, so by the time $(tracetool-y) is defined,
make has already evaluated $(tracetool-y) in the dependancies and
found it to be empty. The result is that when the tracetool source
is changed, the generated files are not rebuilt. The solution is to
define the variable in the top level makefile too
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Message-id: 20170315123421.28815-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The only functional difference between the GENERATED_HEADERS
and GENERATED_SOURCES variables is that 'Makefile' has a
dependancy on GENERATED_HEADERS, causing generated header files
to be created immediatey at the start of the build process.
There is no reason why this early creation should be restricted
to the .h files, and not include .c files too. Merge both of
the variables into a single GENERATED_FILES variable to make
it clear it is for any type of generated file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170228122901.24520-2-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As the pci ahci can be hotplug and unplug, in the ahci unrealize
function it should free all the resource once allocated in the
realized function. This patch add ide_exit to free the resource.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 1488449293-80280-3-git-send-email-liqiang6-s@360.cn
Signed-off-by: John Snow <jsnow@redhat.com>
we have an idebus unrealize function, but it was being
registered as the unrealize function for the IDE Device,
so it was not getting invoked on device teardown because
nothing is "unrealizing" the IDE devices themselves.
Suggested-by: John Snow <jsnow@redhat.com>
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1488449293-80280-2-git-send-email-liqiang6-s@360.cn
Signed-off-by: John Snow <jsnow@redhat.com>
Make Power Management State flag writable to conform
with the PCI Express spec.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Make several Link Control Register flags writable to conform
with the PCI Express spec.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When the virtio devices are PCI Express, make error-enabling flags
writable to respect the PCIe spec.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Absence of any Extended Capabilities is required to be
indicated by an Extended Capability header with a Capability ID of
0000h, a Capability Version of 0h, and a Next Capability Offset of 000h.
Instead of inserting a 'NULL' capability is simpler to mark the start
of the Extended Configuration Space as read-only to achieve the same
behaviour.
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio, pc: fixes
Some fixes to fallback from using virtio caching,
pls a minor vm gen id fix.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Wed 15 Mar 2017 17:59:25 GMT
# gpg: using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
virtio-pci: reset modern vq meta data
Revert "virtio: unbreak virtio-pci with IOMMU after caching ring translations"
pci: introduce a bus master container
virtio: validate address space cache during init
virtio: destroy region cache during reset
virtio: guard against NULL pfn
Bugfix: Handle error if VM Generation ID device not present
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We don't reset proxy->vqs[].{num|desc[]|avail[]|used[]}. This means if
a driver enable the vq without setting vq address after reset. The old
addresses were leaked. Fixing this by resetting modern vq meta data
during device reset.
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit
96a8821d21. Previous patch is a better
solution which does not require a strict order between virtio and IOMMU.
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
96a8821d21 ("virtio: unbreak virtio-pci with IOMMU after caching ring
translations") tries to make IOMMU works with virtio memory region
cache, but it requires IOMMU to be created before any virtio
devices. This is sub optimal, fixing this by introduce a bus master
container to make sure address space can be initialized during device
registering, and then we can safely set alias and make
bus_master_enable_region as its subregion during bus master
initialization.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't check the return value of address_space_cache_init(), this
may lead buggy driver use incorrect region caches. Instead of
triggering an assert, catch and warn this early in
virtio_init_region_cache().
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't destroy region cache during reset which can make the maps
of previous driver leaked to a buggy or malicious driver that don't
set vring address before starting to use the device. Fix this by
destroy the region cache during reset and validate it before trying to
see them.
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To avoid access stale memory region cache after reset, this patch
check the existence of virtqueue pfn for all exported virtqueue access
helpers before trying to use them.
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This was crashing due to NULL-pointer dereference
QMP Test case:
==============
(QEMU) query-vm-generation-id
{"error": {"class": "GenericError", "desc": "VM Generation ID device not
found"}}
HMP Test case:
==============
virsh # qemu-monitor-command --hmp 3 info vm-generation-id
VM Generation ID device not found
Signed-off-by: Ben Warren <ben@skyportsystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Fix global property and -cpu handling bug
This bug fix was supposed to be applied just after 2.8.0 was
released, but it slipped through the cracks. Sending it now for
the next -rc.
# gpg: Signature made Tue 14 Mar 2017 20:04:50 GMT
# gpg: using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6
* remotes/ehabkost/tags/machine-pull-request:
machine: Convert abstract typename on compat_props to subclass names
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit eb7eeb8 ("memory: split address_space_read and
address_space_write", 2015-12-17) made address_space_rw
dispatch to one of address_space_read or address_space_write,
rather than vice versa.
For callers of address_space_read and address_space_write this
causes false positive defects when Coverity sees a length-8 write in
address_space_read and a length-4 (e.g. int*) buffer to read into.
As long as the size of the buffer is okay, this is a false positive.
Reflect the code change into the model.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170315081641.20588-1-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
When using a memory-backend object with prealloc turned on, QEMU
will memset() the first byte in every memory page to zero. While
this might have been acceptable for memory backends associated
with RAM, this corrupts application data for NVDIMMs.
Instead of setting every page to zero, read the current byte
value and then just write that same value back, so we are not
corrupting the original data. Directly write the value instead
of memset()ing it, since there's no benefit to memset for a
single byte write.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Message-id: 20170303113255.28262-1-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Original problem description by Greg Kurz:
> Since commit "9a4c0e220d8a hw/virtio-pci: fix virtio
> behaviour", passing -device virtio-blk-pci.disable-modern=off
> has no effect on 2.6 machine types because the internal
> virtio-pci.disable-modern=on compat property always prevail.
The same bug also affects other abstract type names mentioned on
compat_props by machine-types: apic-common, i386-cpu, pci-device,
powerpc64-cpu, s390-skeys, spapr-pci-host-bridge, usb-device,
virtio-pci, x86_64-cpu.
The right fix for this problem is to make sure compat_props and
-global options are always applied in the order they are
registered, instead of reordering them based on the type
hierarchy. But changing the ordering rules of -global is risky
and might break existing configurations, so we shouldn't do that
on a stable branch.
This is a temporary hack that will work around the bug when
registering compat_props properties: if we find an abstract class
on compat_props, register properties for all its non-abstract
subtypes instead. This will make sure -global won't be overridden
by compat_props, while keeping the existing ordering rules on
-global options.
Note that there's one case that won't be fixed by this hack:
"-global spapr-pci-vfio-host-bridge.<option>=<value>" won't be
able to override compat_props, because spapr-pci-host-bridge is
not an abstract class.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1481575745-26120-1-git-send-email-ehabkost@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Commit 4881658a4b introduced a call to arm_get_cpu_by_id(),
and Coverity noticed that we weren't checking that it didn't
return NULL (CID 1371652).
Normally this won't happen (because all 4 CPUs are expected
to exist), but it's possible the user requested fewer CPUs
on the command line. Handle this possibility by silently
doing nothing, which is the same behaviour as before commit
4881658a4b and also how we handle the other CPU operations
(since we ignore the INVALID_PARAM returns from arm_set_cpu_on()
and friends).
There is a slight behavioural difference to the pre-4881658a4b
situation: the "reset this core" bit will remain set rather
than not being permitted to be set. The imx6 datasheet is
unclear about the behaviour in this odd corner case, so we
opt for the simpler code rather than complicated logic to
maintain identical behaviour.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1488542374-1256-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
icount has become much slower after tcg_cpu_exec has stopped
using the BQL. There is also a latent bug that is masked by
the slowness.
The slowness happens because every occurrence of a QEMU_CLOCK_VIRTUAL
timer now has to wake up the I/O thread and wait for it. The rendez-vous
is mediated by the BQL QemuMutex:
- handle_icount_deadline wakes up the I/O thread with BQL taken
- the I/O thread wakes up and waits on the BQL
- the VCPU thread releases the BQL a little later
- the I/O thread raises an interrupt, which calls qemu_cpu_kick
- the VCPU thread notices the interrupt, takes the BQL to
process it and waits on it
All this back and forth is extremely expensive, causing a 6 to 8-fold
slowdown when icount is turned on.
One may think that the issue is that the VCPU thread is too dependent
on the BQL, but then the latent bug comes in. I first tried removing
the BQL completely from the x86 cpu_exec, only to see everything break.
The only way to fix it (and make everything slow again) was to add a dummy
BQL lock/unlock pair.
This is because in -icount mode you really have to process the events
before the CPU restarts executing the next instruction. Therefore, this
series moves the processing of QEMU_CLOCK_VIRTUAL timers straight in
the vCPU thread when running in icount mode.
The required changes include:
- make the timer notification callback wake up TCG's single vCPU thread
when run from another thread. By using async_run_on_cpu, the callback
can override all_cpu_threads_idle() when the CPU is halted.
- move handle_icount_deadline after qemu_tcg_wait_io_event, so that
the timer notification callback is invoked after the dummy work item
wakes up the vCPU thread
- make handle_icount_deadline run the timers instead of just waking the
I/O thread.
- stop processing the timers in the main loop
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This optimization is not necessary anymore, because the vCPU now drops
the I/O thread lock even with TCG. Drop it to simplify the code and
avoid the "I/O thread spun for 1000 iterations" warning.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is no change for now, because the callback just invokes
qemu_notify_event.
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This dependency is the wrong way, and we will need util/qemu-timer.h from
sysemu/cpus.h in the next patch.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the first timer is exactly at the current value of the clock, the
deadline is met and the timer should fire. This fixes itself on the next
iteration of the loop without icount; with icount, however, execution
of instructions will stop exactly at the deadline and won't proceed.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Most machines don't allow sysbus devices like "kvmclock" to be
created from the command-line, but some of them do (the ones with
has_dynamic_sysbus=true). In those cases, it's possible to
manually create a kvmclock device without KVM being enabled,
making QEMU crash:
$ qemu-system-x86_64 -machine q35,accel=tcg -device kvmclock
Segmentation fault (core dumped)
This changes kvmclock's realize method to return an error if KVM
is disabled, to ensure it won't crash QEMU.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170309185046.17555-1-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a KVM_{GET,SET}_MSRS ioctl() fails, it is difficult to find
out which MSR caused the problem. Print an error message for
debugging, before we trigger the (ret == cpu->kvm_msr_buf->nmsrs)
assert.
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170309194634.28457-1-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I sometimes got "Cannot access memory" when using the x command
on the monitor. Turns out that the cpu env did contain stale data
(e.g. wrong control register content for page table origin).
We must synchronize the state of the CPU before walking the page
tables. A similar issues happens for a remote gdb, so lets
do the cpu_synchronize_state in cpu_memory_rw_debug.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1488896348-13560-1-git-send-email-borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using "-mem-prealloc" option for a large guest leads to higher guest
start-up and migration time. This is because with "-mem-prealloc" option
qemu tries to map every guest page (create address translations), and
make sure the pages are available during runtime. virsh/libvirt by
default, seems to use "-mem-prealloc" option in case the guest is
configured to use huge pages. The patch tries to map all guest pages
simultaneously by spawning multiple threads. Currently limiting the
change to QEMU library functions on POSIX compliant host only, as we are
not sure if the problem exists on win32. Below are some stats with
"-mem-prealloc" option for guest configured to use huge pages.
------------------------------------------------------------------------
Idle Guest | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB | 54m11.796s | 75m43.843s
64 Core - 1TB | 8m56.576s | 14m29.049s
64 Core - 256GB | 2m11.245s | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB | 5m1.027s | 34m10.565s
64 Core - 1TB | 1m10.366s | 8m28.188s
64 Core - 256GB | 0m19.040s | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB | 1m58.970s | 31m43.400s
64 Core - 1TB | 0m39.885s | 7m55.289s
64 Core - 256GB | 0m11.960s | 2m0.135s
-----------------------------------------------------------------------
Changed in v2:
- modify number of memset threads spawned to min(smp_cpus, 16).
- removed 64GB memory restriction for spawning memset threads.
Changed in v3:
- limit number of threads spawned based on
min(sysconf(_SC_NPROCESSORS_ONLN), 16, smp_cpus)
- implement memset thread specific siglongjmp in SIGBUS signal_handler.
Changed in v4
- remove sigsetjmp/siglongjmp and SIGBUS unblock/block for main thread
as main thread no longer touches any pages.
- simplify code my returning memset_thread_failed status from
touch_all_pages.
Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
Message-Id: <1487907103-32350-1-git-send-email-jitendra.kolhe@hpe.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Occasionally the users try to mix the bootindex properties with the
"-boot order" parameter - and this likely does not give the expected
results. So let's add a proper statement that these two concepts
should not be used together.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1488303601-23741-1-git-send-email-thuth@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 'name' parameter to memory_region_init_* had been marked as debug
only, however vmstate_region_ram uses it as a parameter to
qemu_ram_set_idstr to set RAMBlock names and these form part of the
migration stream.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170309152708.30635-1-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In armv8, this register implements more than a single bit, with
fine-grained enables for read access to event counters, cycles
counters, and write access to the software increment. This change
implements those checks using custom access functions for the relevant
registers.
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 20170228215801.10472-2-Andrew.Baumann@microsoft.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: move a couple of access functions to be only compiled
ifndef CONFIG_USER_ONLY to avoid compiler warnings]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# gpg: Signature made Tue 14 Mar 2017 07:55:01 GMT
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
hw/net: implement MIB counters in mcf_fec driver
COLO-compare: Fix trace_event print bug
e1000e: correctly tear down MSI-X memory regions
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ppc patch queue for 2017-03-14
This set has a handful og bugfixes to go into qemu-2.9. This includes
an update to the dtc/libfdt submodule which will fix the build errors
seen on some distributions.
# gpg: Signature made Tue 14 Mar 2017 04:00:41 GMT
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.9-20170314:
dtc: Update submodule to avoid build errors
pseries: Don't expose PCIe extended config space on older machine types
target/ppc: fix cpu_ov setting for 32-bit
target/ppc: Fix wrong number of UAMR register
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The definition of the major() and minor() macros are moving within glibc to
<sys/sysmacros.h>. Include this header when it is available to avoid the
following sorts of build-stopping messages:
qga/commands-posix.c: In function ‘dev_major_minor’:
qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "major", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"major", you should undefine it after including <sys/types.h>. [-Werror]
*devmajor = major(st.st_rdev);
^~~~~~~~~~~~~~~~~~~~~~~~~~
qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined
by <sys/sysmacros.h>. For historical compatibility, it is
currently defined by <sys/types.h> as well, but we plan to
remove this soon. To use "minor", include <sys/sysmacros.h>
directly. If you did not intend to use a system-defined macro
"minor", you should undefine it after including <sys/types.h>. [-Werror]
*devminor = minor(st.st_rdev);
^~~~~~~~~~~~~~~~~~~~~~~~~~
The additional include allows the build to complete on Fedora 26 (Rawhide)
with glibc version 2.24.90.
Signed-off-by: Christopher Covington <cov@codeaurora.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The FEC ethernet hardware module used on ColdFire SoC parts contains a
block of RAM used to maintain hardware counters. This block is accessible
via the usual FEC register address space. There is currently no support
for this in the QEMU mcf_fec driver.
Add support for storing a MIB RAM block, and provide register level
access to it. Also implement a basic set of stats collection functions
to populate MIB data fields.
This support tested running a Linux target and using the net-tools
"ethtool -S" option. As of linux-4.9 the kernels FEC driver makes
accesses to the MIB counters during its initialization (which it never
did before), and so this version of Linux will now fail with the QEMU
error:
qemu: hardware error: mcf_fec_read: Bad address 0x200
This MIB counter support fixes this problem.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Because of inet_ntoa() return a statically allocated buffer,
subsequent calls will overwrite, So we fix this bug.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
MSI-X has been disabled by the time the e1000e device is unrealized, hence
msix_uninit is never called. This causes the object to be leaked, which
shows up as a RAMBlock with empty name when attempting migration.
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The currently included version of the dtc/libfdt submodule has some build
errors on certain distributions (including RHEL7). This is due to some
poorly named macros in libfdt.h; they're designed for use with the sparse
static checker, but use reserved names which conflict with some symbols in
the standard headers.
That's been corrected in upstream dtc, this updates the qemu submodule to
bring the fix to qemu.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
bb9986452 "spapr_pci: Advertise access to PCIe extended config space"
allowed guests to access the extended config space of PCI Express devices
via the PAPR interfaces, even though the paravirtualized bus mostly acts
like plain PCI.
However, that patch enabled access unconditionally, including for existing
machine types, which is an unwise change in behaviour. This patch limits
the change to pseries-2.9 (and later) machine types.
Suggested-by: Andrea Bolognani <abologna@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A bug was introduced in following commit:
dc0ad84 target/ppc: update overflow flags for add/sub
As for 32-bit ppc target extracting bit 63 for overflow is not correct.
Made it dependent on TARGET_LOG_BITS. This had broken booting MacOS
9.2.1 image
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
The SPR UAMR has the number 13, and not 12. (Fortunately it seems like
Linux is not using this register yet - only the privileged version with
number 29 ... that's why nobody noticed this problem yet)
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We want query-block to return the right filename, even if a commit job
put a bdrv_commit_top on top of the actual image format driver. Let
bdrv_commit_top.bdrv_refresh_filename get the filename from its backing
file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We want query-block to return the right filename, even if a mirror job
put a bdrv_mirror_top on top of the actual image format driver. Let
bdrv_mirror_top.bdrv_refresh_filename get the filename from its backing
file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In bdrv_open_inherit(), the filename is refreshed after opening the
backing file, but we neglected to do the same when the backing file
changes later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In some cases, bdrv_co_get_block_status() is called recursively for the
whole backing chain. The automatically inserted bdrv_commit_top filter
driver must not stop the recursion, so implement a callback that simply
forwards the request to bs->backing.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This fixes bdrv_co_get_block_status() for the bdrv_mirror_top block
driver, which must fall through to bs->backing instead of bs->file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
All callers pass false now, so the parameter can go away again.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Migration is the only code left in the tree that does not react
to bdrv_is_allocated() failures. But as there is no useful way
to react to the failure, and we are merely skipping unallocated
sectors on success, just document that our choice of handling
is intended.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bdrv_is_allocated() fails, we should react to that failure.
For 2 of the 3 callers, reporting the error was easy. But in
cluster_was_modified() and its lone caller
get_cluster_count_for_direntry(), it's rather invasive to update
the logic to pass the error back; so there, I went with merely
documenting the issue by changing the return type to bool (in
all likelihood, treating the cluster as modified will then
trigger a read which will also fail, and eventually get to an
error - but given the appalling number of abort() calls in this
code, I'm not making it any worse).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bdrv_is_allocated() fails, we should immediately do the backup
error action, rather than attempting backup_do_cow() (although
that will likely fail too).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The driver has failed to build since commit da34e65, in qemu 2.6,
due to a missing include of qapi/error.h for error_setg().
Since no one has complained in three releases, it is easier to
remove the dead code than to keep it around, especially since it
is not being built by default and therefore prone to bitrot.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
BlockLimits.max_transfer can be too high without this fix, guest will
encounter I/O error or even get paused with werror=stop or rerror=stop. The
cause is explained below.
Linux has a separate limit, /sys/block/.../queue/max_segments, which in
the worst case can be more restrictive than the BLKSECTGET which we
already consider (note that they are two different things). So, the
failure scenario before this patch is:
1) host device has max_sectors_kb = 4096 and max_segments = 64;
2) guest learns max_sectors_kb limit from QEMU, but doesn't know
max_segments;
3) guest issues e.g. a 512KB request thinking it's okay, but actually
it's not, because it will be passed through to host device as an
SG_IO req that has niov > 64;
4) host kernel doesn't like the segmenting of the request, and returns
-EINVAL;
This patch checks the max_segments sysfs entry for the host device and
calculates a "conservative" bytes limit using the page size, which is
then merged into the existing max_transfer limit. Guest will discover
this from the usual virtual block device interfaces. (In the case of
scsi-generic, it will be done in the INQUIRY reply interception in
device model.)
The other possibility is to actually propagate it as a separate limit,
but it's not better. On the one hand, there is a big complication: the
limit is per-LUN in QEMU PoV (because we can attach LUNs from different
host HBAs to the same virtio-scsi bus), but the channel to communicate
it in a per-LUN manner is missing down the stack; on the other hand,
two limits versus one doesn't change much about the valid size of I/O
(because guest has no control over host segmenting).
Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
was explored. Unfortunately there is no neat way to ensure the bounce
buffer is less segmented (in terms of DMA addr) than the guest buffer.
Practically, this bug is not very common. It is only reported on a
Emulex (lpfc), so it's okay to get it fixed in the easier way.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently backup to nbd target is broken, as nbd doesn't have
.bdrv_get_info realization.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
# gpg: Signature made Fri 10 Mar 2017 07:15:38 GMT
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-pull-request:
docker/dockerfiles/debian-s390-cross: include clang
tests/docker: support proxy / corporate firewall
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
So far xtensa provides fixed dummy argc/argv for the corresponding
semihosting calls. Now that there are semihosting_get_argc and
semihosting_get_arg, use them to pass actual command line arguments
to guest.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
xtensa linux can use DTB but does not require it, so FDT support is not
a requirement for target/xtensa. Don't try to load DTB when FDT support
is not configured.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
It's a silly little limitation on Shippable that is looks for clang
in the container even though we won't use it. The arm/aarch64 cross
builds inherit this from debian.docker but as we needed to use
debian-testing for this we add it here. We also collapse the update
step into one RUN line to remove and intermediate layer of the docker
build.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20170306112848.659-1-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Fix-ups for MTTCG regressions for 2.9
This is the same as v3 posted a few days ago except with a few extra
Reviewed-by tags added.
# gpg: Signature made Thu 09 Mar 2017 10:45:18 GMT
# gpg: using RSA key 0xFBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-mttcg-fixups-090317-1:
hw/intc/arm_gic: modernise the DPRINTF
target/arm/helper: make it clear the EC field is also in hex
target-i386: defer VMEXIT to do_interrupt
target/mips: hold BQL for timer interrupts
translate-all: exit cpu_restore_state early if translating
target/xtensa: hold BQL for interrupt processing
s390x/misc_helper.c: wrap IO instructions in BQL
sparc/sparc64: grab BQL before calling cpu_check_irqs
cpus.c: add additional error_report when !TARGET_SUPPORT_MTTCG
target/i386/cpu.h: declare TCG_GUEST_DEFAULT_MO
vl/cpus: be smarter with icount and MTTCG
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
While I was debugging the icount issues I realised a bunch of the
messages look quite similar. I've fixed this by including __func__ in
the debug print. At the same time I move the a modern if (GATE) style
printf which ensures the compiler can check for format string errors
even if the code gets optimised away in the non-DEBUG_GIC case.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
..just like the rest of the displayed ESR register. Otherwise people
might scratch their heads if a not obviously hex number is displayed
for the EC field.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Paths through the softmmu code during code generation now need to be audited
to check for double locking of tb_lock. In particular, VMEXIT can take tb_lock
through cpu_vmexit -> cpu_x86_update_cr4 -> tlb_flush.
To avoid this, split VMEXIT delivery in two parts, similar to what is done with
exceptions. cpu_vmexit only records the VMEXIT exit code and information, and
cc->do_interrupt can then deliver it when it is safe to take the lock.
Reported-by: Alexander Boettcher <alexander.boettcher@genode-labs.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
Tested-by: Alexander Boettcher <alexander.boettcher@genode-labs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
The translation code uses cpu_ld*_code which can trigger a tlb_fill
which if it fails will erroneously attempts a fault resolution. This
never works during translation as the TB being generated hasn't been
added yet. The target should have checked retaddr before calling
cpu_restore_state but for those that have yet to be fixed we do it
here to avoid a recursive tb_lock() under MTTCG's new locking regime.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Helpers that can trigger IO events (including interrupts) need to be
protected by the BQL. I've updated all the helpers that call into an
ioinst_handle_* functions.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
IRQ modification is part of device emulation and should be done while
the BQL is held to prevent races when MTTCG is enabled. This adds
assertions in the hw emulation layer and wraps the calls from helpers
in the BQL.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
While we may fail the memory ordering check later that can be
confusing. So in cases where TARGET_SUPPORT_MTTCG has yet to be
defined we should say so specifically.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This suppresses the incorrect warning when forcing MTTCG for x86
guests on x86 hosts. A future patch will still warn when
TARGET_SUPPORT_MTTCG hasn't been defined for the guest (which is still
pending for x86).
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com>
The sense of the test was inverted. Make it simple, if icount is
enabled then we disabled MTTCG by default. If the user tries to force
MTTCG upon us then we tell them "no".
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Block layer fixes for 2.9.0-rc0
# gpg: Signature made Tue 07 Mar 2017 14:59:18 GMT
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (27 commits)
commit: Don't use error_abort in commit_start
block: Don't use error_abort in blk_new_open
sheepdog: Support blockdev-add
qapi-schema: Rename SocketAddressFlat's variant tcp to inet
qapi-schema: Rename GlusterServer to SocketAddressFlat
gluster: Plug memory leaks in qemu_gluster_parse_json()
gluster: Don't duplicate qapi-util.c's qapi_enum_parse()
gluster: Drop assumptions on SocketTransport names
sheepdog: Implement bdrv_parse_filename()
sheepdog: Use SocketAddress and socket_connect()
sheepdog: Report errors in pseudo-filename more usefully
sheepdog: Don't truncate long VDI name in _open(), _create()
sheepdog: Fix snapshot ID parsing in _open(), _create, _goto()
sheepdog: Mark sd_snapshot_delete() lossage FIXME
sheepdog: Fix error handling sd_create()
sheepdog: Fix error handling in sd_snapshot_delete()
sheepdog: Defuse time bomb in sd_open() error handling
block: Fix error handling in bdrv_replace_in_backing_chain()
block: Handle permission errors in change_parent_backing_link()
block: Ignore multiple children in bdrv_check_update_perm()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
bdrv_set_backing_hd failure needn't be abort. Since we already have
error parameter, use it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have an errp and bdrv_root_attach_child can fail permission check,
error_abort is not the best choice here.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
QAPI type SocketAddressFlat differs from SocketAddress pointlessly:
the discriminator value for variant InetSocketAddress is 'tcp' instead
of 'inet'. Rename.
The type is so far only used by the Gluster block drivers. Take care
to keep 'tcp' working in things like -drive's file.server.0.type=tcp.
The "gluster+tcp" URI scheme in pseudo-filenames stays the same.
blockdev-add changes, but it has changed incompatibly since 2.8
already.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As its documentation says, it's not specific to Gluster. Rename it,
as I'm going to use it for something else.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To reproduce, run
$ valgrind qemu-system-x86_64 --nodefaults -S --drive driver=gluster,volume=testvol,path=/a/b/c,server.0.type=xxx
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
qemu_gluster_glfs_init() passes the names of QAPI enumeration type
SocketTransport to glfs_set_volfile_server(). Works, because they
were chosen to match. But the coupling is artificial. Use the
appropriate literal strings instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This permits configuration with driver-specific options in addition to
pseudo-filename parsed as URI. For instance,
--drive driver=sheepdog,host=fido,vdi=dolly
instead of
--drive driver=sheepdog,file=sheepdog://fido/dolly
It's also a first step towards supporting blockdev-add.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
sd_parse_uri() builds a string from host and port parts for
inet_connect(). inet_connect() parses it into host, port and options.
Whether this gets exactly the same host, port and no options for all
inputs is not obvious.
Cut out the string middleman and build a SocketAddress for
socket_connect() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Errors in the pseudo-filename are all reported with the same laconic
"Can't parse filename" message.
Add real error reporting, such as:
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog:///
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog:///: missing file path in URI
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepgod:///vdi
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepgod:///vdi: URI scheme must be 'sheepdog', 'sheepdog+tcp', or 'sheepdog+unix'
$ qemu-system-x86_64 --drive driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock
qemu-system-x86_64: --drive driver=sheepdog,filename=sheepdog+unix:///vdi?socke=sheepdog.sock: unexpected query parameters
The code to translate legacy syntax to URI fails to escape URI
meta-characters. The new error messages are misleading then. Replace
them by the old "Can't parse filename" message. "Internal error"
would be more honest. Anyway, no worse than before. Also add a FIXME
comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
sd_parse_uri() truncates long VDI names silently. Reject them
instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
sd_parse_uri() and sd_snapshot_goto() screw up error checking after
strtoul(), and truncate long tag names silently. Fix by replacing
those parts by new sd_parse_snapid_or_tag(), which checks more
carefully.
sd_snapshot_delete() also parses snapshot IDs, but is currently too
broken for me to touch. Mark TODO.
Two calls of strtol() without error checking remain in
parse_redundancy(). Mark them FIXME.
More silent truncation of configuration strings remains elsewhere.
Not marked.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
sd_snapshot_delete() should delete the snapshot whose ID matches
@snapshot_id and whose name matches @name. But that's not what it
does. If @snapshot_id is a valid ID, it deletes the snapshot with
that ID, else it deletes the snapshot with that name. It doesn't use
@name at all. Add suitable FIXME comments, so someone who actually
knows Sheepdog can fix it.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As a bdrv_create() method, sd_create() must set an error and return
negative errno on failure. It prints the error instead of setting it
when connect_to_sdog() fails. Fix that.
While there, return the value of connect_to_sdog() like we do
elsewhere, instead of -EIO. No functional change, as
connect_to_sdog() returns no other error code.
Many more suspicious uses of error_report() and error_report_err()
remain in other functions. Left for another day.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As a bdrv_snapshot_delete() method, sd_snapshot_delete() must set an
error and return negative errno on failure. It sometimes returns -1,
and sometimes neglects to set an error. It also prints error messages
with error_report(). Fix all that.
Moreover, its handling of an attempt to delete a nonexistent snapshot
is wrong: it error_report()s and succeeds. Fix it to set an error and
return -ENOENT instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When qemu_opts_absorb_qdict() fails, sd_open() closes stdin, because
sd->fd is still zero. Fortunately, qemu_opts_absorb_qdict() can't
fail, because:
1. it only fails when qemu_opt_parse() fails, and
2. the only member of runtime_opts.desc[] is a QEMU_OPT_STRING, and
3. qemu_opt_parse() can't fail for QEMU_OPT_STRING.
Defuse this ticking time bomb by jumping behind the file descriptor
cleanup on error.
Also do that for the error paths where sd->fd is still -1. The file
descriptor cleanup happens to do nothing then, but let's not rely on
that here.
While there, rename label out to err, because it's on the error path,
not the normal path out of the function.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When adding an Error parameter, bdrv_replace_in_backing_chain() would
become nothing more than a wrapper around change_parent_backing_link().
So make the latter public, renamed as bdrv_replace_node(), and remove
bdrv_replace_in_backing_chain().
Most of the callers just remove a node from the graph that they just
inserted, so they can use &error_abort, but completion of a mirror job
with 'replaces' set can actually fail.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Instead of just trying to change parents by parent over to reference @to
instead of @from, and abort()ing whenever the permissions don't allow
this, do proper permission checking beforehand and pass any error to the
callers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
change_parent_backing_link() will need to update multiple BdrvChild
objects at once. Checking permissions reference by reference doesn't
work because permissions need to be consistent only with all parents
moved to the new child.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
For blockdev-snapshot, external_snapshot_prepare() accepts an arbitrary
node reference at first and only checks later whether it already has a
backing file. Between those places, other errors can occur.
Therefore checking in external_snapshot_abort() whether state->new_bs
has a backing file is not sufficient to tell whether bdrv_append() was
already completed or not. Trying to undo the bdrv_append() when it
wasn't even executed is wrong.
Introduce a new boolean flag in the state to fix this.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
mirror_top_bs must be removed from the graph again when creating the
dirty bitmap fails.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
mirror_top_bs takes write permissions on its backing file, which can
make it impossible to attach that backing file node to another parent.
However, this is exactly what needs to be done in order to remove
mirror_top_bs from the backing chain. So give up the write permission
first.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
The 'replaces' option of drive-mirror can be used to mirror a Quorum
node to a new image and then let the target image replace one of the
Quorum children. In order for this graph modification to succeed, the
mirror job needs to lift its restrictions on the target node first
before actually replacing the child.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Apparently some kind of mismerge happened in commit 8dfba279, which
broke the error handling without any real reason by removing the
assignment of the return value to ret in a blk_insert_bs() call.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
if ftp_proxy/http_proxy/https_proxy standard environment variables available,
pass them to the docker daemon to build images.
this is required when building behind corporate proxy/firewall, but also help
when using local cache server (ie: apt/yum).
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170306205520.32311-1-f4bug@amsat.org>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-03-07 18:20:40 +08:00
775 changed files with 18684 additions and 9699 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.