Change qemu_config_parse() to return the number of config groups
on success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.
All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.
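As a hedged illustration (not part of this patch), a caller could now tell "parse error" apart from "file parsed but empty"; the variable name and the warn_report() wording below are assumptions:

    int ret = qemu_read_config_file(fname);
    if (ret < 0) {
        /* -EINVAL (or another negative errno): nothing usable was parsed */
        error_report("failed to read config file %s", fname);
        exit(1);
    } else if (ret == 0) {
        /* assumption: the caller wants to know the file had no groups */
        warn_report("config file %s contained no configuration groups", fname);
    }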
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
This patch adds a MachineClass element that can be set in the machine C
code to specify a list of supported CPU types. If the supported CPU
types are specified, the CPU entered by the user (via -cpu at runtime) is
checked against the supported types and QEMU exits if it isn't supported.
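A minimal sketch of how a board might use this, assuming the MachineClass field is called valid_cpu_types; the field name, function name and CPU type strings here are illustrative:

    static const char *board_valid_cpu_types[] = {
        ARM_CPU_TYPE_NAME("cortex-m3"),
        ARM_CPU_TYPE_NAME("cortex-m4"),
        NULL
    };

    static void board_machine_class_init(ObjectClass *oc, void *data)
    {
        MachineClass *mc = MACHINE_CLASS(oc);

        /* -cpu values not in this NULL-terminated list make QEMU exit */
        mc->valid_cpu_types = board_valid_cpu_types;
    }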
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <b8474e9d2e0a219d9bac901342f983b13d009301.1507059418.git.alistair.francis@xilinx.com>
[ehabkost: removed assert(), rewrote comment]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Block layer patches
# gpg: Signature made Fri 06 Oct 2017 16:52:59 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (54 commits)
block/mirror: check backing in bdrv_mirror_top_flush
qcow2: truncate the tail of the image file after shrinking the image
qcow2: fix return error code in qcow2_truncate()
iotests: Fix 195 if IMGFMT is part of TEST_DIR
block/mirror: check backing in bdrv_mirror_top_refresh_filename
block: support passthrough of BDRV_REQ_FUA in crypto driver
block: convert qcrypto_block_encrypt|decrypt to take bytes offset
block: convert crypto driver to bdrv_co_preadv|pwritev
block: fix data type casting for crypto payload offset
crypto: expose encryption sector size in APIs
block: use 1 MB bounce buffers for crypto instead of 16KB
iotests: Add test 197 for covering copy-on-read
block: Perform copy-on-read in loop
block: Add blkdebug hook for copy-on-read
iotests: Restore stty settings on completion
block: Uniform handling of 0-length bdrv_get_block_status()
qemu-io: Add -C for opening with copy-on-read
commit: Remove overlay_bs
qemu-iotests: Test commit block job where top has two parents
qemu-iotests: Allow QMP pretty printing in common.qemu
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
target-arm:
* v8M: more preparatory work
* nvic: reset properly rather than leaving the nvic in a weird state
* xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
* sd: fix out-of-bounds check for multi block reads
* arm: Fix SMC reporting to EL2 when QEMU provides PSCI
# gpg: Signature made Fri 06 Oct 2017 16:58:15 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20171006:
nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit
target/arm: Factor out "get mmuidx for specified security state"
target/arm: Fix calculation of secure mm_idx values
target/arm: Implement security attribute lookups for memory accesses
nvic: Implement Security Attribution Unit registers
target/arm: Add v8M support to exception entry code
target/arm: Add support for restoring v8M additional state context
target/arm: Update excret sanity checks for v8M
target/arm: Add new-in-v8M SFSR and SFAR
target/arm: Don't warn about exception return with PC low bit set for v8M
target/arm: Warn about restoring to unaligned stack
target/arm: Check for xPSR mismatch usage faults earlier for v8M
target/arm: Restore SPSEL to correct CONTROL register on exception return
target/arm: Restore security state on exception return
target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode
target/arm: Don't switch to target stack early in v7M exception return
nvic: Clear the vector arrays and prigroup on reset
hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
hw/sd: fix out-of-bounds check for multi block reads
arm: Fix SMC reporting to EL2 when QEMU provides PSCI
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For the SG instruction and secure function return we are going
to want to do memory accesses using the MMU index of the CPU
in secure state, even though the CPU is currently in non-secure
state. Write arm_v7m_mmu_idx_for_secstate() to do this job,
and use it in cpu_mmu_index().
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-17-git-send-email-peter.maydell@linaro.org
In cpu_mmu_index() we try to do this:
    if (env->v7m.secure) {
        mmu_idx += ARMMMUIdx_MSUser;
    }
but it will give the wrong answer, because ARMMMUIdx_MSUser
includes the 0x40 ARM_MMU_IDX_M field, and so does the
mmu_idx we're adding to, and we'll end up with 0x8n rather
than 0x4n. This error is then nullified by the call to
arm_to_core_mmu_idx() which masks out the high part, but
we're about to factor out the code that calculates the
ARMMMUIdx values so it can be used without passing it through
arm_to_core_mmu_idx(), so fix this bug first.
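A standalone illustration of the arithmetic (the 0x40 flag value is ARM_MMU_IDX_M as described above; the enum values are simplified, not the exact cpu.h definitions):

    #include <stdio.h>

    #define ARM_MMU_IDX_M 0x40          /* "this is an M-profile index" flag */

    enum { MMUIdx_MUser  = 0 | ARM_MMU_IDX_M,
           MMUIdx_MSUser = 3 | ARM_MMU_IDX_M };   /* simplified numbering */

    int main(void)
    {
        int mmu_idx = MMUIdx_MUser;     /* already carries the 0x40 flag */

        /* buggy: both operands carry the flag, so it is added twice */
        printf("buggy: 0x%x\n", mmu_idx + MMUIdx_MSUser);                     /* 0x83 */
        /* fixed: add only the index part, keeping a single 0x40 flag */
        printf("fixed: 0x%x\n", mmu_idx + (MMUIdx_MSUser & ~ARM_MMU_IDX_M));  /* 0x43 */
        return 0;
    }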
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-16-git-send-email-peter.maydell@linaro.org
Implement the security attribute lookups for memory accesses
in the get_phys_addr() functions, causing these to generate
various kinds of SecureFault for bad accesses.
The major subtlety in this code relates to handling of the
case when the security attributes the SAU assigns to the
address don't match the current security state of the CPU.
In the ARM ARM pseudocode for validating instruction
accesses, the security attributes of the address determine
whether the Secure or NonSecure MPU state is used. At face
value, handling this would require us to encode the relevant
bits of state into mmu_idx for both S and NS at once, which
would result in our needing 16 mmu indexes. Fortunately we
don't actually need to do this because a mismatch between
address attributes and CPU state means either:
* some kind of fault (usually a SecureFault, but in theory
perhaps a UserFault for unaligned access to Device memory)
* execution of the SG instruction in NS state from a
Secure & NonSecure code region
The purpose of SG is simply to flip the CPU into Secure
state, so we can handle it by emulating execution of that
instruction directly in arm_v7m_cpu_do_interrupt(), which
means we can treat all the mismatch cases as "throw an
exception" and we don't need to encode the state of the
other MPU bank into our mmu_idx values.
This commit doesn't include the actual emulation of SG;
it also doesn't include implementation of the IDAU, which
is a per-board way to specify hard-coded memory attributes
for addresses, which override the CPU-internal SAU if they
specify a more secure setting than the SAU is programmed to.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-15-git-send-email-peter.maydell@linaro.org
Implement the register interface for the SAU: SAU_CTRL,
SAU_TYPE, SAU_RNR, SAU_RBAR and SAU_RLAR. None of the
actual behaviour is implemented here; registers just
read back as written.
When the CPU definition for Cortex-M33 is eventually
added, its initfn will set cpu->sau_sregion, in the same
way that we currently set cpu->pmsav7_dregion for the
M3 and M4.
Number of SAU regions is typically a configurable
CPU parameter, but this patch doesn't provide a
QEMU CPU property for it. We can easily add one when
we have a board that requires it.
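A hedged sketch of what "read back as written" means for this register block; the offsets follow the ARMv8-M SCB layout, but the switch shape and field names (cpu->sau_ctrl and friends) are illustrative rather than the exact nvic.c code:

    switch (offset) {
    case 0xdd0: /* SAU_CTRL: only ENABLE and ALLNS are writable */
        cpu->sau_ctrl = value & 3;
        break;
    case 0xdd8: /* SAU_RNR: selects which region RBAR/RLAR refer to */
        cpu->sau_rnr = value;   /* a real version bounds this by sau_sregion */
        break;
    case 0xddc: /* SAU_RBAR: stored only, no behavioural effect yet */
        cpu->sau_rbar[cpu->sau_rnr] = value & ~0x1f;
        break;
    }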
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-14-git-send-email-peter.maydell@linaro.org
Add support for v8M and in particular the security extension
to the exception entry code. This requires changes to:
* calculation of the exception-return magic LR value
* push the callee-saves registers in certain cases
* clear registers when taking non-secure exceptions to avoid
leaking information from the interrupted secure code
* switch to the correct security state on entry
* use the vector table for the security state we're targeting
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-13-git-send-email-peter.maydell@linaro.org
Attempting to do an exception return with an exception frame that
is not 8-aligned is UNPREDICTABLE in v8M; warn about this.
(It is not UNPREDICTABLE in v7M, and our implementation can
handle the merely-4-aligned case fine, so we don't need to
do anything except warn.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-8-git-send-email-peter.maydell@linaro.org
ARM v8M specifies that the INVPC usage fault for mismatched
xPSR exception field and handler mode bit should be checked
before updating the PSR and SP, so that the fault is taken
with the existing stack frame rather than by pushing a new one.
Perform this check in the right place for v8M.
Since v7M specifies in its pseudocode that this usage fault
check should happen later, we have to retain the original
code for that check rather than being able to merge the two.
(The distinction is architecturally visible but only in
very obscure corner cases like attempting an invalid exception
return with an exception frame in read only memory.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-7-git-send-email-peter.maydell@linaro.org
On exception return for v8M, the SPSEL bit in the EXC_RETURN magic
value should be restored to the SPSEL bit in the CONTROL register
banked specified by the EXC_RETURN.ES bit.
Add write_v7m_control_spsel_for_secstate() which behaves like
write_v7m_control_spsel() but allows the caller to specify which
CONTROL bank to use, reimplement write_v7m_control_spsel() in
terms of it, and use it in exception return.
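A hedged sketch of the refactoring shape described above (bodies elided; only the relationship between the two helpers is shown):

    static void write_v7m_control_spsel_for_secstate(CPUARMState *env,
                                                     bool new_spsel,
                                                     bool secstate)
    {
        /* update CONTROL.SPSEL in the requested bank, and only switch the
         * live stack pointer if that bank matches the current security state
         */
    }

    static void write_v7m_control_spsel(CPUARMState *env, bool new_spsel)
    {
        /* the old entry point becomes a thin wrapper over the new helper */
        write_v7m_control_spsel_for_secstate(env, new_spsel, env->v7m.secure);
    }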
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-6-git-send-email-peter.maydell@linaro.org
Now that we can handle the CONTROL.SPSEL bit not necessarily being
in sync with the current stack pointer, we can restore the correct
security state on exception return. This happens before we start
to read registers off the stack frame, but after we have taken
possible usage faults for bad exception return magic values and
updated CONTROL.SPSEL.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-5-git-send-email-peter.maydell@linaro.org
In the v7M architecture, there is an invariant that if the CPU is
in Handler mode then the CONTROL.SPSEL bit cannot be nonzero.
This in turn means that the current stack pointer is always
indicated by CONTROL.SPSEL, even though Handler mode always uses
the Main stack pointer.
In v8M, this invariant is removed, and CONTROL.SPSEL may now
be nonzero in Handler mode (though Handler mode still always
uses the Main stack pointer). In preparation for this change,
change how we handle this bit: rename switch_v7m_sp() to
the now more accurate write_v7m_control_spsel(), and make it
check both the handler mode state and the SPSEL bit.
Note that this implicitly changes the point at which we switch
active SP on exception exit from before we pop the exception
frame to after it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-4-git-send-email-peter.maydell@linaro.org
Currently our M profile exception return code switches to the
target stack pointer relatively early in the process, before
it tries to pop the exception frame off the stack. This is
awkward for v8M for two reasons:
* in v8M the process vs main stack pointer is not selected
purely by the value of CONTROL.SPSEL, so updating SPSEL
and relying on that to switch to the right stack pointer
won't work
* the stack we should be reading the stack frame from and
the stack we will eventually switch to might not be the
same if the guest is doing strange things
Change our exception return code to use a 'frame pointer'
to read the exception frame rather than assuming that we
can switch the live stack pointer this early.
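A hedged sketch of the "frame pointer" shape; the variable names are illustrative and this is not the literal helper.c diff:

    uint32_t *frame_sp_p = &env->regs[13];  /* placeholder: in the real code this
                                             * points at whichever banked SP the
                                             * frame should be read from */
    uint32_t frameptr = *frame_sp_p;

    /* ... pop r0-r3, r12, lr, pc and xPSR through frameptr ... */

    /* only now commit the change by advancing the stored stack pointer */
    *frame_sp_p = frameptr + 0x20;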
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-3-git-send-email-peter.maydell@linaro.org
Reset for devices does not include an automatic clear of the
device state (unlike CPU state, where most of the state
structure is cleared to zero). Add some missing initialization
of NVIC state that meant that the device was left in the wrong
state if the guest did a warm reset.
(In particular, since we were resetting the computed state like
s->exception_prio but not all the state it was computed
from like s->vectors[x].active, the NVIC wound up in an
inconsistent state that could later trigger assertion failures.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1506092407-26985-2-git-send-email-peter.maydell@linaro.org
The device uses serial_hds in its realize function and thus can't be
used twice. Apart from that, the comma in its name makes it quite hard
to use for the user anyway, since a comma is normally used to separate
the device name from its properties when using the "-device" parameter
or the "device_add" HMP command.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1506441116-16627-1-git-send-email-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block patches
# gpg: Signature made Fri Oct 6 16:30:57 2017 CEST
# gpg: using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40
* mreitz/tags/pull-block-2017-10-06:
block/mirror: check backing in bdrv_mirror_top_flush
qcow2: truncate the tail of the image file after shrinking the image
qcow2: fix return error code in qcow2_truncate()
iotests: Fix 195 if IMGFMT is part of TEST_DIR
block/mirror: check backing in bdrv_mirror_top_refresh_filename
block: support passthrough of BDRV_REQ_FUA in crypto driver
block: convert qcrypto_block_encrypt|decrypt to take bytes offset
block: convert crypto driver to bdrv_co_preadv|pwritev
block: fix data type casting for crypto payload offset
crypto: expose encryption sector size in APIs
block: use 1 MB bounce buffers for crypto instead of 16KB
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing
qemu's command line and _filter_testdir only afterwards. Therefore, if
the image format is part of the test directory path, _filter_testdir
will no longer apply and the actual output will differ from the
reference output even in case of success.
For example, TEST_DIR might be "/tmp/test-qcow2", in which case
_filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no
longer recognized as the TEST_DIR by _filter_testdir.
Fix this by not applying _filter_imgfmt in do_run_qemu() but in
run_qemu() instead, and only after _filter_testdir.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170927211334.3988-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-5-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
The crypto APIs report the offset of the data payload as a uint64_t
type, but the block driver is casting to size_t or ssize_t, which will
potentially truncate.
Meanwhile, most of the block APIs use int64_t for offsets, so even when
using uint64_t in the crypto block driver we are still at risk of
truncation.
Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.
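A hedged sketch of the resulting pattern in the block crypto driver; the accessor name and the surrounding variables are assumptions for illustration:

    uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);

    /* the crypto layer reports bytes as uint64_t; the block layer uses
     * int64_t, so guard against values that would not fit */
    assert(payload_offset < INT64_MAX);

    int64_t file_offset = offset + payload_offset;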
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-4-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
While current encryption schemes all have a fixed sector size of
512 bytes, this is not guaranteed to be the case in future. Expose
the sector size in the APIs so the block layer can remove assumptions
about fixed 512 byte sectors.
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-3-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage with high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.
On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-2-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Add a test for qcow2 copy-on-read behavior, including exposure
for the just-fixed bugs.
The copy-on-read behavior is always to a qcow2 image, but the
test is careful to allow running with most image protocol/format
combos as the backing file being copied from (luks being the
exception, as it is harder to pass the right secret to all the
right places). In fact, for './check nbd', this appears to be
the first time we've had a qcow2 image wrapping NBD, requiring
an additional line in _filter_img_create to match the similar
line in _filter_img_info.
Invoking blkdebug to prove we don't write too much took some
effort to get working; and it requires that $TEST_WRAP (based
on $TEST_DIR) not be subject to word splitting. We may decide
later to have the entire iotests suite use relative rather than
absolute names, to avoid problems inherited by the absolute
name of $PWD or $TEST_DIR, at which point the sanity check in
this commit could be simplified.
This test requires at least 2G of consecutive memory to succeed;
as such, it is prone to spurious failures, particularly on
32-bit machines under load. This situation is detected and
triggers an early exit to skip the test, rather than a failure.
To manually provoke this setup on a beefier machine, I used:
$ (ulimit -S -v 1000000; ./check -qcow2 197)
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Improve our braindead copy-on-read implementation. Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer is as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G. While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read
Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits. Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.
Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.
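A hedged sketch of the loop shape described above; MAX_BOUNCE_BUFFER and is_range_allocated() are illustrative placeholders rather than the actual io.c symbols:

    int64_t cluster_offset = QEMU_ALIGN_DOWN(offset, cluster_size);
    int64_t cluster_bytes = QEMU_ALIGN_UP(offset + bytes, cluster_size)
                            - cluster_offset;

    while (cluster_bytes > 0) {
        int64_t pnum;
        int64_t chunk = MIN(cluster_bytes, MAX_BOUNCE_BUFFER);

        if (is_range_allocated(bs, cluster_offset, chunk, &pnum)) {
            /* already present in the top layer: nothing to copy */
        } else {
            /* read at most pnum bytes into a bounce buffer capped at
             * MAX_BOUNCE_BUFFER, then write it back (or write zeroes)
             * to complete the copy-on-read for this chunk */
        }
        cluster_offset += pnum;
        cluster_bytes -= pnum;
    }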
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Executing qemu with a terminal as stdin will temporarily alter stty
settings on that terminal (for example, disabling echo), because of
how we run both the monitor and any multiplexing with guest input.
Normally, qemu restores the original settings on exit; but if an
iotest triggers qemu to abort in the middle, we can be left with
the altered terminal setup. This can make life very annoying when
debugging an iotest failure (not everyone remembers the trick of
blind-typing 'stty sane' without echo, and some people prefer
terminal settings that are slightly different than the defaults
picked by 'stty sane').
It is possible to avoid qemu corrupting the terminal by not passing
a terminal to qemu's stdin in the first place (as in, use
'./check ... </dev/null'), but that's extra typing to have to
remember. But running 'exec </dev/null' in the harness seems like
it might be too heavy a hammer. So I instead went with the
solution of saving and restoring the stty settings, only when the
harness detects that it is run interactively.
I tested this patch by forcing an allocation failure (I can't
guarantee that this particular limit will work on all setups, but
it shows the idea):
$ (ulimit -S -v 500000; ./check -qcow2 1)
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.
Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled. While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop). However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).
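A hedged sketch of the up-front check (still sector-based at this point in the series; the exact return flags are simplified):

    if (!nb_sectors) {
        /* uniform answer for empty requests: nothing allocated here,
         * and nothing to defer to the backing layer */
        *pnum = 0;
        return 0;
    }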
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.
bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.
The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.
With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
QMP responses to certain commands can become quite long, which not only
makes them hard to read, but also means that the maximum line length
in patch emails can be exceeded. Allow tests to switch to QMP pretty
printing, which results in more, but shorter lines.
We also need to make sure to keep indentation in the response for this
to work as expected.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.
This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.
On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.
Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There is no good reason for bdrv_drop_intermediate() to know the active
layer above the subchain it is operating on - even more so, because
the assumption that there is a single active layer above it is not
generally true.
In order to prepare removal of the active parameter, use a BdrvChildRole
callback to update the backing file string in the overlay image instead
of directly calling bdrv_change_backing_file().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
"check" is full of qemu-iotests--specific details. Separating it
from "common" does not make much sense anymore.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The variable is almost unused, and one of the two uses is actually
uninitialized.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The variable is used in "common" but defined only after the file
is sourced.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Split "check" parts from tests part.
For the directory setup, the actual computation of directories goes
in "check", while the sanity checks go in the tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are never used by "check", with one exception that does not need
$QEMU_OPTIONS. Keep them in common.rc, which will be soon included only
by the tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of ./check failing when a binary is missing, we try each test
case now and each one fails with tons of test case diffs. Also, all the
variables were initialized by "check" prior to "common" being sourced,
and then (uselessly) checked for emptiness again in "check".
Centralize the search for programs in "common" (which will soon be
one with "check"), including the "realpath" invocation which can be done
just once in "check" rather than in the tests.
For qnio_server, move the detection to "common", simplifying
set_prog_path to stop handling the unused second argument, and
embedding the "realpath" pass.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some functions in common.rc are never used by the tests. Move
them out of that file and into common, which is already included
only by "check".
Code that actually *is* common to "check" and tests can be placed in
common.config.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This includes shell functions, shell variables and command line options
(randomize.awk does not exist).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity. It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Both callers already had bytes available, but were scaling to
sectors. Move the scaling to internal code. In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space. But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster. In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere. Making the change
will also make it easier to convert the hold-out callers to use bytes
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap. Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.
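A hedged sketch of the internal scaling, widening any sub-sector byte range to full sectors before touching the still sector-based hbitmap (field names simplified):

    int64_t start_sector = offset >> BDRV_SECTOR_BITS;
    int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

    hbitmap_set(bitmap->bitmap, start_sector, end_sector - start_sector);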
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration. Both
callers were already using the result as a bool, so make that
explicit. Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.
Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.
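A hedged sketch of the bool-returning, byte-based query; the prototype and field names are simplified, and the whole granularity chunk containing the byte determines the answer:

    bool bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
                        int64_t offset)
    {
        /* the underlying hbitmap is still sector-scaled here, so convert
         * the byte offset; hbitmap_get() already reports per granularity */
        return hbitmap_get(bitmap->bitmap, offset >> BDRV_SECTOR_BITS);
    }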
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later). Update the interface to
do the scaling internally instead.
In qcow2-bitmap, the code was specifically checking for an error
return of -1. To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value. It's easy enough to make both
changes at the same time, even though either one in isolation
would work.
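A hedged sketch of the scaling with the error value preserved, so qcow2-bitmap's caller can keep treating any negative value as failure (field names simplified):

    int64_t ret = hbitmap_iter_next(&iter->hbi);

    /* scale a valid sector number to bytes, but pass errors through
     * unscaled so they stay negative and recognisable */
    return ret < 0 ? ret : ret * BDRV_SECTOR_SIZE;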
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.
Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration. Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based. Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap. It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size. The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere. Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies). A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.
Use is_power_of_2() while at it, instead of open-coding that.
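A hedged sketch of the bookkeeping described above (field and parameter names simplified):

    assert(is_power_of_2(granularity) && granularity >= BDRV_SECTOR_SIZE);

    bitmap->size = bitmap_size;                     /* now kept in bytes */

    /* the hbitmap itself still counts sectors, so round the byte size up;
     * bdrv_getlength() already rounds to sectors, making this theoretical */
    bitmap->bitmap = hbitmap_alloc(DIV_ROUND_UP(bitmap_size, BDRV_SECTOR_SIZE),
                                   ctz32(granularity >> BDRV_SECTOR_BITS));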
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion. A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.
The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We've previously fixed several places where we failed to account
for possible errors from bdrv_nb_sectors(). Fix another one by
making bdrv_dirty_bitmap_truncate() take the new size from the
caller instead of querying itself; then adjust the sole caller
bdrv_truncate() to pass the size just determined by a successful
resize, or to reuse the size given to the original truncate
operation when refresh_total_sectors() was not able to confirm the
actual size (the two sizes can potentially differ according to
rounding constraints), thus avoiding sizing the bitmaps to -1.
This also fixes a bug where not all failure paths in
bdrv_truncate() would set errp.
Note that bdrv_truncate() is still a bit awkward. We may want
to revisit it later and clean up things to better guarantee that
a resize attempt either fails cleanly up front, or cannot fail
after guest-visible changes have been made (if temporary changes
are made, then they need to be cleanly rolled back). But that
is a task for another day; for now, the goal is the bare minimum
fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We had several functions that no one is currently using, and which
use sector-based interfaces. I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:
bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_get_meta_locked
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When subdividing a bitmap serialization, the code in hbitmap.c
enforces that start/count parameters are aligned (except that
count can end early at end-of-bitmap). We exposed this required
alignment through bdrv_dirty_bitmap_serialization_align(), but
forgot to actually check that we comply with it.
Fortunately, qcow2 is never dividing bitmap serialization smaller
than one cluster (which is a minimum of 512 bytes); so we are
always compliant with the serialization alignment (which insists
that we partition at least 64 bits per chunk) because we are doing
at least 4k bits per chunk.
Still, it's safer to add an assertion (for the unlikely case that
we'd ever support a cluster smaller than 512 bytes, or if the
hbitmap implementation changes what it considers to be aligned),
rather than leaving bdrv_dirty_bitmap_serialization_align()
without a caller.
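A hedged sketch of the added guard, using the names from the surrounding series; the bitmap-size escape hatch mirrors the "count can end early at end-of-bitmap" rule:

    uint64_t align = bdrv_dirty_bitmap_serialization_align(bitmap);

    assert(QEMU_IS_ALIGNED(start, align));
    assert(QEMU_IS_ALIGNED(count, align) ||
           start + count == bdrv_dirty_bitmap_size(bitmap));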
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align(). Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers of bdrv_img_create() pass in a size, or -1 to read the
size from the backing file. We then set that size as the QemuOpt
default, which means we will reuse that default rather than the
final parameter to qemu_opt_get_size() several lines later. But
it is rather confusing to read subsequent checks of 'size == -1'
when it looks (without seeing the full context) like size defaults
to 0; it also doesn't help that a size of 0 is valid (for some
formats).
Rework the logic to make things more legible.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
s390x changes:
- support for IDA (indirect addressing in ccws) via ccw data stream
- support for extended TOD-Clock (z14 feature)
- various fixes and improvements all over the place
# gpg: Signature made Fri 06 Oct 2017 10:52:22 BST
# gpg: using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg: aka "Cornelia Huck <cohuck@kernel.org>"
# gpg: aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF
* remotes/cohuck/tags/s390x-20171006: (33 commits)
hw/s390x: Mark the "sclpquiesce" device with user_creatable = false
s390x/tcg: initialize machine check queue
s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
s390x/sclp: Mark the sclp device with user_creatable = false
s390/kvm: make TOD setting failures fatal for migration
s390/kvm: Support for get/set of extended TOD-Clock for guest
s390x/css: fix css migration compat handling
s390x: sort some devices into categories
s390x/tcg: make STFL store into the lowcore
s390x: introduce and use S390_MAX_CPUS
target/s390x: get rid of next_core_id
s390x/cpumodel: fix max STFL(E) bit number
s390x: raise CPU hotplug irq after really hotplugged
MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h
s390x/3270: handle writes of arbitrary length
s390x/3270: IDA support for 3270 via CcwDataStream
Revert "s390x/ccw: create s390 phb conditionally"
s390x/tcg: make idte/ipte use the new _real mmu
s390x/tcg: make testblock use the new _real mmu
s390x/tcg: make stora(g) use the new _real mmu
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The "sclpquiesce" device is just an internal device that should not be
created by the user directly. Though it currently does not seem to cause
any obvious trouble when the user instantiates an additional device, let's
mark it with user_creatable = false to avoid unexpected behavior,
e.g. because the quiesce notifier gets registered multiple times.
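A hedged sketch of the usual pattern for such internal-only devices (the class_init function name here is illustrative):

    static void sclp_quiesce_class_init(ObjectClass *klass, void *data)
    {
        DeviceClass *dc = DEVICE_CLASS(klass);

        /* internal device: keep -device/device_add from creating a second
         * instance and registering the quiesce notifier twice */
        dc->user_creatable = false;
    }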
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507193105-15627-1-git-send-email-thuth@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Just as for external interrupts and I/O interrupts, we need to
initialize mchk_index during cpu reset.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
A TYPE_SCLP_CPU_HOTPLUG device for handling cpu hotplug events
is already created by the sclp event facility. Adding a second
TYPE_SCLP_CPU_HOTPLUG device via -device sclp-cpu-hotplug creates
an ambiguity in raise_irq_cpu_hotplug(), leading to a crash once
a cpu is hotplugged.
To fix this, disallow creating a sclp-cpu-hotplug device manually.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The "sclp" device is just an internal device that can not be instantiated
by the users. If they try to use it, they only get a simple error message:
$ qemu-system-s390x -nographic -device sclp
qemu-system-s390x: Option '-device s390-sclp-event-facility' cannot be
handled by this machine
Since sclp_init() tries to create a TYPE_SCLP_EVENT_FACILITY which is
a non-pluggable sysbus device, there is really no way that the "sclp"
device can be used by the user, so let's set user_creatable = false
accordingly.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507125199-22562-1-git-send-email-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
If we fail to set a proper TOD clock on the target system, this can
already result in some problematic cases. We print several warn messages
on source and target in that case.
If kvm fails to set a nonzero epoch index, then we must ultimately fail
the migration as this will result in a giant time leap backwards. This
patch lets the migration fail if we can not set the guest time on the
target.
On failure the guest will resume normally on the original host machine.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split failure change from epoch index change, minor fixups]
Message-Id: <20171004105751.24655-3-borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Commit e996583eb3 ("s390x/css: activate ChannelSubSys migration",
2017-07-11) was supposed to enable css migration for virtio-ccw
machines starting 2.10, but it ended up effectively enabling it
only for 2.10 as the registration of the appropriate VMStateDescription
happens in ccw_machine_2_10_instance_options which does not get
called for machines more recent than 2_10.
Let us move the corresponding chunk of code (which conditionally enables
the migration based on the value of the corresponding class property) to
ccw_init, which is called for each virtio-ccw machine instance.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20171004110109.16525-1-pasic@linux.vnet.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Using virtual memory access is wrong and will soon include low-address
protection checks, which is to be bypassed for STFL.
STFL is a privileged instruction and using LowCore requires
!CONFIG_USER_ONLY, so add the ifdef and move the declaration to the
right place.
This was originally part of a bigger STFL(E) refactoring.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170927170027.8539-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
core_id is not needed by linux-user, as the core_id a.k.a. CPU address
is only accessible from kernel space.
Therefore, drop next_core_id and make cpu_index get autoassigned again
for linux-user.
While at it, shield core_id and cpuid completely from linux-user. cpuid
can also only be queried from kernel space.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-5-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Let's move it into the machine, so we trigger the IRQ after setting
ms->possible_cpus (which SCLP uses to construct the list of
online CPUs).
This also fixes a problem reported by Thomas Huth, whereby qemu can be
crashed using the none machine:
qemu-s390x-softmmu -M none -monitor stdio
-> device_add qemu-s390-cpu
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-3-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The problem is that the current implementation places unrealistic and
arbitrary constraints on the length of writes to the device (that is, the
outbound requests), by asserting that ccw.count is such that even the
worst-case escaped payload will fit into a more or less arbitrarily sized
buffer. At the protocol level there is nothing to justify such
a limitation.
Another strange thing is the return value, which more or less reflects
the size written after escaping instead of before escaping. This
is strange, because this return value is used to calculate SCSW.count.
Let us teach 3270 how to deal with arbitrarily long writes.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Message-Id: <20170920172314.102710-3-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Let us convert the 3270 code so it uses the recently introduced
CcwDataStream abstraction instead of blindly assuming direct data access.
This patch does not change behavior beyond introducing IDA support: for
direct data access CCWs everything stays as-is. (If there are bugs, they
are also preserved).
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20170920172314.102710-2-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This reverts commit d32bd032d8.
It turns out that old QEMUs always created a PCI host bridge,
and for many CPU models migration from old QEMUs to new
QEMUs will fail with:
qemu-system-s390x: Unknown savevm section or instance 'PCIBUS' 0
qemu-system-s390x: load of migration failed: Invalid argument
As a quick fix we will revert the commit and always create the
pci host bridge.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[fixed revert to keep the comment fixup, added a comment in the code]
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Message-Id: <20170928131831.81393-1-borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This makes it easy to access real addresses (prefix) and in addition
checks for valid memory addresses, which is missing when using e.g.
stl_phys().
We can later reuse it to implement low address protection checks (then
we might even decide to introduce yet another MMU for absolute
addresses, just for handling storage keys and low address protection).
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-3-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The architecture mandates that the addresses accessed at the first
indirection level (that is, the data addresses without IDA, and the
(M)IDAW addresses with (M)IDA) be checked against a CCW-format-dependent
maximum address. If a violation is detected, the storage
access is not to be performed and a channel program check needs to be
generated. As of today, we fail to do this check.
Let us stick even closer to the architecture specification.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-5-pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
env->psa is a 64-bit value, but we only copy 4 bytes into the save area,
which results in 0 always getting stored.
Let's try to reduce such errors by using a proper structure. While at
it, use correct cpu->be conversion (and get_psw_mask()), as we will be
reusing this code for TCG soon.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170922140338.6068-1-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
The STFLE bits for the MSA (extension) facilities simply indicate that
the respective instructions can be executed. The QUERY subfunction can then
be used to identify which features exactly are available.
Availability of subfunctions can also vary on real hardware. For now, we
simply implement a CPU model without any available subfunctions except
QUERY (which is always around).
As all MSA functions behave quite similarly, we can use one translation
handler for now. Prepare the code for implementation of actual subfunctions.
At least MSA is helpful for now, as older Linux kernels require this
facility when compiled for a z9 model. Allow enabling the facilities
for the qemu cpu model.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920153016.3858-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
HMP pull 2017-10-05
# gpg: Signature made Thu 05 Oct 2017 11:50:13 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-hmp-20171005:
hmp-commands-info: Change "@findex FOO" to "@findex info FOO"
hmp-commands-info: Move Texinfo stanzas to conventional place
hmp-commands-info: Fix "info rocker-FOO" misspellings
hmp: Fix unknown command for subtable
hmp: Missing handle_errors
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Merge qio 2017/10/04 v1
# gpg: Signature made Wed 04 Oct 2017 13:23:04 BST
# gpg: using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg: aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF
* remotes/berrange/tags/pull-qio-2017-10-04-1:
io: add trace events for websockets frame handling
io: Attempt to send websocket close messages to client
io: Reply to ping frames
io: Ignore websocket PING and PONG frames
io: Allow empty websocket payload
io: Add support for fragmented websocket binary frames
io: Small updates in preparation for websocket changes
ui: Always remove an old VNC channel watch before adding a new one
io: use case insensitive check for Connection & Upgrade websock headers
io: include full error message in websocket handshake trace
io: send proper HTTP response for websocket errors
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
It is useful to trace websockets frame encoding/decoding when debugging
problems.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Make a best effort attempt to close websocket connections according to
the RFC. Sends the close message, as room permits in the socket buffer,
and immediately closes the socket.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an immediate ping reply (pong) to the outgoing stream when a ping
is received. Unsolicited pongs are ignored.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Keep pings and gratuitous pongs generated by web browsers from killing
websocket connections.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Some browsers send pings/pongs with no payload, so allow empty payloads
instead of closing the connection.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Allows fragmented binary frames by saving the previous opcode. Handles
the case where an intermediary (i.e., web proxy) fragments frames
originally sent unfragmented by the client.
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Gets rid of unnecessary bit shifting and performs proper EOF checking to
avoid a large number of repeated calls to recvmsg() when a client
abruptly terminates a connection (bug fix).
Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When checking the value of the Connection and Upgrade HTTP headers,
the websocket RFC (6455) requires the comparison to be case insensitive.
The Connection value should be an exact match, not a substring.
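For illustration, the comparison then boils down to something like the
following (the variable names are made up; g_ascii_strcasecmp() is
glib's case-insensitive string compare):

    /* Case-insensitive exact match, per RFC 6455. */
    if (g_ascii_strcasecmp(connection, "upgrade") != 0 ||
        g_ascii_strcasecmp(upgrade, "websocket") != 0) {
        return -1;   /* reject the handshake */
    }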
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When the websocket handshake fails it is useful to log the real
error message via the trace points for debugging purposes.
Fixes bug: #1715186
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
When any error occurs while processing the websockets handshake,
QEMU just terminates the connection abruptly. This is in violation
of the HTTP specs and does not help the client understand what they
did wrong. This is particularly bad when the client gives the wrong
path, as a "404 Not Found" would be very helpful.
Refactor the handshake code so that it always sends a response to
the client unless there was an I/O error.
Fixes bug: #1715186
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
NVIDIA has defined a specification for creating GPUDirect "cliques",
where devices with the same clique ID support direct peer-to-peer DMA.
When running on bare-metal, tools like NVIDIA's p2pBandwidthLatencyTest
(part of cuda-samples) determine which GPUs can support peer-to-peer
based on chipset and topology. When running in a VM, these tools have
no visibility to the physical hardware support or topology. This
option allows the user to specify hints via a vendor defined
capability. For instance:
<qemu:commandline>
<qemu:arg value='-set'/>
<qemu:arg value='device.hostdev0.x-nv-gpudirect-clique=0'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.hostdev1.x-nv-gpudirect-clique=1'/>
<qemu:arg value='-set'/>
<qemu:arg value='device.hostdev2.x-nv-gpudirect-clique=1'/>
</qemu:commandline>
This enables two cliques. The first is a singleton clique with ID 0,
for the first hostdev defined in the XML (note that since cliques
define peer-to-peer sets, singleton cliques offer no benefit). The
subsequent two hostdevs are both added to clique ID 1, indicating
peer-to-peer is possible between these devices.
QEMU only validates that the clique ID is valid and applied to an
NVIDIA graphics device; any validation that the resulting cliques are
functional and valid is the user's responsibility. The NVIDIA
specification allows a 4-bit clique ID, thus valid values are
0-15.
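The same hint can presumably also be given directly on a plain QEMU
command line instead of through the libvirt passthrough shown above,
along these lines (the host address is an example):

    -device vfio-pci,host=04:00.0,x-nv-gpudirect-clique=0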
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If the hypervisor needs to add purely virtual capabilities, give us a
hook through quirks to do that. Note that we determine the maximum
size for a capability based on the physical device; if we insert a
virtual capability, that can change. Therefore, if the maximum size is
smaller after adding virtual capabilities, use that.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If vfio_add_std_cap() fails, going to 'out' prepends irrelevant
errors for capabilities we haven't attempted to add as we unwind our
recursive stack. Just return the error.
Fixes: 7ef165b9a8 ("vfio/pci: Pass an error object to vfio_add_capabilities")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
After iothread is enabled internally inside QEMU with GMainContext, we
may encounter this warning when destroying the iothread:
(qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll:
assertion '!SOURCE_DESTROYED (source)' failed
The problem is that g_source_remove_poll() does not allow removing a
source from the array if the source has been detached from its owner
context. (peterx: which IMHO does not make much sense)
Fix it on the QEMU side by avoiding the g_source_remove_poll() call if
we know the object is being destroyed; we won't leak anything, since
the array will be cleaned up soon anyway even with that fd.
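One possible shape of that guard; whether the real patch uses a
QEMU-side flag like the one below or glib's own g_source_is_destroyed()
check, the idea is the same ('finalizing' is an illustrative name):

    static void remove_fd_poll(AioContext *ctx, GPollFD *pfd)
    {
        /* Skip g_source_remove_poll() while the GSource is being torn
         * down; the poll fd array is freed together with the source,
         * so nothing is leaked by skipping the removal. */
        if (!ctx->finalizing) {
            g_source_remove_poll(&ctx->source, pfd);
        }
    }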
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-6-peterx@redhat.com
[peterx: write the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
When a gcontext is used with an iothread, the context is destroyed
during iothread_stop(). That's not good, since sometimes we would like
to keep the resources until the iothread is destroyed, but we may want
to stop the thread before that point.
Delay the destruction of gcontext to iothread finalize. Then we can do:
iothread_stop(thread);
some_cleanup_on_resources();
iothread_destroy(thread);
We may need this patch if we want to run chardev IOs in iothreads and
hopefully clean them up correctly. For more specific information,
please see 2b316774f6 ("qemu-char: do not operate on sources from
finalize callbacks").
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-5-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
So that internal iothread users can explicitly stop one iothread without
destroying it.
While at it, fix iothread_stop() to allow it to be called multiple
times. Before this patch we might call iothread_stop() more than once on
a single iothread, which is not correct since qemu_thread_join()
is not allowed to run twice. From the manual of pthread_join():
Joining with a thread that has previously been joined results in
undefined behavior.
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-4-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
IOThread is a general framework that contains an IO loop environment and
a real thread behind it. It is also useful internally inside qemu.
Provide some helpers so that iothreads can be created for internal use.
Put all internally used iothreads into the internal object container.
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-3-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
# gpg: Signature made Fri 29 Sep 2017 04:17:37 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-testing-pull-request:
docker: Don't mount ccache db if NOUSER=1
docker: test-block: Don't continue if build fails
tests/docker/run: don't source /etc/profile
docker: Fix test-mingw
docker: add installation to build tests
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On a modern server-class ppc host with the following CPU topology:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0,8,16,24
Off-line CPU(s) list: 1-7,9-15,17-23,25-31
Thread(s) per core: 1
If both KVM PR and KVM HV loaded and we pass:
-machine pseries,accel=kvm,kvm-type=PR -smp 8
We expect QEMU to warn that this exceeds the number of online CPUs:
Warning: Number of SMP cpus requested (8) exceeds the recommended
cpus supported by KVM (4)
Warning: Number of hotpluggable cpus requested (8) exceeds the
recommended cpus supported by KVM (4)
but nothing is printed...
This happens because on ppc the KVM_CAP_NR_VCPUS capability is VM
specific and really depends on the KVM type, but we currently use it
as a global capability. And KVM returns a fallback value based on
KVM HV being present. Maybe KVM on POWER shouldn't presume anything
as long as it doesn't have a VM, but in all cases we should call
KVM_CREATE_VM first and use KVM_CAP_NR_VCPUS as a VM capability.
This patch hence changes kvm_recommended_vcpus() accordingly and
moves the sanity checking of smp_cpus after the VM creation.
It is okay for the other archs that also implement KVM_CAP_NR_VCPUS,
ie, mips, s390, x86 and arm, because they don't depend on the VM
being created or not.
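For reference, the difference is just which fd answers the query; both
helpers exist in QEMU's KVM support code, and the snippet is simplified:

    int soft_limit;

    /* Global query on /dev/kvm: on ppc this may reflect KVM HV even
     * though the VM will actually run under KVM PR. */
    soft_limit = kvm_check_extension(s, KVM_CAP_NR_VCPUS);

    /* VM-scoped query: issued after KVM_CREATE_VM, so the answer
     * matches the KVM type actually selected for this VM. */
    soft_limit = kvm_vm_check_extension(s, KVM_CAP_NR_VCPUS);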
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <150600966286.30533.10909862523552370889.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On a server-class ppc host, this capability depends on the KVM type,
ie, HV or PR. If both KVM variants are present in the kernel, we will
always get the HV-specific value, even if we explicitly requested PR on
the command line.
This can have an impact if we're using hugepages or a balloon device.
Since we've already created the VM at the time any user calls
kvm_has_sync_mmu(), switching to kvm_vm_check_extension() is
enough to fix any potential issue.
It is okay for the other archs that also implement KVM_CAP_SYNC_MMU,
ie, mips, s390, x86 and arm, because they don't depend on the VM being
created or not.
While here, let's cache the state of this extension in a bool variable,
since it has several users in the code, as suggested by Thomas Huth.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <150600965332.30533.14702405809647835716.stgit@bahia.lan>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a library header, so angle brackets are more appropriate; also
move the line to before QEMU headers, as is recommended in HACKING.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20170920085952.3872-1-famz@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Valgrind detects an invalid read operation when hot-plugging of a
USB device fails:
$ valgrind x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598== Memcheck, a memory error detector
==30598== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30598== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==30598== Command: x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598==
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
==30598== Invalid read of size 8
==30598== at 0x60EF50: object_unparent (object.c:445)
==30598== by 0x580F0D: usb_try_create_simple (bus.c:346)
==30598== by 0x581BEB: usb_claim_port (bus.c:451)
==30598== by 0x582310: usb_qdev_realize (bus.c:257)
==30598== by 0x4CB399: device_set_realized (qdev.c:914)
==30598== by 0x60E26D: property_set_bool (object.c:1886)
==30598== by 0x61235E: object_property_set_qobject (qom-qobject.c:27)
==30598== by 0x61000F: object_property_set_bool (object.c:1162)
==30598== by 0x4567C3: qdev_device_add (qdev-monitor.c:630)
==30598== by 0x456D52: qmp_device_add (qdev-monitor.c:807)
==30598== by 0x470A99: hmp_device_add (hmp.c:1933)
==30598== by 0x3679C3: handle_hmp_command (monitor.c:3123)
The object_unparent() here is not necessary anymore since commit
69382d8b3e ("qdev: Fix object reference leak in case device.realize()
fails"), so let's remove it now.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1506526106-30971-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Currently, iothread_stop_all() makes all iothread objects unsafe
to be destroyed, because qemu_thread_join() ends up being called
twice.
To fix this, make iothread_stop() idempotent by checking
thread->stopped.
Fixes the following crash:
qemu-system-x86_64 -object iothread,id=iothread0 -monitor stdio -display none
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) quit
qemu: qemu_thread_join: No such process
Aborted (core dumped)
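The guard is essentially the following (a simplified sketch; only the
fields mentioned in this and the related iothread patches are shown):

    void iothread_stop(IOThread *iothread)
    {
        if (!iothread->ctx || iothread->stopped) {
            return;                    /* already stopped: joining twice is UB */
        }
        iothread->stopping = true;
        aio_notify(iothread->ctx);     /* wake the loop so it can exit */
        qemu_thread_join(&iothread->thread);
        iothread->stopped = true;
    }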
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170926130028.12471-1-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu uses wheel-up/down button events for mouse wheel input; however,
linux applications typically want REL_WHEEL events.
This fixes the wheel for linux guests. Tested with X11/wayland, and the
windows virtio-input driver.
Based on a patch from Marc.
Added a property to enable/disable the wheel axis.
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170926113243.26081-1-kraxel@redhat.com
Rename the functions to to say "setup" instead of "create" because they
support being called multiple times on the same egl framebuffer.
Properly delete unused textures, update function interfaces to support
this.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170927115031.12063-1-kraxel@redhat.com
Handle the translation from vga chars to curses chars in curses_update()
instead of console_write_ch(). Purge any curses support bits from
ui/console.h include file.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170927103811.19249-1-kraxel@redhat.com
The usual behaviour of /etc/profile is to set the default PATH for
users. This runs into problems when we have updated PATH in our
dockerfile e.g. to access a cross-compiler in a non-standard
location. It shouldn't be needed anyway, as we inherit the env from the
image when it was set up.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170926133622.14991-1-alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Feature "dtc" is explicitly required by test-mingw, but is not detected
by the run script since we switched to archive-source.sh in b7f404201e.
Since it isn't available in the Fedora image which runs this test on
patchew, the way we get dtc is still from submodule.
archive-source.sh takes care of bundling the submodule files already, so
what we need to do is just checking if files are there. Makefile is
chosen because it is one that is unlikely to get renamed in the future.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170925082913.22089-1-famz@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Migration pull 2017-09-27
# gpg: Signature made Wed 27 Sep 2017 14:56:23 BST
# gpg: using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7
* remotes/dgilbert/tags/pull-migration-20170927a:
migration: Route more error paths
migration: Route errors up through vmstate_save
migration: wire vmstate_save_state errors up to vmstate_subsection_save
migration: Check field save returns
migration: check pre_save return in vmstate_save_state
migration: pre_save return int
migration: disable auto-converge during bulk block migration
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
ppc patch queue 2017-09-27
Contains
* a number of Mac machine type fixes
* a number of embedded machine type fixes (preliminary to adding the
Sam460ex board)
* an important fix for handling of migration with KVM PR
* assorted other minor fixes and cleanups
# gpg: Signature made Wed 27 Sep 2017 08:40:48 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.11-20170927: (26 commits)
macio: use object link between MACIO_IDE and MAC_DBDMA object
macio: pass channel into MACIOIDEState via qdev property
mac_dbdma: remove DBDMA_init() function
mac_dbdma: QOMify
mac_dbdma: remove unused IO fields from DBDMAState
spapr: fix the value of SDR1 in kvmppc_put_books_sregs()
ppc/pnv: check for OPAL firmware file presence
ppc: remove all unused CPU definitions
ppc: remove unused CPU definitions
spapr_pci: make index property mandatory
macio: convert pmac_ide_ops from old_mmio
ppc/pnv: Improve macro parenthesization
spapr: introduce helpers to migrate HPT chunks and the end marker
ppc/kvm: generalize the use of kvmppc_get_htab_fd()
ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error
ppc: Fix OpenPIC model
ppc/ide/macio: Add missing registers
ppc/mac: More rework of the DBDMA emulation
ppc/mac: Advertise a high clock frequency for NewWorld Macs
ppc: QOMify g3beige machine
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Block layer patches
# gpg: Signature made Tue 26 Sep 2017 14:52:32 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream: (24 commits)
block/qcow2-bitmap: fix use of uninitialized pointer
qemu-iotests: add shrinking image test
qcow2: add shrink image support
qcow2: add qcow2_cache_discard
qemu-img: add --shrink flag for resize
iotests: fix 181: enable postcopy-ram capability on target
qemu-iotests: Test change-backing-file command
block: Fix permissions after bdrv_reopen()
block: reopen: Queue children after their parents
block: Base permissions on rw state after reopen
block: Add reopen queue to bdrv_check_perm()
block: Add reopen_queue to bdrv_child_perm()
qemu-io: Drop write permissions before read-only reopen
block: Clean up some bad code in the vvfat driver
block/throttle-groups.c: allocate RestartData on the heap
throttle: Assert that bkt->max is valid in throttle_compute_wait()
iotests: Print full path of bad output if mismatch
iotests: use virtio aliases for 067
iotests: use -ccw on s390x for 051
iotests: use -ccw on s390x for 040, 139, and 182
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Modify the pre_save method on VMStateDescription to return an int
rather than void so that it can potentially fail.
Changed zillions of devices to make them return 0; the only
case where it returns non-0 is hw/intc/s390_flic_kvm.c, which already
had an error_report/return case.
Note: If you add an error exit in your pre_save you must emit
an error_report to say why.
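For illustration, a pre_save hook under the new contract might look like
this; the device, its state type and the failing condition are invented:

    static int my_device_pre_save(void *opaque)
    {
        MyDeviceState *s = opaque;      /* hypothetical device state */

        if (s->busy) {                  /* hypothetical condition */
            error_report("my-device: cannot migrate while busy");
            return -EBUSY;              /* non-0 fails the migration */
        }
        return 0;
    }

    static const VMStateDescription vmstate_my_device = {
        .name = "my-device",
        .version_id = 1,
        .pre_save = my_device_pre_save,
        .fields = (VMStateField[]) {
            VMSTATE_END_OF_LIST()
        },
    };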
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
auto-converge and block migration currently do not play well together.
During block migration the auto-converge logic detects that ram
migration makes no progress and thus throttles down the vm until
it nearly stalls completely. Avoid this by disabling the throttling
logic during the bulk phase of the block migration.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Using a standard QOM object link we can pass a reference to the MAC_DBDMA
controller to the MACIO_IDE object which removes the last external parameter
to macio_ide_register_dma().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the reasons macio_ide_register_dma() needs to exist is because the
channel id isn't passed into the MACIO_IDE object. Pass in the channel id
using a qdev property to remove this requirement.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Instead we can now instantiate the MAC_DBDMA object directly within the
macio device. We also add the DBDMA device as a child property so that
it is possible to retrieve later.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These fields were used to manually handle IO requests that weren't aligned
to a sector boundary before this feature was supported by the block API.
Once the block API changed to support byte-aligned IO requests, the macio
controller was switched over to use it in commit be1e343 but these fields
were accidentally left behind. Remove them, including the initialisation
in DBDMA_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.
Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.
This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.
It allows us to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
and exit before uselessly trying to load it if the file does not
exist.
Issue discovered by Coverity Scan.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Remove *all* unused CPU definitions as indicated by compile-time
`#if 0` constructs.
Signed-off-by: John Snow <jsnow@redhat.com>
[dwg: Removed some additional now-useless comments]
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
PHBs can be created with an index property, in which case the machine
code automatically sets all the MMIO windows at addresses derived from
the index. Alternatively, they can be manually created without index,
but the user has to provide addresses for all MMIO windows.
The non-index way happens to be more trouble than it's worth: it's
difficult to use, keeps requiring (potentially incompatible) changes
when some new parameter needs adding, and is awkward to check for
collisions. It currently even has a bug that prevents the use of two
non-index PHBs, because their child DRCs are all derived from the
same index == -1 value and thus collide.
This patch hence makes the index property mandatory. As a consequence,
the PHB's memory regions and BUID are now always configured according
to the index, and it is no longer possible to set them from the command
line.
This DOES BREAK backwards compat, but we don't think the non-index
PHB feature was used in practice (at least libvirt doesn't use it), and
the simplification is worth it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Although none of the existing macro call-sites were broken,
it's always better to write macros that properly parenthesize
arguments that can be complex expressions, so that the intended
order of operations is not broken.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().
This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
=> only kvmppc_read_hptes() passes an actual index, all other users
pass 0
- add an errp argument to propagate error messages to the caller.
=> spapr migration code prints the error
=> hpte helpers pass &error_abort to keep the current behavior
of hw_error()
While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c138593380:
"This support updating htab managed by the hypervisor. Currently
we don't have any user for this feature. This actually bring the
store_hpte interface in-line with the load_hpte one. We may want
to use this when we want to emulate henter hcall in qemu for HV
kvm."
The above is still true today.
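The resulting helper signature is presumably along these lines (a
sketch; the exact prototype is not quoted in the message):

    /* write: open the htab fd for writing (restore) or reading (dump)
     * index: starting HPTE index; only kvmppc_read_hptes() passes non-0
     * errp:  migration code reports it, hpte helpers pass &error_abort */
    int kvmppc_get_htab_fd(bool write, uint64_t index, Error **errp);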
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.
Let's patch kvmppc_get_htab_fd() accordingly.
While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The timing register exists on all variants of MacIO IDE, we just
store and return its value.
The interrupts register only exists on KeyLargo, but it doesn't
hurt to have it. The lack of this register causes MacOS X to
hang under some circumstances.
Both are 32-bit only. The HW might support smaller access sizes
but no known OS uses them.
Because the core IDE subsystem doesn't provide us with a way
to query the main (level) interrupt state, nor do we have a way
to know that DBDMA issued an (edge) interrupt, we reflect both
through a private pair of qirq's in order to maintain the
register state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This completely reworks the handling of the control register
according to my understanding of the HW and the spec.
It should (hopefully ... still testing) fix a number of issues
most notably cases of MacOS hanging.
Also update dbdma_unassigned_rw() and dbdma_unassigned_flush() to
have the expected behaviour now that flush is handled slightly
differently.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
These registers are present in 440 SoCs (and maybe in others too) and
U-Boot accesses them when printing register info. We don't emulate
these but add them to avoid crashing when they are read or written.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The following capabilities are VM specific:
- KVM_CAP_PPC_SMT_POSSIBLE
- KVM_CAP_PPC_HTAB_FD
- KVM_CAP_PPC_ALLOC_HTAB
If both KVM HV and KVM PR are present, checking them always returns
the HV value, even if we explicitly requested to use PR.
This has no visible effect for KVM_CAP_PPC_ALLOC_HTAB, because we also
try the KVM_PPC_ALLOCATE_HTAB ioctl, which is only supported by HV. As
a consequence, the spapr code doesn't even check KVM_CAP_PPC_HTAB_FD.
However, this will cause kvmppc_hint_smt_possible(), introduced by
commit fa98fbfcdf, to report several VSMT modes (eg, Available
VSMT modes: 8 4 2 1) whereas PR only supports mode 1.
This patch fixes all three anyway to use kvm_vm_check_extension(). It
is okay since the VM is already created at the time kvm_arch_init() or
kvmppc_reset_htab() is called.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BQL bug fix
# gpg: Signature made Mon 25 Sep 2017 23:14:48 BST
# gpg: using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20170925:
accel/tcg/cputlb: avoid recursive BQL (fixes#1706296)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch adds shrinking of the image file for qcow2. As a result, this
allows us to reduce the virtual image size and free up space on the disk
without copying the image. The image can be fragmented, and shrinking is
done by punching holes in the image file.
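The hole punching itself is plain Linux file-level functionality;
conceptually it boils down to something like the snippet below at the
file level (a simplified illustration, not the qcow2/file-posix code
path):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>

    /* Deallocate [offset, offset + length) without changing the file
     * size, so the freed clusters stop occupying disk space. */
    static int punch_hole(int fd, off_t offset, off_t length)
    {
        return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                         offset, length);
    }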
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Whenever l2/refcount table clusters are discarded from the file, we can
automatically drop the unnecessary content of the cache tables. This
reduces the chance of evicting useful cache data and eliminates
inconsistencies between the cache and the data in the file.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
The flag is an additional precaution against data loss. Perhaps in the
future shrinking without this flag will be blocked for all formats, but
for now we need to maintain compatibility with raw.
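Usage then looks along these lines (image name and size are examples):

    qemu-img resize --shrink disk.qcow2 10G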
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-2-pbutsykin@virtuozzo.com
[mreitz: Added a missing space to a warning]
Signed-off-by: Max Reitz <mreitz@redhat.com>
Migration capabilities should be enabled on both source and
destination qemu processes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This involves a temporary read-write reopen if the backing file link in
the middle of a backing file chain should be changed and is therefore a
good test for the latest bdrv_reopen() vs. op blockers fixes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If we switch between read-only and read-write, the permissions that
image format drivers need on bs->file change, too. Make sure to update
the permissions during bdrv_reopen().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
We will calculate the required new permissions in the prepare stage of a
reopen. Required permissions of children can be influenced by the
changes made to their parents, but parents are independent from their
children. This means that permissions need to be calculated top-down. In
order to achieve this, queue parents before their children rather than
queuing the children first.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When new permissions are calculated during bdrv_reopen(), they need to
be based on the state of the graph as it will be after the reopen has
completed, not on the current state of the involved nodes.
This patch makes bdrv_is_writable() optionally accept a BlockReopenQueue
from which the new flags are taken. This is then used for determining
the new bs->file permissions of format drivers as soon as we add the
code to actually pass a non-NULL reopen queue to the .bdrv_child_perm
callbacks.
While moving bdrv_is_writable(), make it static. It isn't used outside
block.c.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
qemu-io provides a 'reopen' command that allows switching from writable
to read-only access. We need to make sure that we don't try to keep
write permissions to a BlockBackend that becomes read-only, otherwise
things are going to fail.
This requires a bdrv_drain() call because otherwise in-flight AIO
write requests could issue new internal requests while the permission
has already gone away, which would cause assertion failures. Draining
the queue doesn't break AIO requests in any new way; bdrv_reopen() would
drain it anyway only a few lines later.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Remove the unnecessary home-grown redefinition of the assert() macro here,
and remove the unusable debug code at the end of the checkpoint() function.
The code there uses assert() with side effects (assignment to the "mapping"
variable), which should be avoided. Looking more closely, it seems it is
apparently also only usable for one particular directory layout (with a file
named USB.H in it) and thus is of no use for the rest of the world.
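As a generic illustration of why asserts with side effects are a
problem (the lookup function below stands in for whatever the vvfat
code was doing): with NDEBUG defined, the whole assert() expression
disappears, and the assignment disappears with it.

    /* Bad: the assignment vanishes in an NDEBUG build. */
    assert((mapping = find_mapping(s, cluster)) != NULL);

    /* Good: keep the side effect outside the assertion. */
    mapping = find_mapping(s, cluster);
    assert(mapping != NULL);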
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
RestartData is the opaque data of the throttle_group_restart_queue_entry
coroutine. By being stack allocated, it isn't available anymore if
aio_co_enter schedules the coroutine with a bottom half and runs after
throttle_group_restart_queue returns.
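Illustrative before/after shape of such a fix; the field contents are
omitted and the snippet is not the actual throttle-groups diff:

    /* Before: rd lives on the caller's stack and may already be gone
     * when aio_co_enter() runs the coroutine from a bottom half. */
    RestartData rd = { /* ... */ };
    co = qemu_coroutine_create(throttle_group_restart_queue_entry, &rd);
    aio_co_enter(ctx, co);

    /* After: allocate on the heap; the coroutine frees it when done. */
    RestartData *rd = g_new(RestartData, 1);
    /* ... fill in fields ... */
    co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
    aio_co_enter(ctx, co);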
Cc: qemu-stable@nongnu.org
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
If bkt->max == 0 and bkt->burst_length > 1 then we could have a
division by 0 in throttle_do_compute_wait(). That configuration is
however not permitted and is already detected by throttle_is_valid(),
but let's assert it in throttle_compute_wait() to make it explicit.
Found by Coverity (CID: 1381016).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x.
Using virtio-scsi will implicitly pick the right device, so just
switch to that for simplicity.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block driver documentation is available in qemu-doc.html. It would be
convenient to have documentation for formats, protocols, and filter
drivers in a man page.
Extract the relevant part of qemu-doc.html into a new file called
docs/qemu-block-drivers.texi. This file can also be built as a
stand-alone document (man, html, etc).
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
People get surprised when, after "qemu-img create -f raw /dev/sdX", they
still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
header. While this is natural because raw doesn't need to write any
magic bytes during creation, hdev_create is free to clear out the first
sector to make sure the stale qcow2 header doesn't cause such confusion.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes an error when they differ. Though the error message has enough
information to infer the misunderstanding, it is better if we document
this explicitly, so that users don't have to learn from mistakes.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A basic set of qemu options is initialised in ./common:
export QEMU_OPTIONS="-nodefaults -machine accel=qtest"
However, two test cases (172 and 186) overwrite QEMU_OPTIONS and neglect
to manually set '-machine accel=qtest'. Add the missing option for 172.
186 probably only copied the code from 172; it doesn't actually need to
overwrite QEMU_OPTIONS, so remove that in 186.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Add a firmware path config option to configure. Multiple directories
are accepted, with the usual colon as separator. Default value is
${prefix}/share/qemu-firmware. The path is searched in addition to the
current search path (typically ${prefix}/share/qemu).
This prepares qemu for the planned split of the prebuilt firmware blobs
into a separate project.
Distributions can also use this to get rid of the firmware symlink farm
and add -- for example -- /usr/share/seabios to the firmware path
instead.
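Presumably this is exposed as a configure option, so a distribution
build would do something like the following (the paths are examples):

    ./configure --firmwarepath=/usr/share/qemu-firmware:/usr/share/seabios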
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170914114236.25343-3-kraxel@redhat.com
Add helper function to add a directory to the qemu search path, so we
don't duplicate the checks. Add a check for duplicate entries, so we
stop trying to open files twice.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170914114236.25343-2-kraxel@redhat.com
QEMU currently aborts if you try to use the device at the command
line:
$ ppc64-softmmu/qemu-system-ppc64 -S -machine prep -device pc87312
Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
qemu-system-ppc64: -device pc87312: Device 'parallel0' is in use
Aborted (core dumped)
It uses parallel_hds in its realize function, so it cannot be
instantiated by the user again.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This is required to be removed on SmartOS (Illumos).
As of now there are no alternative supported SunOS distributions.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If QEMU has been compiled with the flags --enable-tcg-interpreter and
--enable-debug, the guest is running incredibly slow. The pxe boot test
can take up to 400 seconds when testing the pseries ppc64 machine. While
we should still look for ways to speed up the test on the pseries machine,
it's better to increase the timeout in this test to 600 seconds anyway,
to allow the test to pass successfully with this unusual configuration.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If 'bs' is a complex expression, we were only casting the front half
rather than the full expression. Luckily, none of the callers were
passing bad arguments, but it's better to be robust up front.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The virtio-gpu-pci device is already in the display category, so the
virtio-gpu-device should be there, too.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.
Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).
While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.
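To make the arithmetic concrete, here is a self-contained illustration
of the failure mode and of the kind of fix described (forcing the mask
to be promoted to the first argument's width via a never-taken ternary);
the exact macro text in QEMU may differ:

    #include <assert.h>
    #include <stdint.h>

    /* Broken shape: -(512U) is computed in 32 bits, and widening the
     * result to 64 bits zero-extends, so the mask loses its high bits. */
    #define ROUND_UP_BROKEN(n, d)  (((n) + (d) - 1) & -(d))

    /* Fixed shape: the ternary never evaluates (n), but promotes (d) to
     * the common type of (n) and (d) before the negation. */
    #define ROUND_UP_FIXED(n, d)   (((n) + (d) - 1) & -(0 ? (n) : (d)))

    int main(void)
    {
        uint64_t two_tib = 2ULL * 1024 * 1024 * 1024 * 1024;

        assert(ROUND_UP_BROKEN(two_tib, 512U) == 0);        /* the bug   */
        assert(ROUND_UP_FIXED(two_tib, 512U) == two_tib);   /* as wanted */
        return 0;
    }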
CC: qemu-trivial@nongnu.org
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Instead of using the hardcoded (MemTxAttrs){0} for no memory attributes,
let's use the already defined MEMTXATTRS_UNSPECIFIED macro.
This is technically a change of behaviour as MEMTXATTRS_UNSPECIFIED sets
the unspecified field to 1, but it doesn't look like anything is
checking this field.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The error path of baum_chr_open() needs to set brlapi to NULL, so it
won't get released twice in char_braille_finalize(), which would cause
"/usr/bin/qemu-system-x86_64: double free or corruption (!prev)"
Signed-off-by: Liang Yan <lyan@suse.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Remove trailing whitespace in qemu-options documentation, as it causes
reproducibility issues depending on the echo implementation used by
the Makefile.
Reported-By: Vagrant Cascadian <vagrant@debian.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
It may be better to add a trace event to monitor the last moment of
a key event on its way from QEMU to the guest VM.
Signed-off-by: Liang Yan <lyan@suse.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This device is private and is created once per aux-bus, so don't allow
the user to create one from the command line.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
In qemu-thread-posix.c we have two implementations of the
various qemu_sem_* functions, one of which uses native POSIX
sem_* and the other of which emulates them with pthread conditions.
This is necessary because not all our host OSes support
sem_timedwait().
Instead of a hard-coded list of OSes which don't implement
sem_timedwait(), which gets out of date, make configure
test for the presence of the function and set a new
CONFIG_HAVE_SEM_TIMEDWAIT appropriately.
In particular, newer NetBSDs have sem_timedwait(), so this
commit will switch them over to using it. OSX still does
not have an implementation.
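Configure probes of this kind are typically just a tiny program that
only compiles and links if the function is available; something along
these lines (a sketch of the idea, not the exact snippet added to
configure):

    #include <semaphore.h>
    #include <time.h>

    int main(void)
    {
        sem_t s;
        struct timespec ts = { 0, 0 };

        sem_init(&s, 0, 0);
        sem_timedwait(&s, &ts);   /* fails to build where unsupported */
        return 0;
    }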
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This change fixes conflict with the DragonFly BSD headers.
Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
If we are woken up from the while() loop in nbd_read_reply_entry,
the handles must be equal. If we are woken up from
nbd_recv_coroutines_wake_all, s->quit must be true, so we do
not need to check handle equality.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170920124507.18841-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
If 'bs' is a complex expression, we were only casting the front half
rather than the full expression. Luckily, none of the callers were
passing bad arguments, but it's better to be robust up front.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170918214649.17550-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
NULL sockets are used for NDP, BOOTP, and other critical operations.
If the topmost mbuf in a NULL session is blocked pending resolution,
it may cause problems if it blocks other packets with a NULL socket.
So do not add mbufs with a NULL socket field to the same session.
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
if_output() originally sent one mbuf per call and used the slirp->next_m
variable to keep track of where it left off. But nowadays it tries to
send all of the mbufs from the fastq, and one mbuf from each session on
the batchq. The next_m variable is both redundant and harmful: there is
a case[0] involving delayed packets in which next_m ends up pointing
to &slirp->if_batchq when an active session still exists, and this
blocks all traffic for that session until qemu is restarted.
The test case was created to reproduce a problem that was seen on
long-running Chromium OS VM tests[1] which rapidly create and
destroy ssh connections through hostfwd.
[0] https://pastebin.com/NNy6LreF
[1] https://bugs.chromium.org/p/chromium/issues/detail?id=766323
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
e.g.
./x86_64-softmmu/qemu-system-x86_64 -nographic -netdev 'user,id=vnet,hostfwd=:555.0.0.0:0-:22'
qemu-system-x86_64: -netdev user,id=vnet,hostfwd=:555.0.0.0:0-:22: Invalid host forwarding rule ':555.0.0.0:0-:22' (Bad host address)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
We have a per-chardev cache for the context, so we don't need this
parameter to be passed in every time chr_update_read_handler() is
called. As long as we call chr_update_read_handler() via
qemu_chr_be_update_read_handlers(), we'll be fine.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-5-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was only passed in by chr_update_read_handlers(). However, on
reconnect, we'll lose that context information. So if a chardev was
running on another context (rather than the default context, the NULL
pointer), it would switch back to the default context when reconnection
happens. But it should really stick to the old context.
Convert all the callers of io_add_watch_poll() to use the internally
cached gcontext. Then the context will be able to survive even across
reconnections.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-4-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It caches the gcontext that is used to poll the chardev IO. Before this
patch, we only passed it in via chr_update_read_handlers(). However,
that may not be enough if the char backend is disconnected and
reconnected afterward. There is still chardev code that assumes the
context is NULL (which is the main context). That will be fixed up in
follow-up patches.
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-3-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Proper support of persistent reservation for multipath devices requires
communication with the multipath daemon, so that the reservation is
registered and applied when a path comes up. The device mapper
utilities provide a library to do so; this patch makes qemu-pr-helper.c
detect multipath devices and, when one is found, delegate the operation
to libmpathpersist.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a privileged helper to run persistent reservation commands.
This lets virtual machines send persistent reservations without using
CAP_SYS_RAWIO or out-of-tree patches. The helper uses Unix permissions
and SCM_RIGHTS to restrict access to processes that can access its socket
and prove that they have an open file descriptor for a raw SCSI device.
The next patch will also correct the usage of persistent reservations
with multipath devices.
It would also be possible to support for Linux's IOC_PR_* ioctls in
the future, to support NVMe devices. For now, however, only SCSI is
supported.
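The SCM_RIGHTS part is standard socket API; receiving a file descriptor
from a client looks roughly like this (simplified, minimal error
handling, not the helper's actual code):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Receive one fd passed over a Unix socket via SCM_RIGHTS. */
    static int recv_fd(int sock)
    {
        char byte, ctl[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctl, .msg_controllen = sizeof(ctl),
        };
        struct cmsghdr *cmsg;
        int fd = -1;

        if (recvmsg(sock, &msg, 0) <= 0) {
            return -1;
        }
        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET &&
                cmsg->cmsg_type == SCM_RIGHTS) {
                memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
            }
        }
        return fd;
    }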
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This modification is necessary for userfault fd features which have to
be requested from userspace.
UFFD_FEATURE_THREAD_ID is one such "on demand" feature, which will
be introduced in the next patch.
QEMU has to use a separate userfault file descriptor because the
userfault context has internal state: after the first UFFD_API ioctl it
changes its state to UFFD_STATE_RUNNING (in case of success), but while
handling the UFFD_API ioctl the kernel expects UFFD_STATE_WAIT_API.
So only one UFFD_API ioctl is possible per ufd.
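In terms of the raw kernel API, the handshake the text refers to looks
roughly like this (simplified; real code checks errors and the returned
feature mask):

    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    /* Each ufd can go through the UFFD_API handshake exactly once. */
    static int open_userfaultfd(uint64_t requested_features)
    {
        struct uffdio_api api = {
            .api = UFFD_API,
            .features = requested_features, /* e.g. UFFD_FEATURE_THREAD_ID */
        };
        int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

        if (ufd < 0) {
            return -1;
        }
        if (ioctl(ufd, UFFDIO_API, &api) < 0) {
            close(ufd); /* unsupported features, or handshake already done */
            return -1;
        }
        return ufd;     /* api.features now holds what the kernel supports */
    }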
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
That tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create the downtime context in case the kernel supports it.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Fill the postcopy-able pending only if ram postcopy is enabled.
This is necessary because there will be other postcopy-able states,
and when ram postcopy is disabled, it should not spoil the common
postcopy-related pending.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Currently, postcopy-able states are recognized by a non-NULL
save_live_complete_postcopy handler. But when we have several different
postcopy-able states, that is not convenient: ram postcopy may be
disabled while some other postcopy is enabled, and in that case the ram
state should behave as if it is not postcopy-able.
This patch adds a separate has_postcopy handler to specify the behaviour
of a savevm state.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Provide helpers to convert bitmaps to little endian format. They can be
used when we want to send a bitmap over the network to other hosts.
One thing to mention is that these helpers only solve the problem of
endianness; they do not solve the problem of different word sizes on
different machines (bitmaps managing the same number of bits may have
different malloced sizes). So for now we need to take care of the size
alignment issue in the callers.
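A sketch of what such a conversion amounts to for the usual
unsigned-long-backed bitmaps; the helper name here is illustrative and
not necessarily the one added by the patch:

    /* Byte-swap an unsigned-long-backed bitmap into little endian.
     * Only endianness is handled; callers still agree on the size. */
    static void bitmap_to_le_sketch(unsigned long *dst,
                                    const unsigned long *src, long nbits)
    {
        long i;

        for (i = 0; i < BITS_TO_LONGS(nbits); i++) {
    #if HOST_LONG_BITS == 64
            dst[i] = cpu_to_le64(src[i]);
    #else
            dst[i] = cpu_to_le32(src[i]);
    #endif
        }
    }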
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Creation of the threads, nothing inside yet.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
--
Use pointers instead of long array names
Move to use semaphores instead of conditions, as Paolo suggested
Put all the state inside one struct.
Use a counter for the number of threads created. Needed during cancellation.
Add error return to thread creation
Add id field
Rename functions to multifd_save/load_setup/cleanup
Change recv parameters to a pointer to struct
Change back to a struct
Use Error * for _cleanup
Indicates how many pages we are going to send in each batch to a multifd
thread.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Be consistent with defaults and documentation
Use new DEFINE_PROP_*
Rename x-multifd-group to x-multifd-page-count
Indicates the number of channels that we will create. By default we
create 2 channels.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
--
Catch inconsistent defaults (Eric).
Improve the comment stating that the number of threads is the same as
the number of sockets
Use new DEFINE_PROP_*
Rename x-multifd-threads to x-multifd-channels
This function allows us to decide when to close the listener socket.
For now, we only need one connection.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
As this is only defined in glib 2.32, add compatibility macros for older glib versions.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We pass the ioc instead of the fd. This will allow us to have more
than one channel open. We also make sure that we set the
from_src_file sooner, so we don't need to pass it as a parameter.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
--
Do not assign mis->from_src_file (peterxu)
# gpg: Signature made Fri 22 Sep 2017 08:28:38 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/build-and-test-automation-pull-request: (36 commits)
docker: Drop 'set -e' from run script
docker: Use archive-source.py
tests: Add README for vm tests
MAINTAINERS: Add tests/vm entry
Makefile: Add rules to run vm tests
tests: Add OpenBSD image
tests: Add NetBSD image
tests: Add FreeBSD image
tests: Add ubuntu.i386 image
tests: Add vm test lib
tests: Add a test key pair
scripts: Add archive-source.sh
qemu.py: Add "wait()" method
gitignore: Ignore vm test images
MAINTAINERS: Fix subsystem name for "Build and test automation"
buildsys: Move rdma libs to per object
buildsys: Move brlapi libs to per object
buildsys: Move usb redir cflags/libs to per object
buildsys: Move libusb cflags/libs to per object
buildsys: Move libcacard cflags/libs to per object
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The only prototype doesn't need anything from the lib header, and not
including it here allows files that include this header, for example
vl.c, to compile without the libseccomp cflags.
The breakage has existed since c3883e1f93 for environments where
"pkg-config --cflags libseccomp" is non-empty.
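The shape of the fix, sketched with hypothetical names (the real QEMU
header and prototype may differ): keep only the prototype in the public
header and confine <seccomp.h> to the one .c file that implements it.

    /* sysemu/seccomp-sketch.h -- hypothetical minimal header */
    #ifndef QEMU_SECCOMP_SKETCH_H
    #define QEMU_SECCOMP_SKETCH_H

    /* Deliberately no #include <seccomp.h> here, so files like vl.c that
     * include this header build without the libseccomp cflags. */
    int seccomp_start_sketch(void);

    #endif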
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
Message-id: 20170920083647.14599-1-famz@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The migration interface for ais was introduced with kernel 4.13,
but the capability itself had been active since 4.12. As migration
support is considered necessary, let's disable ais in the 2.10
stable version. A proper fix and re-enablement will be done
for qemu 2.11.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20170921140834.14233-2-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This is the common code to implement a "VM test", which will:
1) Download and initialize a pre-defined VM that has the necessary
dependencies to build QEMU, plus SSH access.
2) Archive $SRC_PATH to a .tar file.
3) Boot the VM, and pass the source tar file to the guest.
4) SSH into the VM, untar the source tarball, build from the source.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
The subsystem name for the "Build and test automation" section is
"-------------------------", because an actual subsystem name
line is missing:
$ ./scripts/get_maintainer.pl -f tests/docker/docker.py
"Alex Bennée" <alex.bennee@linaro.org> (maintainer:-----------------...)
Fam Zheng <famz@redhat.com> (maintainer:-----------------...)
"Philippe Mathieu-Daudé" <f4bug@amsat.org> (reviewer:-----------------...)
qemu-devel@nongnu.org (open list:-----------------...)
Fix the issue by inserting a subsystem name line where
get_maintainer.pl expects it.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170921170209.9101-1-ehabkost@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
The 'run' script already creates the src, build and install directories
under $TEST_DIR; use them in common.rc.
Also, the tests always run from $QEMU_SRC/tests/docker, so use a relative
$CMD string.
Message-Id: <20170817035721.11064-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
If you invoke the build with NOCACHE=1, we pass --no-cache in the argv
to docker.py but may still not force a rebuild if the dockerfile checksum
hasn't changed. By testing for the flag's presence we can force builds
without having to manually remove the docker image.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20170725133425.436-5-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
It is a common requirement for a virtual machine to send persistent
reservations, but this currently requires either running QEMU with
CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged
QEMU bypass Linux's filter on SG_IO commands.
As an alternative mechanism, the next patches will introduce a
privileged helper to run persistent reservation commands without
expanding QEMU's attack surface unnecessarily.
The helper is invoked through a "pr-manager" QOM object, to which
file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands. For example:
$ qemu-system-x86_64 \
    -device virtio-scsi \
    -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
    -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
    -device scsi-block,drive=hd
or:
$ qemu-system-x86_64 \
    -device virtio-scsi \
    -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
    -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
    -device scsi-block,drive=hd
Multiple pr-manager implementations are conceivable and possible, though
only one is implemented right now. For example, a pr-manager could:
- talk directly to the multipath daemon from a privileged QEMU
(i.e. QEMU links to libmpathpersist); this makes reservation work
properly with multipath, but still requires CAP_SYS_RAWIO
- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)
- more interestingly, implement reservations directly in QEMU
through file system locks or a shared database (e.g. sqlite)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This shares a cached empty FlatView among address spaces. The empty
FV is used every time a root MR renders into a FV with no memory
sections, which happens when the MR or its children are not enabled or
are zero-sized. The empty_view is not NULL in order to keep the rest of
the memory API intact; it also has a dispatch tree for the same reason.
On a POWER8 guest with 255 CPUs, 255 virtio-net devices and 40 PCI
bridges, this halves the number of FlatViews in use (557 -> 260) and of
dispatch tables (~800000 -> ~370000). In an unrelated experiment with
112 non-virtio devices on x86 ("-M pc"), only 4 FlatViews are alive,
and about 2000 are created at startup.
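A very rough sketch of the caching idea (the helper name render_flatview
and the flow below are illustrative, not the actual memory.c code):

    /* Shared, permanently referenced empty view, created once at startup. */
    static FlatView *empty_view;

    static FlatView *flatview_for_root(MemoryRegion *root)
    {
        FlatView *fv = render_flatview(root);   /* hypothetical render helper */

        if (fv->nr == 0) {       /* root or its children disabled/zero-sized */
            flatview_unref(fv);
            fv = empty_view;     /* hand out the cached empty view instead... */
            flatview_ref(fv);    /* ...with normal refcounting kept intact */
        }
        return fv;
    }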
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-16-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A container can be used instead of an alias to allow switching between
multiple subregions. In this case we cannot directly share the
subregions (since they only belong to a single parent), but if the
subregions are aliases we can in turn walk those.
This is not enough to remove all sources of quadratic FlatView creation,
but it enables sharing of the PCI bus master FlatViews (and their
AddressSpaceDispatch structures) across all PCI devices. For 112
virtio-net-pci devices, boot time is reduced from 25 to 10 seconds and
memory consumption from 1.4 to 1 G.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This avoids the usual memory_region_transaction_commit(), which rebuilds
all FVs.
On a POWER8 guest with 255 CPUs, 255 virtio-net devices and 40 PCI
bridges, this brings the boot time down from 25s to 20s and reduces the
number of temporary FVs allocated during machine construction
(~800000 -> ~640000) and of temporary dispatch trees (~370000 -> ~300000);
the total memory footprint goes down (18G -> 17G).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-18-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since FlatViews are now shared and ASes are not, this gets rid of
address_space_init_shareable().
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-17-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds a new "-d" switch to "info mtree" to print dispatch tree
internals.
This also changes the way "-f" is handled: it now prints flat views and
their associated address spaces.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-15-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows sharing flat views between address spaces (AS) when the
same root memory region is used when creating a new address space.
This is done by walking through all ASes and caching one FlatView per
physical root MR (i.e. not aliased).
This removes the search for duplicates from address_space_init_shareable(),
as FlatViews are shared elsewhere and keeping as::ref_count correct seems
an unnecessary and useless complication.
This should cause no change in memory use or boot time yet.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-13-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Address spaces get to keep a root MR (alias or not), but FlatView stores
the actual MR, as this is going to be used later on to decide whether to
share a particular FlatView or not.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-10-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We store AddressSpaceDispatch* in FlatView anyway so there is no need
to carry it from mem_add() to register_subpage/register_multipage.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-8-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The AS in ASD is only used to pass the AS from mem_begin() to
register_subpage() to store it in MemoryRegionSection; we can do this
directly now.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-6-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As we are going to share FlatViews between AddressSpaces, and
AddressSpaceDispatch is a structure to perform quick lookups in a
FlatView, this moves the ASD to the FlatView.
Since the ASD rendering was open-coded previously, we can also remove
as->next_dispatch, as the new FlatView pointer is stored on the stack
and set on an AS atomically.
flatview_destroy() is now executed under RCU instead of
address_space_dispatch_free().
This makes mem_begin/mem_commit work with the ASD and mem_add with the
FV, as later on mem_add will be taking an FV as an argument anyway.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-5-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This moves the FlatView allocation and initialization to a helper.
While we are here, replace g_new with g_new0 so we do not have to bother
if we add new fields in the future.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-4-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are going to share FlatViews between AddressSpaces, and per-AS
memory listeners won't suit the purpose anymore, so open-code the
dispatch tree rendering.
Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there are
no registered listeners; this should improve startup time.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-3-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds an AS** parameter to address_space_do_translate()
to make it easier for the next patch to share FlatViews.
This should cause no behavioural change.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-2-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It's possible for address_space_get_flatview() as it currently stands
to cause a use-after-free for the returned FlatView, if the reference
count is incremented after the FlatView has been replaced by a writer:
   thread 1              thread 2              RCU thread
   --------------------------------------------------------------
   rcu_read_lock
   read as->current_map
                         set as->current_map
                         flatview_unref
                            '--> call_rcu
   flatview_ref
     [ref=1]
   rcu_read_unlock
                                               flatview_destroy
   <badness>
Since FlatViews are not updated very often, we can just detect the
situation using a new atomic op atomic_fetch_inc_nonzero, similar to
Linux's atomic_inc_not_zero, which performs the refcount increment only if
it hasn't already hit zero. This is similar to Linux commit de09a9771a53
("CRED: Fix get_task_cred() and task_state() to not resurrect dead
credentials", 2010-07-29).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
target-arm queue:
* more preparatory work for v8M support
* convert some omap devices away from old_mmio
* remove out of date ARM ARM section references in comments
* add the Smartfusion2 board
# gpg: Signature made Thu 21 Sep 2017 17:40:40 BST
# gpg: using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg: aka "Peter Maydell <pmaydell@gmail.com>"
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE
* remotes/pmaydell/tags/pull-target-arm-20170921: (31 commits)
msf2: Add Emcraft's Smartfusion2 SOM kit
msf2: Add Smartfusion2 SoC
msf2: Add Smartfusion2 SPI controller
msf2: Microsemi Smartfusion2 System Register block
msf2: Add Smartfusion2 System timer
hw/arm/omap2.c: Don't use old_mmio
hw/i2c/omap_i2c.c: Don't use old_mmio
hw/timer/omap_gptimer: Don't use old_mmio
hw/timer/omap_synctimer.c: Don't use old_mmio
hw/gpio/omap_gpio.c: Don't use old_mmio
hw/arm/palm.c: Don't use old_mmio for static_ops
target/arm: Remove out of date ARM ARM section references in A64 decoder
nvic: Support banked exceptions in acknowledge and complete
nvic: Make SHCSR banked for v8M
nvic: Make ICSR banked for v8M
target/arm: Handle banking in negative-execution-priority check in cpu_mmu_index()
nvic: Handle v8M changes in nvic_exec_prio()
nvic: Disable the non-secure HardFault if AIRCR.BFHFNMINS is clear
nvic: Implement v8M changes to fixed priority exceptions
nvic: In escalation to HardFault, support HF not being priority -1
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In the A64 decoder, we have a lot of references to section numbers
from version A.a of the v8A ARM ARM (DDI0487). This version of the
document is now long obsolete (we are currently on revision B.a),
and various intervening versions renumbered all the sections.
The most recent B.a version of the document doesn't assign
section numbers at all to the individual instruction classes
in the way that the various A.x versions did. The simplest thing
to do is just to delete all the out of date C.x.x references.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170915150849.23557-1-peter.maydell@linaro.org
Update armv7m_nvic_acknowledge_irq() and armv7m_nvic_complete_irq()
to handle banked exceptions:
* acknowledge needs to use the correct vector, which may be
in sec_vectors[]
* acknowledge needs to return to its caller whether the
exception should be taken to secure or non-secure state
* complete needs its caller to tell it whether the exception
being completed is a secure one or not
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-20-git-send-email-peter.maydell@linaro.org
Now that we have a banked FAULTMASK register and banked exceptions,
we can implement the correct check in cpu_mmu_index() for whether
the MPU_CTRL.HFNMIENA bit's effect should apply. This bit causes
handlers which have requested a negative execution priority to run
with the MPU disabled. In v8M the test has to check this for the
current security state and so takes account of banking.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-17-git-send-email-peter.maydell@linaro.org
Update nvic_exec_prio() to support the v8M changes:
* BASEPRI, FAULTMASK and PRIMASK are all banked
* AIRCR.PRIS can affect NS priorities
* AIRCR.BFHFNMINS affects FAULTMASK behaviour
These changes mean that it's no longer possible to
definitely say that if FAULTMASK is set it overrides
PRIMASK, and if PRIMASK is set it overrides BASEPRI
(since if PRIMASK_NS is set and AIRCR.PRIS is set then
whether that 0x80 priority should take effect or the
priority in BASEPRI_S depends on the value of BASEPRI_S,
for instance). So we switch to the same approach used
by the pseudocode of working through BASEPRI, PRIMASK
and FAULTMASK and overriding the previous values if
needed.
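Sketched very loosely (a hypothetical helper that ignores AIRCR.PRIS,
BFHFNMINS and banking, i.e. precisely the complications described
above), the "work through each register and override" shape is:

    #include <stdbool.h>

    /* Illustrative only: one security state, no PRIS/BFHFNMINS effects. */
    static int exec_prio_shape(unsigned basepri, bool primask, bool faultmask,
                               unsigned prigroup)
    {
        int running = 0x100;                 /* sentinel: no boosting in effect */

        if (basepri) {                       /* BASEPRI == 0 means "disabled" */
            running = basepri & ~((1u << (prigroup + 1)) - 1);
        }
        if (primask) {
            running = 0;
        }
        if (faultmask) {
            running = -1;
        }
        return running;
    }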
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-16-git-send-email-peter.maydell@linaro.org
If AIRCR.BFHFNMINS is clear, then although NonSecure HardFault
can still be pended via SHCSR.HARDFAULTPENDED it mustn't actually
preempt execution. The simple way to achieve this is to clear the
enable bit for it, since the enable bit isn't guest visible.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-15-git-send-email-peter.maydell@linaro.org
In v7M, the fixed-priority exceptions are:
Reset: -3
NMI: -2
HardFault: -1
In v8M, this changes because Secure HardFault may need
to be prioritised above NMI:
Reset: -4
Secure HardFault if AIRCR.BFHFNMINS == 1: -3
NMI: -2
Secure HardFault if AIRCR.BFHFNMINS == 0: -1
NonSecure HardFault: -1
Make these changes, including support for changing the
priority of Secure HardFault as AIRCR.BFHFNMINS changes.
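As a purely illustrative helper (not QEMU's code), the rules above
amount to:

    #include <stdbool.h>

    /* Priority of a HardFault exception under the v8M rules listed above;
     * bfhfnmins is the AIRCR.BFHFNMINS bit. */
    static int hardfault_prio_v8m(bool secure, bool bfhfnmins)
    {
        if (secure) {
            return bfhfnmins ? -3 : -1;   /* above NMI only when BFHFNMINS == 1 */
        }
        return -1;                        /* NonSecure HardFault */
    }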
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-14-git-send-email-peter.maydell@linaro.org
When escalating to HardFault, we must go into Lockup if we
can't take the synchronous HardFault because the current
execution priority is already at or below the priority of
HardFault. In v7M HF is always priority -1 so a simple < 0
comparison sufficed; in v8M the priority of HardFault can
vary depending on whether it is a Secure or NonSecure
HardFault, so we must check against the priority of the
HardFault exception vector we're about to use.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-13-git-send-email-peter.maydell@linaro.org
In armv7m_nvic_set_pending() we have to compare the
priority of an exception against the execution priority
to decide whether it needs to be escalated to HardFault.
In the specification this is a comparison against the
exception's group priority; for v7M we implemented it
as a comparison against the raw exception priority
because the two comparisons will always give the
same answer. For v8M the existence of AIRCR.PRIS and
the possibility of different PRIGROUP values for secure
and nonsecure exceptions means we need to explicitly
calculate the vector's group priority for this check.
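The group priority itself is just the raw priority with the subpriority
bits masked off according to the (in v8M, banked) PRIGROUP field;
roughly, as an illustrative helper rather than the exact QEMU code:

    /* Group priority of a configurable exception: PRIGROUP selects how many
     * of the low bits are subpriority, which is ignored for preemption. */
    static int exc_group_prio(int rawprio, unsigned prigroup)
    {
        if (rawprio < 0) {
            return rawprio;       /* fixed negative priorities are not grouped */
        }
        return rawprio & ~((1u << (prigroup + 1)) - 1);
    }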
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-12-git-send-email-peter.maydell@linaro.org
Make the armv7m_nvic_set_pending() and armv7m_nvic_clear_pending()
functions take a bool indicating whether to pend the secure
or non-secure version of a banked interrupt, and update the
callsites accordingly.
In most callsites we can simply pass the correct security
state in; in a couple of cases we use TODO comments to indicate
that we will return to the code in a subsequent commit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-10-git-send-email-peter.maydell@linaro.org
For v8M, the NVIC has a new set of registers per interrupt,
NVIC_ITNS<n>. These determine whether the interrupt targets Secure
or Non-secure state. Implement the register read/write code for
these, and make them cause NVIC_IABR, NVIC_ICER, NVIC_ISER,
NVIC_ICPR, NVIC_IPR and NVIC_ISPR to RAZ/WI for non-secure
accesses to fields corresponding to interrupts which are
configured to target secure state.
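A cut-down sketch of the gating (types and field names here are
assumptions made for illustration, not the real armv7m_nvic.c
structures):

    #include <stdbool.h>
    #include <stdint.h>

    #define SKETCH_NUM_IRQ 512

    typedef struct {
        bool itns[SKETCH_NUM_IRQ];     /* true: interrupt targets Non-secure */
        bool enabled[SKETCH_NUM_IRQ];
    } NVICSketch;

    /* Read one NVIC_ISER-style 32-bit word: bits for interrupts that target
     * Secure state read as zero for a Non-secure access (RAZ/WI). */
    static uint32_t read_enable_word(const NVICSketch *s, int base_irq,
                                     bool access_is_secure)
    {
        uint32_t val = 0;

        for (int i = 0; i < 32 && base_irq + i < SKETCH_NUM_IRQ; i++) {
            int irq = base_irq + i;

            if (!access_is_secure && !s->itns[irq]) {
                continue;              /* RAZ for NS access to a Secure IRQ */
            }
            if (s->enabled[irq]) {
                val |= 1u << i;
            }
        }
        return val;
    }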
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-8-git-send-email-peter.maydell@linaro.org
The Application Interrupt and Reset Control Register has some changes
for v8M:
* new bits SYSRESETREQS, BFHFNMINS and PRIS: these all have
real state if the security extension is implemented and otherwise
are constant
* the PRIGROUP field is banked between security states
* non-secure code can be blocked from using the SYSRESET bit
to reset the system if SYSRESETREQS is set
Implement the new state and the changes to register read and write.
For the moment we ignore the effects of the secure PRIGROUP.
We will implement the effects of PRIS and BFHFNMINS later.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-6-git-send-email-peter.maydell@linaro.org
Instead of looking up the pending priority
in nvic_pending_prio(), cache it in a new state struct
field. The calculation of the pending priority given
the interrupt number is more complicated in v8M with
the security extension, so the caching will be worthwhile.
This changes nvic_pending_prio() from returning a full
(group + subpriority) priority value to returning a group
priority. This doesn't require changes to its callsites
because we use it only in comparisons of the form
execution_prio > nvic_pending_prio()
and execution priority is always a group priority, so
a test (exec prio > full prio) is true if and only if
(exec prio > group prio).
(Architecturally the expected comparison is with the
group priority for this sort of "would we preempt" test;
we were only doing a test with a full priority as an
optimisation to avoid the mask, which is possible
precisely because the two comparisons always give the
same answer.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-5-git-send-email-peter.maydell@linaro.org
With banked exceptions, just the exception number in
s->vectpending is no longer sufficient to uniquely identify
the pending exception. Add a vectpending_is_s_banked bool
which is true if the exception is using the sec_vectors[]
array.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1505240046-11454-4-git-send-email-peter.maydell@linaro.org
For the v8M security extension, some exceptions must be banked
between security states. Add the new vecinfo array which holds
the state for the banked exceptions and migrate it if the
CPU the NVIC is attached to implements the security extension.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
In v8M the MSR and MRS instructions have extra register value
encodings to allow secure code to access the non-secure banked
version of various special registers.
(We don't implement the MSPLIM_NS or PSPLIM_NS aliases, because
we don't currently implement the stack limit registers at all.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-2-git-send-email-peter.maydell@linaro.org
This avoids a name clash with the access macro on Windows 64:
make
CHK version_gen.h
CC aarch64-softmmu/memory.o
/home/konrad/qemu/memory.c: In function 'access_with_adjusted_size':
/home/konrad/qemu/memory.c:591:73: error: macro "access" passed 7 arguments, \
but takes just 2
(size - access_size - i) * 8, access_mask, attrs);
^
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Message-Id: <1505988260-8483-1-git-send-email-frederic.konrad@adacore.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
pflash toggles mr->romd_mode, so this assert does not always hold:
1) a device was added with !mr->romd_mode, therefore effectively not
creating a kvm slot as we want to trap every access (add = false).
2) mr->romd_mode was toggled on before removing it. There is now
actually no slot to remove and the assert is wrong.
So let's just drop the assert.
Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920145025.19403-1-david@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We should guarantee that RAM will not be modified while the VM is in a
stopped state, otherwise it can lead to negative consequences during
postcopy migration. In the RUN_STATE_FINISH_MIGRATE step, it's expected
that RAM on the source side will not be modified, as that could lead to
an inconsistent vm state on the destination side. RAM access during
postcopy-ram migration with the release-ram capability enabled can also
lead to sad consequences.
Let's add an enable_backend() callback to avoid undesirable virtqueue
changes in the guest memory.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-Id: <20170919120733.22020-1-pbutsykin@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
@@ -167,16 +167,17 @@ gicv3_redist_set_irq(uint32_t cpu, int irq, int level) "GICv3 redistributor 0x%x
 gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor 0x%x pending SGI %d"
 # hw/intc/armv7m_nvic.c
-nvic_recompute_state(int vectpending, int exception_prio) "NVIC state recomputed: vectpending %d exception_prio %d"
 nvic_set_prio(int irq, uint8_t prio) "NVIC set irq %d priority %d"
+nvic_recompute_state(int vectpending, int vectpending_prio, int exception_prio) "NVIC state recomputed: vectpending %d vectpending_prio %d exception_prio %d"
+nvic_recompute_state_secure(int vectpending, bool vectpending_is_s_banked, int vectpending_prio, int exception_prio) "NVIC state recomputed: vectpending %d is_s_banked %d vectpending_prio %d exception_prio %d"
 nvic_irq_update(int vectpending, int pendprio, int exception_prio, int level) "NVIC vectpending %d pending prio %d exception_prio %d: setting irq line to %d"
 nvic_escalate_prio(int irq, int irqprio, int runprio) "NVIC escalating irq %d to HardFault: insufficient priority %d >= %d"
 nvic_escalate_disabled(int irq) "NVIC escalating irq %d to HardFault: disabled"
-nvic_set_pending(int irq, int en, int prio) "NVIC set pending irq %d (enabled: %d priority %d)"
-nvic_clear_pending(int irq, int en, int prio) "NVIC clear pending irq %d (enabled: %d priority %d)"
+nvic_set_pending(int irq, bool secure, int en, int prio) "NVIC set pending irq %d secure-bank %d (enabled: %d priority %d)"
+nvic_clear_pending(int irq, bool secure, int en, int prio) "NVIC clear pending irq %d secure-bank %d (enabled: %d priority %d)"
 nvic_set_pending_level(int irq) "NVIC set pending: irq %d higher prio than vectpending: setting irq line to 1"
-nvic_acknowledge_irq(int irq, int prio) "NVIC acknowledge IRQ: %d now active (prio %d)"
-nvic_complete_irq(int irq) "NVIC complete IRQ %d"
+nvic_acknowledge_irq(int irq, int prio, bool targets_secure) "NVIC acknowledge IRQ: %d now active (prio %d targets_secure %d)"