Compare commits

...

410 Commits

Author SHA1 Message Date
Todd Eisenberger
e0dd5fd41a x86: Correct translation of some rdgsbase and wrgsbase encodings
It looks like there was a transcription error when writing this code
initially.  The code previously only decoded src or dst of rax.  This
resolves
https://bugs.launchpad.net/qemu/+bug/1719984.

Signed-off-by: Todd Eisenberger <teisenbe@google.com>
Message-Id: <CAP26EVRNVb=Mq=O3s51w7fDhGVmf-e3XFFA73MRzc5b4qKBA4g@mail.gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:29:20 -03:00
Seeteena Thoufeek
c0dd109919 vl: exit if maxcpus is negative
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

---Steps to Reproduce---

When passed a negative number to 'maxcpus' parameter, Qemu aborts
with a core dump.

Run the following command with maxcpus argument as negative number

ppc64-softmmu/qemu-system-ppc64 --nographic -vga none -machine
pseries,accel=kvm,kvm-type=HV -m size=200g -device virtio-blk-pci,
drive=rootdisk -drive file=/home/images/pegas-1.0-ppc64le.qcow2,
if=none,cache=none,id=rootdisk,format=qcow2 -monitor telnet
:127.0.0.1:1234,server,nowait -net nic,model=virtio -net
user -redir tcp:2000::22 -device nec-usb-xhci -smp 8,cores=1,
threads=1,maxcpus=-12

(process:12149): GLib-ERROR **: gmem.c:130: failed to allocate
 18446744073709550568 bytes

Trace/breakpoint trap

Reported-by: R.Nageswara Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Seeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
Message-Id: <1504511031-26834-1-git-send-email-s1seetee@linux.vnet.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-10-09 23:21:52 -03:00
Igor Mammedov
31b9352192 qom: update doc comment for type_register[_static]()
type_register()/type_register_static() functions in current impl.
can't fail returning 0, also none of the users check for error
so update doc comment to reflect current behaviour.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1507111682-66171-2-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Eduardo Habkost
e5766d6ec7 config: qemu_config_parse() return number of config groups
Change qemu_config_parse() to return the number of config groups
in success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.

All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Eduardo Habkost
3478eae990 qemu-options: Deprecate -nodefconfig
Since 2012 (commit ba6212d8 "Eliminate cpus-x86_64.conf file") we
have no default config files that would be disabled using
-nodefconfig.  Update documentation and document -nodefconfig as
deprecated.

Cc: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004030025.7866-3-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Eduardo Habkost
1ea06c398c vl: Eliminate defconfig variable
Both -nodefconfig and -no-user-config options do the same thing
today, we only need one variable to keep track of them.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004030025.7866-2-ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Alistair Francis
c9cf636d48 machine: Add a valid_cpu_types property
This patch add a MachineClass element that can be set in the machine C
code to specify a list of supported CPU types. If the supported CPU
types are specified the user enter CPU (by -cpu at runtime) is checked
against the supported types and QEMU exits if they aren't supported.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <b8474e9d2e0a219d9bac901342f983b13d009301.1507059418.git.alistair.francis@xilinx.com>
[ehabkost: removed assert(), rewrote comment]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Philippe Mathieu-Daudé
8301ea444a qom/cpu: move cpu_model null check to cpu_class_by_name()
and clean every implementation.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170917232842.14544-1-f4bug@amsat.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-09 23:21:52 -03:00
Peter Maydell
530049bc1d Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Fri 06 Oct 2017 16:52:59 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (54 commits)
  block/mirror: check backing in bdrv_mirror_top_flush
  qcow2: truncate the tail of the image file after shrinking the image
  qcow2: fix return error code in qcow2_truncate()
  iotests: Fix 195 if IMGFMT is part of TEST_DIR
  block/mirror: check backing in bdrv_mirror_top_refresh_filename
  block: support passthrough of BDRV_REQ_FUA in crypto driver
  block: convert qcrypto_block_encrypt|decrypt to take bytes offset
  block: convert crypto driver to bdrv_co_preadv|pwritev
  block: fix data type casting for crypto payload offset
  crypto: expose encryption sector size in APIs
  block: use 1 MB bounce buffers for crypto instead of 16KB
  iotests: Add test 197 for covering copy-on-read
  block: Perform copy-on-read in loop
  block: Add blkdebug hook for copy-on-read
  iotests: Restore stty settings on completion
  block: Uniform handling of 0-length bdrv_get_block_status()
  qemu-io: Add -C for opening with copy-on-read
  commit: Remove overlay_bs
  qemu-iotests: Test commit block job where top has two parents
  qemu-iotests: Allow QMP pretty printing in common.qemu
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06 17:43:02 +01:00
Peter Maydell
5121d81e38 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20171006' into staging
target-arm:
 * v8M: more preparatory work
 * nvic: reset properly rather than leaving the nvic in a weird state
 * xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
 * sd: fix out-of-bounds check for multi block reads
 * arm: Fix SMC reporting to EL2 when QEMU provides PSCI

# gpg: Signature made Fri 06 Oct 2017 16:58:15 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20171006:
  nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit
  target/arm: Factor out "get mmuidx for specified security state"
  target/arm: Fix calculation of secure mm_idx values
  target/arm: Implement security attribute lookups for memory accesses
  nvic: Implement Security Attribution Unit registers
  target/arm: Add v8M support to exception entry code
  target/arm: Add support for restoring v8M additional state context
  target/arm: Update excret sanity checks for v8M
  target/arm: Add new-in-v8M SFSR and SFAR
  target/arm: Don't warn about exception return with PC low bit set for v8M
  target/arm: Warn about restoring to unaligned stack
  target/arm: Check for xPSR mismatch usage faults earlier for v8M
  target/arm: Restore SPSEL to correct CONTROL register on exception return
  target/arm: Restore security state on exception return
  target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode
  target/arm: Don't switch to target stack early in v7M exception return
  nvic: Clear the vector arrays and prigroup on reset
  hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
  hw/sd: fix out-of-bounds check for multi block reads
  arm: Fix SMC reporting to EL2 when QEMU provides PSCI

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06 17:00:42 +01:00
Peter Maydell
04829ce334 nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit
When we added support for the new SHCSR bits in v8M in commit
437d59c17e the code to support writing to the new HARDFAULTPENDED
bit was accidentally only added for non-secure writes; the
secure banked version of the bit should also be writable.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-21-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:49 +01:00
Peter Maydell
b81ac0eb63 target/arm: Factor out "get mmuidx for specified security state"
For the SG instruction and secure function return we are going
to want to do memory accesses using the MMU index of the CPU
in secure state, even though the CPU is currently in non-secure
state. Write arm_v7m_mmu_idx_for_secstate() to do this job,
and use it in cpu_mmu_index().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-17-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:49 +01:00
Peter Maydell
fe768788d2 target/arm: Fix calculation of secure mm_idx values
In cpu_mmu_index() we try to do this:
        if (env->v7m.secure) {
            mmu_idx += ARMMMUIdx_MSUser;
        }
but it will give the wrong answer, because ARMMMUIdx_MSUser
includes the 0x40 ARM_MMU_IDX_M field, and so does the
mmu_idx we're adding to, and we'll end up with 0x8n rather
than 0x4n. This error is then nullified by the call to
arm_to_core_mmu_idx() which masks out the high part, but
we're about to factor out the code that calculates the
ARMMMUIdx values so it can be used without passing it through
arm_to_core_mmu_idx(), so fix this bug first.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-16-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:49 +01:00
Peter Maydell
35337cc391 target/arm: Implement security attribute lookups for memory accesses
Implement the security attribute lookups for memory accesses
in the get_phys_addr() functions, causing these to generate
various kinds of SecureFault for bad accesses.

The major subtlety in this code relates to handling of the
case when the security attributes the SAU assigns to the
address don't match the current security state of the CPU.

In the ARM ARM pseudocode for validating instruction
accesses, the security attributes of the address determine
whether the Secure or NonSecure MPU state is used. At face
value, handling this would require us to encode the relevant
bits of state into mmu_idx for both S and NS at once, which
would result in our needing 16 mmu indexes. Fortunately we
don't actually need to do this because a mismatch between
address attributes and CPU state means either:
 * some kind of fault (usually a SecureFault, but in theory
   perhaps a UserFault for unaligned access to Device memory)
 * execution of the SG instruction in NS state from a
   Secure & NonSecure code region

The purpose of SG is simply to flip the CPU into Secure
state, so we can handle it by emulating execution of that
instruction directly in arm_v7m_cpu_do_interrupt(), which
means we can treat all the mismatch cases as "throw an
exception" and we don't need to encode the state of the
other MPU bank into our mmu_idx values.

This commit doesn't include the actual emulation of SG;
it also doesn't include implementation of the IDAU, which
is a per-board way to specify hard-coded memory attributes
for addresses, which override the CPU-internal SAU if they
specify a more secure setting than the SAU is programmed to.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-15-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:49 +01:00
Peter Maydell
9901c576f6 nvic: Implement Security Attribution Unit registers
Implement the register interface for the SAU: SAU_CTRL,
SAU_TYPE, SAU_RNR, SAU_RBAR and SAU_RLAR. None of the
actual behaviour is implemented here; registers just
read back as written.

When the CPU definition for Cortex-M33 is eventually
added, its initfn will set cpu->sau_sregion, in the same
way that we currently set cpu->pmsav7_dregion for the
M3 and M4.

Number of SAU regions is typically a configurable
CPU parameter, but this patch doesn't provide a
QEMU CPU property for it. We can easily add one when
we have a board that requires it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-14-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:49 +01:00
Peter Maydell
d3392718e1 target/arm: Add v8M support to exception entry code
Add support for v8M and in particular the security extension
to the exception entry code. This requires changes to:
 * calculation of the exception-return magic LR value
 * push the callee-saves registers in certain cases
 * clear registers when taking non-secure exceptions to avoid
   leaking information from the interrupted secure code
 * switch to the correct security state on entry
 * use the vector table for the security state we're targeting

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-13-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:49 +01:00
Peter Maydell
907bedb3f3 target/arm: Add support for restoring v8M additional state context
For v8M, exceptions from Secure to Non-Secure state will save
callee-saved registers to the exception frame as well as the
caller-saved registers. Add support for unstacking these
registers in exception exit when necessary.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-12-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
bfb2eb5278 target/arm: Update excret sanity checks for v8M
In v8M, more bits are defined in the exception-return magic
values; update the code that checks these so we accept
the v8M values when the CPU permits them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-11-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
bed079da04 target/arm: Add new-in-v8M SFSR and SFAR
Add the new M profile Secure Fault Status Register
and Secure Fault Address Register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-10-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
4e4259d3c5 target/arm: Don't warn about exception return with PC low bit set for v8M
In the v8M architecture, return from an exception to a PC which
has bit 0 set is not UNPREDICTABLE; it is defined that bit 0
is discarded [R_HRJH]. Restrict our complaint about this to v7M.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-9-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
cb484f9a6e target/arm: Warn about restoring to unaligned stack
Attempting to do an exception return with an exception frame that
is not 8-aligned is UNPREDICTABLE in v8M; warn about this.
(It is not UNPREDICTABLE in v7M, and our implementation can
handle the merely-4-aligned case fine, so we don't need to
do anything except warn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-8-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
224e0c300a target/arm: Check for xPSR mismatch usage faults earlier for v8M
ARM v8M specifies that the INVPC usage fault for mismatched
xPSR exception field and handler mode bit should be checked
before updating the PSR and SP, so that the fault is taken
with the existing stack frame rather than by pushing a new one.
Perform this check in the right place for v8M.

Since v7M specifies in its pseudocode that this usage fault
check should happen later, we have to retain the original
code for that check rather than being able to merge the two.
(The distinction is architecturally visible but only in
very obscure corner cases like attempting an invalid exception
return with an exception frame in read only memory.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-7-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
3f0cddeee1 target/arm: Restore SPSEL to correct CONTROL register on exception return
On exception return for v8M, the SPSEL bit in the EXC_RETURN magic
value should be restored to the SPSEL bit in the CONTROL register
banked specified by the EXC_RETURN.ES bit.

Add write_v7m_control_spsel_for_secstate() which behaves like
write_v7m_control_spsel() but allows the caller to specify which
CONTROL bank to use, reimplement write_v7m_control_spsel() in
terms of it, and use it in exception return.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-6-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:48 +01:00
Peter Maydell
3919e60b6e target/arm: Restore security state on exception return
Now that we can handle the CONTROL.SPSEL bit not necessarily being
in sync with the current stack pointer, we can restore the correct
security state on exception return. This happens before we start
to read registers off the stack frame, but after we have taken
possible usage faults for bad exception return magic values and
updated CONTROL.SPSEL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-5-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:47 +01:00
Peter Maydell
de2db7ec89 target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode
In the v7M architecture, there is an invariant that if the CPU is
in Handler mode then the CONTROL.SPSEL bit cannot be nonzero.
This in turn means that the current stack pointer is always
indicated by CONTROL.SPSEL, even though Handler mode always uses
the Main stack pointer.

In v8M, this invariant is removed, and CONTROL.SPSEL may now
be nonzero in Handler mode (though Handler mode still always
uses the Main stack pointer). In preparation for this change,
change how we handle this bit: rename switch_v7m_sp() to
the now more accurate write_v7m_control_spsel(), and make it
check both the handler mode state and the SPSEL bit.

Note that this implicitly changes the point at which we switch
active SP on exception exit from before we pop the exception
frame to after it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-4-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:47 +01:00
Peter Maydell
5b5223997c target/arm: Don't switch to target stack early in v7M exception return
Currently our M profile exception return code switches to the
target stack pointer relatively early in the process, before
it tries to pop the exception frame off the stack. This is
awkward for v8M for two reasons:
 * in v8M the process vs main stack pointer is not selected
   purely by the value of CONTROL.SPSEL, so updating SPSEL
   and relying on that to switch to the right stack pointer
   won't work
 * the stack we should be reading the stack frame from and
   the stack we will eventually switch to might not be the
   same if the guest is doing strange things

Change our exception return code to use a 'frame pointer'
to read the exception frame rather than assuming that we
can switch the live stack pointer this early.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-3-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:47 +01:00
Peter Maydell
8ff26a3344 nvic: Clear the vector arrays and prigroup on reset
Reset for devices does not include an automatic clear of the
device state (unlike CPU state, where most of the state
structure is cleared to zero). Add some missing initialization
of NVIC state that meant that the device was left in the wrong
state if the guest did a warm reset.

(In particular, since we were resetting the computed state like
s->exception_prio but not all the state it was computed
from like s->vectors[x].active, the NVIC wound up in an
inconsistent state that could later trigger assertion failures.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1506092407-26985-2-git-send-email-peter.maydell@linaro.org
2017-10-06 16:46:47 +01:00
Thomas Huth
d858914435 hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
The device uses serial_hds in its realize function and thus can't be
used twice. Apart from that, the comma in its name makes it quite hard
to use for the user anyway, since a comma is normally used to separate
the device name from its properties when using the "-device" parameter
or the "device_add" HMP command.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1506441116-16627-1-git-send-email-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06 16:46:47 +01:00
Michael Olbrich
8573378e62 hw/sd: fix out-of-bounds check for multi block reads
The current code checks if the next block exceeds the size of the card.
This generates an error while reading the last block of the card.
Do the out-of-bounds check when starting to read a new block to fix this.

This issue became visible with increased error checking in Linux 4.13.

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Olbrich <m.olbrich@pengutronix.de>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 20170916091611.10241-1-m.olbrich@pengutronix.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06 16:46:47 +01:00
Jan Kiszka
77077a8300 arm: Fix SMC reporting to EL2 when QEMU provides PSCI
This properly forwards SMC events to EL2 when PSCI is provided by QEMU
itself and, thus, ARM_FEATURE_EL3 is off.

Found and tested with the Jailhouse hypervisor. Solution based on
suggestions by Peter Maydell.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-id: 4f243068-aaea-776f-d18f-f9e05e7be9cd@siemens.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06 16:46:47 +01:00
Kevin Wolf
fc3fd63fc0 Merge remote-tracking branch 'mreitz/tags/pull-block-2017-10-06' into queue-block
Block patches

# gpg: Signature made Fri Oct  6 16:30:57 2017 CEST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-10-06:
  block/mirror: check backing in bdrv_mirror_top_flush
  qcow2: truncate the tail of the image file after shrinking the image
  qcow2: fix return error code in qcow2_truncate()
  iotests: Fix 195 if IMGFMT is part of TEST_DIR
  block/mirror: check backing in bdrv_mirror_top_refresh_filename
  block: support passthrough of BDRV_REQ_FUA in crypto driver
  block: convert qcrypto_block_encrypt|decrypt to take bytes offset
  block: convert crypto driver to bdrv_co_preadv|pwritev
  block: fix data type casting for crypto payload offset
  crypto: expose encryption sector size in APIs
  block: use 1 MB bounce buffers for crypto instead of 16KB

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:32:08 +02:00
Vladimir Sementsov-Ogievskiy
ce960aa906 block/mirror: check backing in bdrv_mirror_top_flush
Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin
163bc39d2c qcow2: truncate the tail of the image file after shrinking the image
Now after shrinking the image, at the end of the image file, there might be a
tail that probably will never be used. So we can find the last used cluster and
cut the tail.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Pavel Butsykin
76a2a30a99 qcow2: fix return error code in qcow2_truncate()
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:48 +02:00
Max Reitz
47500c6775 iotests: Fix 195 if IMGFMT is part of TEST_DIR
do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing
qemu's command line and _filter_testdir only afterwards.  Therefore, if
the image format is part of the test directory path, _filter_testdir
will no longer apply and the actual output will differ from the
reference output even in case of success.

For example, TEST_DIR might be "/tmp/test-qcow2", in which case
_filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no
longer recognized as the TEST_DIR by _filter_testdir.

Fix this by not applying _filter_imgfmt in do_run_qemu() but in
run_qemu() instead, and only after _filter_testdir.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170927211334.3988-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Vladimir Sementsov-Ogievskiy
18775ff326 block/mirror: check backing in bdrv_mirror_top_refresh_filename
Backing may be zero after failed bdrv_attach_child in
bdrv_set_backing_hd, which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
d67a6b09b4 block: support passthrough of BDRV_REQ_FUA in crypto driver
The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver
as a passthrough to the underlying block driver.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-7-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
4609742a49 block: convert qcrypto_block_encrypt|decrypt to take bytes offset
Instead of sector offset, take the bytes offset when encrypting
or decrypting data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-6-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
a73466fbad block: convert crypto driver to bdrv_co_preadv|pwritev
Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-5-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
31376555c7 block: fix data type casting for crypto payload offset
The crypto APIs report the offset of the data payload as an uint64_t
type, but the block driver is casting to size_t or ssize_t which will
potentially truncate.

Most of the block APIs use int64_t for offsets meanwhile, so even if
using uint64_t in the crypto block driver we are still at risk of
truncation.

Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-4-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
850f49de9b crypto: expose encryption sector size in APIs
While current encryption schemes all have a fixed sector size of
512 bytes, this is not guaranteed to be the case in future. Expose
the sector size in the APIs so the block layer can remove assumptions
about fixed 512 byte sectors.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-3-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Daniel P. Berrange
161253e2d0 block: use 1 MB bounce buffers for crypto instead of 16KB
Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage which high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-2-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-10-06 16:30:47 +02:00
Eric Blake
461743390d iotests: Add test 197 for covering copy-on-read
Add a test for qcow2 copy-on-read behavior, including exposure
for the just-fixed bugs.

The copy-on-read behavior is always to a qcow2 image, but the
test is careful to allow running with most image protocol/format
combos as the backing file being copied from (luks being the
exception, as it is harder to pass the right secret to all the
right places).  In fact, for './check nbd', this appears to be
the first time we've had a qcow2 image wrapping NBD, requiring
an additional line in _filter_img_create to match the similar
line in _filter_img_info.

Invoking blkdebug to prove we don't write too much took some
effort to get working; and it requires that $TEST_WRAP (based
on $TEST_DIR) not be subject to word splitting.  We may decide
later to have the entire iotests suite use relative rather than
absolute names, to avoid problems inherited by the absolute
name of $PWD or $TEST_DIR, at which point the sanity check in
this commit could be simplified.

This test requires at least 2G of consecutive memory to succeed;
as such, it is prone to spurious failures, particularly on
32-bit machines under load.  This situation is detected and
triggers an early exit to skip the test, rather than a failure.
To manually provoke this setup on a beefier machine, I used:
  $ (ulimit -S -v 1000000; ./check -qcow2 197)

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
cb2e28780c block: Perform copy-on-read in loop
Improve our braindead copy-on-read implementation.  Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G.  While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read

Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits.  Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.

Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
d855ebcd3c block: Add blkdebug hook for copy-on-read
Make it possible to inject errors on writes performed during a
read operation due to copy-on-read semantics.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
8803714b53 iotests: Restore stty settings on completion
Executing qemu with a terminal as stdin will temporarily alter stty
settings on that terminal (for example, disabling echo), because of
how we run both the monitor and any multiplexing with guest input.
Normally, qemu restores the original settings on exit; but if an
iotest triggers qemu to abort in the middle, we can be left with
the altered terminal setup.  This can make life very annoying when
debugging an iotest failure (not everyone remembers the trick of
blind-typing 'stty sane' without echo, and some people prefer
terminal settings that are slightly different than the defaults
picked by 'stty sane').

It is possible to avoid qemu corrupting the terminal by not passing
a terminal to qemu's stdin in the first place (as in, use
'./check ... </dev/null'), but that's extra typing to have to
remember.  But running 'exec </dev/null' in the harness seems like
it might be too heavy of a hammer.  So I instead went the the
solution of saving and restoring the stty settings, only when the
harness detects that it is run interactively.

I tested this patch by forcing an allocation failure (I can't
guarantee that this particular limit will work on all setups, but
it shows the idea):
 $ (ulimit -S -v 500000; ./check -qcow2 1)

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
9cdcfd9f7a block: Uniform handling of 0-length bdrv_get_block_status()
Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.

Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled.  While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop).  However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
0f40444cc4 qemu-io: Add -C for opening with copy-on-read
Make it easier to enable copy-on-read during iotests, by
exposing a new bool option to main and open.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
bde70715b6 commit: Remove overlay_bs
We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.

bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.

The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.

With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
7c61a4a3f9 qemu-iotests: Test commit block job where top has two parents
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
72538537d8 qemu-iotests: Allow QMP pretty printing in common.qemu
QMP responses to certain commands can become quite long, which doesn't
only make reading them hard, but also means that the maximum line length
in patch emails can be exceeded. Allow tests to switch to QMP pretty
printing, which results in more, but shorter lines.

We also need to make sure to keep indentation in the response for this
to work as expected.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
61f09cea01 commit: Support multiple roots above top node
This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.

This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.

On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.

Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Kevin Wolf
6858eba09e block: Introduce BdrvChildRole.update_filename
There is no good reason for bdrv_drop_intermediate() to know the active
layer above the subchain it is operating on - even more so, because
the assumption that there is a single active layer above it is not
generally true.

In order to prepare removal of the active parameter, use a BdrvChildRole
callback to update the backing file string in the overlay image instead
of directly calling bdrv_change_backing_file().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
09d653e617 qemu-iotests: merge "check" and "common"
"check" is full of qemu-iotests--specific details.  Separating it
from "common" does not make much sense anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
4e670492ef qemu-iotests: get rid of $iam
The variable is almost unused, and one of the two uses is actually
uninitialized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
8f4dcaba9b qemu-iotests: fix uninitialized variable
The variable is used in "common" but defined only after the file
is sourced.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
cce293a294 qemu-iotests: disintegrate more parts of common.config
Split "check" parts from tests part.

For the directory setup, the actual computation of directories goes
in "check", while the sanity checks go in the tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
3817ce03bf qemu-iotests: do not include common.rc in "check"
It only provides functions used by the test programs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
d1f2447a3e qemu-iotests: limit non-_PROG-suffixed variables to common.rc
These are never used by "check", with one exception that does not need
$QEMU_OPTIONS.  Keep them in common.rc, which will be soon included only
by the tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
cceaf1db6f qemu-iotests: cleanup and fix search for programs
Instead of ./check failing when a binary is missing, we try each test
case now and each one fails with tons of test case diffs.  Also, all the
variables were initialized by "check" prior to "common" being sourced,
and then (uselessly) checked for emptiness again in "check".

Centralize the search for programs in "common" (which will soon be
one with "check"), including the "realpath" invocation which can be done
just once in "check" rather than in the tests.

For qnio_server, move the detection to "common", simplifying
set_prog_path to stop handling the unused second argument, and
embedding the "realpath" pass.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
48259488aa qemu-iotests: move "check" code out of common.rc
Some functions in common.rc are never used by the tests.  Move
them out of that file and into common, which is already included
only by "check".

Code that actually *is* common to "check" and tests can be placed in
common.config.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
9ee4b6f803 qemu-iotests: get rid of AWK_PROG
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Paolo Bonzini
f06a8dcfc6 qemu-iotests: remove dead code
This includes shell function, shell variables and command line options
(randomize.awk does not exist).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Thomas Huth
dbfa934106 hw/block/onenand: Remove dead code block
The condition of the for-loop makes sure that b is always smaller
than s->blocks, so the "if (b >= s->blocks)" statement is completely
superfluous here.

Buglink: https://bugs.launchpad.net/qemu/+bug/1715007
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ca75962244 dirty-bitmap: Convert internal hbitmap size/granularity
Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity.  It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
0fdf1a4f68 dirty-bitmap: Switch bdrv_set_dirty() to bytes
Both callers already had bytes available, but were scaling to
sectors.  Move the scaling to internal code.  In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
49d741b504 qcow2: Switch store_bitmap_data() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space.  But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster.  In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ab94db6f76 qcow2: Switch load_bitmap_data() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
b85ee45334 qcow2: Switch qcow2_measure() to byte-based iteration
This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
23ca459a45 mirror: Switch mirror_dirty_init() to byte-based iteration
Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
e0d7f73e63 dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
3b5d4df0c6 dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes
Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
9a46dba7b7 dirty-bitmap: Change bdrv_get_dirty_count() to report bytes
Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
f798184cfd dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later).  Update the interface to
do the scaling internally instead.

In qcow2-bitmap, the code was specifically checking for an error
return of -1.  To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value.  It's easy enough to make both
changes at the same time, even though either one in isolation
would work.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
715a74d819 dirty-bitmap: Set iterator start by offset, not sector
All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.

Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration.  Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
c7e7c87ac8 qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
86f6ae67e1 dirty-bitmap: Change bdrv_dirty_bitmap_*serialize*() to take bytes
Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap.  It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size.  The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere.  Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
993e6525bf dirty-bitmap: Track bitmap size by bytes
We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies).  A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.

Use is_power_of_2() while at it, instead of open-coding that.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ebfcd2e75f dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes
We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion.  A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.

The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
1b6cc579de dirty-bitmap: Avoid size query failure during truncate
We've previously fixed several places where we failed to account
for possible errors from bdrv_nb_sectors().  Fix another one by
making bdrv_dirty_bitmap_truncate() take the new size from the
caller instead of querying itself; then adjust the sole caller
bdrv_truncate() to pass the size just determined by a successful
resize, or to reuse the size given to the original truncate
operation when refresh_total_sectors() was not able to confirm the
actual size (the two sizes can potentially differ according to
rounding constraints), thus avoiding sizing the bitmaps to -1.
This also fixes a bug where not all failure paths in
bdrv_truncate() would set errp.

Note that bdrv_truncate() is still a bit awkward.  We may want
to revisit it later and clean up things to better guarantee that
a resize attempt either fails cleanly up front, or cannot fail
after guest-visible changes have been made (if temporary changes
are made, then they need to be cleanly rolled back).  But that
is a task for another day; for now, the goal is the bare minimum
fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
dfe55c3577 dirty-bitmap: Drop unused functions
We had several functions that no one is currently using, and which
use sector-based interfaces.  I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:

bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_get_meta_locked
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
113754f3a8 qcow2: Ensure bitmap serialization is aligned
When subdividing a bitmap serialization, the code in hbitmap.c
enforces that start/count parameters are aligned (except that
count can end early at end-of-bitmap).  We exposed this required
alignment through bdrv_dirty_bitmap_serialization_align(), but
forgot to actually check that we comply with it.

Fortunately, qcow2 is never dividing bitmap serialization smaller
than one cluster (which is a minimum of 512 bytes); so we are
always compliant with the serialization alignment (which insists
that we partition at least 64 bits per chunk) because we are doing
at least 4k bits per chunk.

Still, it's safer to add an assertion (for the unlikely case that
we'd ever support a cluster smaller than 512 bytes, or if the
hbitmap implementation changes what it considers to be aligned),
rather than leaving bdrv_dirty_bitmap_serialization_align()
without a caller.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
ecbfa2817d hbitmap: Rename serialization_granularity to serialization_align
The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align().  Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
a8b42a1c09 block: Make bdrv_img_create() size selection easier to read
All callers of bdrv_img_create() pass in a size, or -1 to read the
size from the backing file.  We then set that size as the QemuOpt
default, which means we will reuse that default rather than the
final parameter to qemu_opt_get_size() several lines later.  But
it is rather confusing to read subsequent checks of 'size == -1'
when it looks (without seeing the full context) like size defaults
to 0; it also doesn't help that a size of 0 is valid (for some
formats).

Rework the logic to make things more legible.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Eric Blake
765d9df962 block: Typo fix in copy_on_readv()
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-10-06 16:28:58 +02:00
Peter Maydell
a26a98dfb9 Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171006' into staging
s390x changes:
- support for IDA (indirect addressing in ccws) via ccw data stream
- support for extended TOD-Clock (z14 feature)
- various fixes and improvements all over the place

# gpg: Signature made Fri 06 Oct 2017 10:52:22 BST
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>"
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20171006: (33 commits)
  hw/s390x: Mark the "sclpquiesce" device with user_creatable = false
  s390x/tcg: initialize machine check queue
  s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
  s390x/sclp: Mark the sclp device with user_creatable = false
  s390/kvm: make TOD setting failures fatal for migration
  s390/kvm: Support for get/set of extended TOD-Clock for guest
  s390x/css: fix css migration compat handling
  s390x: sort some devices into categories
  s390x/tcg: make STFL store into the lowcore
  s390x: introduce and use S390_MAX_CPUS
  target/s390x: get rid of next_core_id
  s390x/cpumodel: fix max STFL(E) bit number
  s390x: raise CPU hotplug irq after really hotplugged
  MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h
  s390x/3270: handle writes of arbitrary length
  s390x/3270: IDA support for 3270 via CcwDataStream
  Revert "s390x/ccw: create s390 phb conditionally"
  s390x/tcg: make idte/ipte use the new _real mmu
  s390x/tcg: make testblock use the new _real mmu
  s390x/tcg: make stora(g) use the new _real mmu
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-06 13:19:03 +01:00
Thomas Huth
b923ab3112 hw/s390x: Mark the "sclpquiesce" device with user_creatable = false
The "sclpquiesce" device is just an internal device that should not be
created by the user directly. Though it currently does not seem to cause
any obvious trouble when the user instantiates an additional device, let's
better mark it with user_creatable = false to avoid unexpected behavior,
e.g. because the quiesce notifier gets registered multiple times.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507193105-15627-1-git-send-email-thuth@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Cornelia Huck
8986db4922 s390x/tcg: initialize machine check queue
Just as for external interrupts and I/O interrupts, we need to
initialize mchk_index during cpu reset.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Cornelia Huck
7aa4d85d29 s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
A TYPE_SCLP_CPU_HOTPLUG device for handling cpu hotplug events
is already created by the sclp event facility. Adding a second
TYPE_SCLP_CPU_HOTPLUG device via -device sclp-cpu-hotplug creates
an ambiguity in raise_irq_cpu_hotplug(), leading to a crash once
a cpu is hotplugged.

To fix this, disallow creating a sclp-cpu-hotplug device manually.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Thomas Huth
e6cb60bf15 s390x/sclp: Mark the sclp device with user_creatable = false
The "sclp" device is just an internal device that can not be instantiated
by the users. If they try to use it, they only get a simple error message:

$ qemu-system-s390x -nographic -device sclp
qemu-system-s390x: Option '-device s390-sclp-event-facility' cannot be
handled by this machine

Since sclp_init() tries to create a TYPE_SCLP_EVENT_FACILITY which is
a non-pluggable sysbus device, there is really no way that the "sclp"
device can be used by the user, so let's set the user_creatable = false
accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507125199-22562-1-git-send-email-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Collin L. Walling
28f8dbe85d s390/kvm: make TOD setting failures fatal for migration
If we fail to set a proper TOD clock on the target system,  this can
already result in some problematic cases. We print several warn messages
on source and target in that case.

If kvm fails to set a nonzero epoch index, then we must ultimately fail
the migration as this will result in a giant time leap backwards. This
patch lets the migration fail if we can not set the guest time on the
target.

On failure the guest will resume normally on the original host machine.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split failure change from epoch index change, minor fixups]
Message-Id: <20171004105751.24655-3-borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Collin L. Walling
7edd4a4967 s390/kvm: Support for get/set of extended TOD-Clock for guest
Provides an interface for getting and setting the guest's extended
TOD-Clock via a single ioctl to kvm. If the ioctl fails because it
is not support by kvm, then we fall back to the old style of
retrieving the clock via two ioctls.

Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split failure change from epoch index change]
Message-Id: <20171004105751.24655-2-borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
[some cosmetic fixes]
2017-10-06 10:53:02 +02:00
Halil Pasic
489c909f09 s390x/css: fix css migration compat handling
Commit e996583eb3 ("s390x/css: activate ChannelSubSys migration",
2017-07-11) was supposed to enable css migration for virtio-ccw
machines starting 2.10, but it ended up effectively enabling it
only for 2.10 as the registration of the appropriate VMStateDescription
happens in ccw_machine_2_10_instance_options which does not get
called for machines more recent than 2_10.

Let us move the corresponding chunk of code (which conditionally enables
the migration based on the value of the corresponding class property) to
ccw_init, which is called for each virtio-ccw machine instance.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20171004110109.16525-1-pasic@linux.vnet.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Cornelia Huck
bd2aef1065 s390x: sort some devices into categories
Add missing categorizations for some s390x devices:
- zpci device -> misc
- 3270 -> display
- vfio-ccw -> misc

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
86b5ab3909 s390x/tcg: make STFL store into the lowcore
Using virtual memory access is wrong and will soon include low-address
protection checks, which is to be bypassed for STFL.

STFL is a privileged instruction and using LowCore requires
!CONFIG_USER_ONLY, so add the ifdef and move the declaration to the
right place.

This was originally part of a bigger STFL(E) refactoring.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170927170027.8539-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
f42dc44a14 s390x: introduce and use S390_MAX_CPUS
Will be handy in the future.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-6-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
1e70ba24a9 target/s390x: get rid of next_core_id
core_id is not needed by linux-user, as the core_id a.k.a. CPU address
is only accessible from kernel space.

Therefore, drop next_core_id and make cpu_index get autoassigned again
for linux-user.

While at it, shield core_id and cpuid completely from linux-user. cpuid
can also only be queried from kernel space.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-5-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
c547a757f4 s390x/cpumodel: fix max STFL(E) bit number
Not that it would matter in the near future, but it is actually 2048
bytes, therefore 16384 possible bits.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-4-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
c5b934303c s390x: raise CPU hotplug irq after really hotplugged
Let's move it into the machine, so we trigger the IRQ after setting
ms->possible_cpus (which SCLP uses to construct the list of
online CPUs).

This also fixes a problem reported by Thomas Huth, whereby qemu can be
crashed using the none machine

qemu-s390x-softmmu -M none -monitor stdio
-> device_add qemu-s390-cpu

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-3-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
040078e06d MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h
Forgot it when factoring code out into these files. This is 100% s390x
KVM material.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170928134609.16985-2-david@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Halil Pasic
17ec9921a7 s390x/3270: handle writes of arbitrary length
The problem is, that the current implementation places unrealistic and
arbitrary constraints on the length of writes to the device (that is the
outbound requests), by asserting ccw.count being such that that even the
worst case escaped payload will fit an  more or less arbitrary sized
buffer. Actually on protocol level there is nothing to justify such
a limitation.

Another strange thing is the return value which more or less reflects
the size (written) after escaping instead of before escaping. This
is strange, because this return value is used to calculate SCSW.count.

Let us teach 3270 how to deal with arbitrary long writes.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reported-by: Jason J . Herne <jjherne@linux.vnet.ibm.com>
Tested-by: Jason J . Herne <jjherne@linux.vnet.ibm.com>
Message-Id: <20170920172314.102710-3-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Halil Pasic
1baa2eb01e s390x/3270: IDA support for 3270 via CcwDataStream
Let us convert the 3270 code so it uses the recently introduced
CcwDataStream abstraction instead of blindly assuming direct data access.

This patch does not change behavior beyond introducing IDA support: for
direct data access CCWs everything stays as-is. (If there are bugs, they
are also preserved).

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20170920172314.102710-2-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Christian Borntraeger
c1843e2092 Revert "s390x/ccw: create s390 phb conditionally"
This reverts commit d32bd032d8.

Turns out that old QEMUs always created a pci host bridge
and for many CPU models the migration from old QEMUs to new
QEMUs will fail with
qemu-system-s390x: Unknown savevm section or instance 'PCIBUS' 0
qemu-system-s390x: load of migration failed: Invalid argument

As a quick fix we will revert the commit and always create the
pci host bridge.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[fixed revert to keep the comment fixup, added a comment in the code]
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Message-Id: <20170928131831.81393-1-borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
8eb82de91b s390x/tcg: make idte/ipte use the new _real mmu
We don't wrap addresses in the mmu for the _real case, therefore the
behavior should be unchanged.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-7-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
e26131c904 s390x/tcg: make testblock use the new _real mmu
Low address protection checks will be moved into the mmu later.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-6-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
4ae433417e s390x/tcg: make stora(g) use the new _real mmu
As we properly handle the return address now, we can drop
potential_page_fault().

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-5-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
34499dadc1 s390x/tcg: make lura(g) use the new _real mmu.
Looks like, lurag was not loading 64bit but only 32bit.

As we properly handle the return address now, we can drop
potential_page_fault().

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
fb66944df9 s390x/tcg: add MMU for real addresses
This makes it easy to access real addresses (prefix) and in addition
checks for valid memory addresses, which is missing when using e.g.
stl_phys().

We can later reuse it to implement low address protection checks (then
we might even decide to introduce yet another MMU for absolute
addresses, just for handling storage keys and low address protection).

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-3-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
David Hildenbrand
0bd695a960 s390x/tcg: fix checking for invalid memory check
It should have been a >=, but let's directly perform a proper access
check to also be able to deal with hotplugged memory later.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170926183318.12995-2-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Halil Pasic
93973f8f15 s390x/css: support ccw IDA
Let's add indirect data addressing support for our virtual channel
subsystem. This implementation does not bother with any kind of
prefetching. We simply step through the IDAL on demand.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-6-pasic@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Halil Pasic
62a2554ec2 390x/css: introduce maximum data address checking
The architecture mandates the addresses to be accessed on the first
indirection level (that is, the data addresses without IDA, and the
(M)IDAW addresses with (M)IDA) to be checked against an CCW format
dependent limit maximum address.  If a violation is detected, the storage
access is not to be performed and a channel program check needs to be
generated. As of today, we fail to do this check.

Let us stick even closer to the architecture specification.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-5-pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:02 +02:00
Halil Pasic
f57ba05823 virtio-ccw: use ccw data stream
Replace direct access which implicitly assumes no IDA
or MIDA with the new ccw data stream interface which should
cope with these transparently in the future.

Note that checking the return code for ccw_dstream_* will be
done in a follow-on patch.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-4-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
Halil Pasic
0a22eac5aa s390x/css: use ccw data stream
Replace direct access which implicitly assumes no IDA
or MIDA with the new ccw data stream interface which should
cope with these transparently in the future.

Note that checking the return code for ccw_dstream_* will be
done in a follow-on patch.

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-3-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
Halil Pasic
57065a70d0 s390x/css: introduce css data stream
This is a preparation for introducing handling for indirect data
addressing and modified indirect data addressing (CCW). Here we introduce
an interface which should make the addressing scheme transparent for the
client code. Here we implement only the basic scheme (no IDA or MIDA).

Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Message-Id: <20170921180841.24490-2-pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
David Hildenbrand
947a38bd6f s390x/kvm: fix and cleanup storing CPU status
env->psa is a 64bit value, while we copy 4 bytes into the save area,
resulting always in 0 getting stored.

Let's try to reduce such errors by using a proper structure. While at
it, use correct cpu->be conversion (and get_psw_mask()), as we will be
reusing this code for TCG soon.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170922140338.6068-1-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
Igor Mammedov
b6805e127c s390x: use generic cpu_model parsing
Define default CPU type in generic way in machine class_init
and let common machine code handle cpu_model parsing.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <1505998749-269631-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
David Hildenbrand
7705c75048 s390x/tcg: add basic MSA features
The STFLE bits for the MSA (extension) facilities simply indicate that
the respective instructions can be executed. The QUERY subfunction can then
be used to identify which features exactly are available.

Availability of subfunctions can also vary on real hardware. For now, we
simply implement a CPU model without any available subfunctions except
QUERY (which is always around).

As all MSA functions behave quite similarly, we can use one translation
handler for now. Prepare the code for implementation of actual subfunctions.

At least MSA is helpful for now, as older Linux kernels require this
facility when compiled for a z9 model. Allow to enable the facilities
for the qemu cpu model.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920153016.3858-4-david@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
David Hildenbrand
7634d658e6 s390x/tcg: move wrap_address() to internal.h
We want to use it in another file.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920153016.3858-3-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
David Hildenbrand
6b257354c4 s390x/tcg: implement spm (SET PROGRAM MASK)
Missing and is used inside Linux in the context of CPACF.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920153016.3858-2-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-06 10:53:01 +02:00
Peter Maydell
d8f932cc69 Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Thu 05 Oct 2017 15:25:21 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  checkpatch: fix incompatibility with old perl

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-05 16:54:29 +01:00
Peter Maydell
67caeeacd3 Merge remote-tracking branch 'remotes/dgilbert/tags/pull-hmp-20171005' into staging
HMP pull 2017-10-05

# gpg: Signature made Thu 05 Oct 2017 11:50:13 BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-hmp-20171005:
  hmp-commands-info: Change "@findex FOO" to "@findex info FOO"
  hmp-commands-info: Move Texinfo stanzas to conventional place
  hmp-commands-info: Fix "info rocker-FOO" misspellings
  hmp: Fix unknown command for subtable
  hmp: Missing handle_errors

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-05 16:13:46 +01:00
Peter Maydell
f43a46f0f4 Merge remote-tracking branch 'remotes/kraxel/tags/usb-20171005-pull-request' into staging
usb bugfixes.

# gpg: Signature made Thu 05 Oct 2017 10:04:15 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/usb-20171005-pull-request:
  usb: fix host-stub.c build race
  usb: Use angle brackets for cacard include directive
  usb: fix libusb config variable name.
  hw/usb/bus: Remove bad object_unparent() from usb_try_create_simple()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-05 15:31:06 +01:00
Vladimir Sementsov-Ogievskiy
45042732f3 checkpatch: fix incompatibility with old perl
Do not use '/r' modifier which was introduced in perl 5.14.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Fixes: 3e5875afc0f ("checkpatch: check trace-events code style")
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Message-id: 20171004154420.34596-1-vsementsov@virtuozzo.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-05 10:22:44 -04:00
Peter Maydell
1fdc4c5d82 Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2017-10-04-1' into staging
Merge qio 2017/10/04 v1

# gpg: Signature made Wed 04 Oct 2017 13:23:04 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qio-2017-10-04-1:
  io: add trace events for websockets frame handling
  io: Attempt to send websocket close messages to client
  io: Reply to ping frames
  io: Ignore websocket PING and PONG frames
  io: Allow empty websocket payload
  io: Add support for fragmented websocket binary frames
  io: Small updates in preparation for websocket changes
  ui: Always remove an old VNC channel watch before adding a new one
  io: use case insensitive check for Connection & Upgrade websock headers
  io: include full error message in websocket handshake trace
  io: send proper HTTP response for websocket errors

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-05 14:44:12 +01:00
Peter Maydell
90586a78ff Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20171003.0' into staging
VFIO updates 2017-10-03

 - NVIDIA GPUDirect Cliques experimental support (Alex Williamson)

# gpg: Signature made Tue 03 Oct 2017 22:28:43 BST
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-updates-20171003.0:
  vfio/pci: Add NVIDIA GPUDirect Cliques support
  vfio/pci: Add virtual capabilities quirk infrastructure
  vfio/pci: Do not unwind on error

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-05 13:28:43 +01:00
Peter Maydell
5456c6a4ec Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Tue 03 Oct 2017 19:53:34 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  aio: fix assert when remove poll during destroy
  iothread: delay the context release to finalize
  iothread: export iothread_stop()
  iothread: provide helpers for internal use
  qom: provide root container for internal objs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-05 12:02:21 +01:00
Markus Armbruster
1b591700e6 hmp-commands-info: Change "@findex FOO" to "@findex info FOO"
qemu-doc has the monitor commands in the "Function Index".  The "info
FOO" are listed as "FOO" there.  List them as "info FOO" instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20171002134538.23332-4-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-05 10:08:39 +01:00
Markus Armbruster
a92725136c hmp-commands-info: Move Texinfo stanzas to conventional place
A command's STEXI..ETEXI stanza follows the command's initializer.
Two commands got them backwards.  Correct that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20171002134538.23332-3-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-05 10:06:05 +01:00
Markus Armbruster
c325ccd304 hmp-commands-info: Fix "info rocker-FOO" misspellings
Screwed up in commit da76ee7.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20171002134538.23332-2-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-05 10:05:13 +01:00
Gerd Hoffmann
eea6ae2037 usb: fix host-stub.c build race
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20171004125210.7817-1-kraxel@redhat.com
2017-10-05 11:03:25 +02:00
Dr. David Alan Gilbert
250b819764 hmp: Fix unknown command for subtable
(qemu) info foo
unknown command: 'foo'

fix this to:
(qemu) info foo
unknown command: 'info foo'

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170817104216.29150-3-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-05 10:01:21 +01:00
Dr. David Alan Gilbert
40c71f6361 hmp: Missing handle_errors
hmp_info_memdev && hmp_info_memory_devices were missing
hmp_handle_error calls.  Add them.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170817104216.29150-2-dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-05 10:01:03 +01:00
Daniel P. Berrange
59f183bbd5 io: add trace events for websockets frame handling
It is useful to trace websockets frame encoding/decoding when debugging
problems.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
530ca60c16 io: Attempt to send websocket close messages to client
Make a best effort attempt to close websocket connections according to
the RFC. Sends the close message, as room permits in the socket buffer,
and immediately closes the socket.

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
268a53f50d io: Reply to ping frames
Add an immediate ping reply (pong) to the outgoing stream when a ping
is received. Unsolicited pongs are ignored.

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
01af17fc00 io: Ignore websocket PING and PONG frames
Keep pings and gratuitous pongs generated by web browsers from killing
websocket connections.

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
3a29640e2c io: Allow empty websocket payload
Some browsers send pings/pongs with no payload, so allow empty payloads
instead of closing the connection.

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
ff1300e626 io: Add support for fragmented websocket binary frames
Allows fragmented binary frames by saving the previous opcode. Handles
the case where an intermediary (i.e., web proxy) fragments frames
originally sent unfragmented by the client.

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
eefa3d8ef6 io: Small updates in preparation for websocket changes
Gets rid of unnecessary bit shifting and performs proper EOF checking to
avoid a large number of repeated calls to recvmsg() when a client
abruptly terminates a connection (bug fix).

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Brandon Carpenter
a75d6f0761 ui: Always remove an old VNC channel watch before adding a new one
Also set saved handle to zero when removing without adding a new watch.

Signed-off-by: Brandon Carpenter <brandon.carpenter@cypherpath.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Daniel P. Berrange
33badfd1e3 io: use case insensitive check for Connection & Upgrade websock headers
When checking the value of the Connection and Upgrade HTTP headers
the websock RFC (6455) requires the comparison to be case insensitive.
The Connection value should be an exact match not a substring.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Daniel P. Berrange
3a3f870596 io: include full error message in websocket handshake trace
When the websocket handshake fails it is useful to log the real
error message via the trace points for debugging purposes.

Fixes bug: #1715186

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Daniel P. Berrange
f69a8bde29 io: send proper HTTP response for websocket errors
When any error occurs while processing the websockets handshake,
QEMU just terminates the connection abruptly. This is in violation
of the HTTP specs and does not help the client understand what they
did wrong. This is particularly bad when the client gives the wrong
path, as a "404 Not Found" would be very helpful.

Refactor the handshake code so that it always sends a response to
the client unless there was an I/O error.

Fixes bug: #1715186

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2017-10-04 13:21:53 +01:00
Alex Williamson
dfbee78db8 vfio/pci: Add NVIDIA GPUDirect Cliques support
NVIDIA has defined a specification for creating GPUDirect "cliques",
where devices with the same clique ID support direct peer-to-peer DMA.
When running on bare-metal, tools like NVIDIA's p2pBandwidthLatencyTest
(part of cuda-samples) determine which GPUs can support peer-to-peer
based on chipset and topology.  When running in a VM, these tools have
no visibility to the physical hardware support or topology.  This
option allows the user to specify hints via a vendor defined
capability.  For instance:

  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev0.x-nv-gpudirect-clique=0'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev1.x-nv-gpudirect-clique=1'/>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.hostdev2.x-nv-gpudirect-clique=1'/>
  </qemu:commandline>

This enables two cliques.  The first is a singleton clique with ID 0,
for the first hostdev defined in the XML (note that since cliques
define peer-to-peer sets, singleton clique offer no benefit).  The
subsequent two hostdevs are both added to clique ID 1, indicating
peer-to-peer is possible between these devices.

QEMU only provides validation that the clique ID is valid and applied
to an NVIDIA graphics device, any validation that the resulting
cliques are functional and valid is the user's responsibility.  The
NVIDIA specification allows a 4-bit clique ID, thus valid values are
0-15.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-03 12:57:36 -06:00
Alex Williamson
e3f79f3bd4 vfio/pci: Add virtual capabilities quirk infrastructure
If the hypervisor needs to add purely virtual capabilties, give us a
hook through quirks to do that.  Note that we determine the maximum
size for a capability based on the physical device, if we insert a
virtual capability, that can change.  Therefore if maximum size is
smaller after added virt capabilities, use that.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-03 12:57:36 -06:00
Alex Williamson
5b31c8229d vfio/pci: Do not unwind on error
If vfio_add_std_cap() errors then going to out prepends irrelevant
errors for capabilities we haven't attempted to add as we unwind our
recursive stack.  Just return error.

Fixes: 7ef165b9a8 ("vfio/pci: Pass an error object to vfio_add_capabilities")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-03 12:57:35 -06:00
Stefan Hajnoczi
f708a5e71c aio: fix assert when remove poll during destroy
After iothread is enabled internally inside QEMU with GMainContext, we
may encounter this warning when destroying the iothread:

(qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll:
 assertion '!SOURCE_DESTROYED (source)' failed

The problem is that g_source_remove_poll() does not allow to remove one
source from array if the source is detached from its owner
context. (peterx: which IMHO does not make much sense)

Fix it on QEMU side by avoid calling g_source_remove_poll() if we know
the object is during destruction, and we won't leak anything after all
since the array will be gone soon cleanly even with that fd.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-6-peterx@redhat.com
[peterx: write the commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-03 14:36:19 -04:00
Peter Xu
5b3ac23fee iothread: delay the context release to finalize
When gcontext is used with iothread, the context will be destroyed
during iothread_stop().  That's not good since sometimes we would like
to keep the resources until iothread is destroyed, but we may want to
stop the thread before that point.

Delay the destruction of gcontext to iothread finalize.  Then we can do:

  iothread_stop(thread);
  some_cleanup_on_resources();
  iothread_destroy(thread);

We may need this patch if we want to run chardev IOs in iothreads and
hopefully clean them up correctly.  For more specific information,
please see 2b316774f6 ("qemu-char: do not operate on sources from
finalize callbacks").

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-5-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-03 14:36:19 -04:00
Peter Xu
82d90705fe iothread: export iothread_stop()
So that internal iothread users can explicitly stop one iothread without
destroying it.

Since at it, fix iothread_stop() to allow it to be called multiple
times.  Before this patch we may call iothread_stop() more than once on
single iothread, while that may not be correct since qemu_thread_join()
is not allowed to run twice.  From manual of pthread_join():

  Joining with a thread that has previously been joined results in
  undefined behavior.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-4-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-03 14:36:16 -04:00
Peter Xu
0173e21b61 iothread: provide helpers for internal use
IOThread is a general framework that contains IO loop environment and a
real thread behind.  It's also good to be used internally inside qemu.
Provide some helpers for it to create iothreads to be used internally.

Put all the internal used iothreads into the internal object container.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20170928025958.1420-3-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-03 14:26:15 -04:00
Peter Xu
7c47c4ead7 qom: provide root container for internal objs
We have object_get_objects_root() to keep user created objects, however
no place for objects that will be used internally.  Create such a
container for internal objects.

CC: Andreas Färber <afaerber@suse.de>
CC: Markus Armbruster <armbru@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170928025958.1420-2-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-10-03 14:26:15 -04:00
Peter Maydell
d147f7e815 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* iothread bugfix (Eduardo)
* Linux headers sync (Dave)
* .gitignore fix (Eric)
* KVM capability check fixes (Greg)
* kvmclock fix (Jim)

# gpg: Signature made Mon 02 Oct 2017 14:31:09 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  kvmclock: use the updated system_timer_msr
  kvm: check KVM_CAP_NR_VCPUS with kvm_vm_check_extension()
  kvm: check KVM_CAP_SYNC_MMU with kvm_vm_check_extension()
  linux-headers: sync against v4.14-rc1
  iothread: Make iothread_stop() idempotent
  scsi: Ignore executable for in-tree builds

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-03 16:27:24 +01:00
Peter Maydell
0b7fe5aed7 Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-10-02' into staging
QAPI patches for 2017-10-02

# gpg: Signature made Mon 02 Oct 2017 12:09:32 BST
# gpg:                using RSA key 0x3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qapi-2017-10-02:
  watchdog: Allow setting action on the fly
  watchdog.h: Drop local redefinition of actions enum
  qapi: Rename WatchdogExpirationAction enum

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-03 15:11:00 +01:00
Peter Maydell
2c94822167 Merge remote-tracking branch 'remotes/kraxel/tags/ui-20170929-pull-request' into staging
ui and input patches.

# gpg: Signature made Fri 29 Sep 2017 11:21:45 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/ui-20170929-pull-request:
  ui: add tracing of VNC authentication process
  ui: add tracing of VNC operations related to QIOChannel
  virtio-input: send rel-wheel events for wheel buttons
  egl: misc framebuffer helper improvements.
  console: purge curses bits from console.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-03 13:50:10 +01:00
Peter Maydell
be9d199751 Merge remote-tracking branch 'remotes/famz/tags/docker-testing-pull-request' into staging
# gpg: Signature made Fri 29 Sep 2017 04:17:37 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/docker-testing-pull-request:
  docker: Don't mount ccache db if NOUSER=1
  docker: test-block: Don't continue if build fails
  tests/docker/run: don't source /etc/profile
  docker: Fix test-mingw
  docker: add installation to build tests

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-03 12:43:03 +01:00
Jim Somerville
346b1215b1 kvmclock: use the updated system_timer_msr
Fixes e2b6c17 (kvmclock: update system_time_msr address forcibly)
which makes a call to get the latest value of the address
stored in system_timer_msr, but then uses the old address anyway.

Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Message-Id: <59b67db0bd15a46ab47c3aa657c81a4c11f168ea.1506702472.git.Jim.Somerville@windriver.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-02 14:39:51 +02:00
Greg Kurz
11748ba72e kvm: check KVM_CAP_NR_VCPUS with kvm_vm_check_extension()
On a modern server-class ppc host with the following CPU topology:

Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0,8,16,24
Off-line CPU(s) list:  1-7,9-15,17-23,25-31
Thread(s) per core:    1

If both KVM PR and KVM HV loaded and we pass:

        -machine pseries,accel=kvm,kvm-type=PR -smp 8

We expect QEMU to warn that this exceeds the number of online CPUs:

Warning: Number of SMP cpus requested (8) exceeds the recommended
 cpus supported by KVM (4)
Warning: Number of hotpluggable cpus requested (8) exceeds the
 recommended cpus supported by KVM (4)

but nothing is printed...

This happens because on ppc the KVM_CAP_NR_VCPUS capability is VM
specific  ndreally depends on the KVM type, but we currently use it
as a global capability. And KVM returns a fallback value based on
KVM HV being present. Maybe KVM on POWER shouldn't presume anything
as long as it doesn't have a VM, but in all cases, we should call
KVM_CREATE_VM first and use KVM_CAP_NR_VCPUS as a VM capability.

This patch hence changes kvm_recommended_vcpus() accordingly and
moves the sanity checking of smp_cpus after the VM creation.

It is okay for the other archs that also implement KVM_CAP_NR_VCPUS,
ie, mips, s390, x86 and arm, because they don't depend on the VM
being created or not.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <150600966286.30533.10909862523552370889.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-02 14:38:06 +02:00
Greg Kurz
62dd4edaaf kvm: check KVM_CAP_SYNC_MMU with kvm_vm_check_extension()
On a server-class ppc host, this capability depends on the KVM type,
ie, HV or PR. If both KVM are present in the kernel, we will always
get the HV specific value, even if we explicitely requested PR on
the command line.

This can have an impact if we're using hugepages or a balloon device.

Since we've already created the VM at the time any user calls
kvm_has_sync_mmu(), switching to kvm_vm_check_extension() is
enough to fix any potential issue.

It is okay for the other archs that also implement KVM_CAP_SYNC_MMU,
ie, mips, s390, x86 and arm, because they don't depend on the VM being
created or not.

While here, let's cache the state of this extension in a bool variable,
since it has several users in the code, as suggested by Thomas Huth.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <150600965332.30533.14702405809647835716.stgit@bahia.lan>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-02 14:38:06 +02:00
Michal Privoznik
f0df84c6c4 watchdog: Allow setting action on the fly
Currently, the only time that users can set watchdog action is at
the start as all we expose is this -watchdog-action command line
argument. This is suboptimal when users want to plug the device
later via monitor. Alternatively, they might want to change the
action for already existing device on the fly.

Inspired by: https://bugzilla.redhat.com/show_bug.cgi?id=1447169

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <35d6ce6fe3d357122d73b8272bc8198134c74104.1504771369.git.mprivozn@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
[Missing colon in doc comment fixed]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-10-02 13:09:09 +02:00
Michal Privoznik
4c7f4426c4 watchdog.h: Drop local redefinition of actions enum
We already have enum that enumerates all the actions that a
watchdog can take when hitting its timeout: WatchdogAction.
Use that instead of inventing our own.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <ce2790634e6a1b3b6cf90462399d17bad83f0290.1504771369.git.mprivozn@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-10-02 08:41:03 +02:00
Michal Privoznik
14d53b4f4a qapi: Rename WatchdogExpirationAction enum
The new name is WatchdogAction which is shorter,

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Message-Id: <dbd61a0928821348486d0d6260be2bd3b02b6402.1504771369.git.mprivozn@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-10-02 08:40:01 +02:00
Fam Zheng
13787d59cf usb: Use angle brackets for cacard include directive
This is a library header, so angle brackets are more appropriate; also
move the line to before QEMU headers, as is recommended in HACKING.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 20170920085952.3872-1-famz@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-09-29 12:28:26 +02:00
Gerd Hoffmann
275d477a1a usb: fix libusb config variable name.
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Fixes: 4e5ee5b21c
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-id: 20170926063820.30773-1-kraxel@redhat.com
2017-09-29 12:27:30 +02:00
Thomas Huth
f3b2bea3c7 hw/usb/bus: Remove bad object_unparent() from usb_try_create_simple()
Valgrind detects an invalid read operation when hot-plugging of an
USB device fails:

$ valgrind x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598== Memcheck, a memory error detector
==30598== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30598== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==30598== Command: x86_64-softmmu/qemu-system-x86_64 -device usb-ehci -nographic -S
==30598==
QEMU 2.10.50 monitor - type 'help' for more information
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
(qemu) device_add usb-tablet
==30598== Invalid read of size 8
==30598==    at 0x60EF50: object_unparent (object.c:445)
==30598==    by 0x580F0D: usb_try_create_simple (bus.c:346)
==30598==    by 0x581BEB: usb_claim_port (bus.c:451)
==30598==    by 0x582310: usb_qdev_realize (bus.c:257)
==30598==    by 0x4CB399: device_set_realized (qdev.c:914)
==30598==    by 0x60E26D: property_set_bool (object.c:1886)
==30598==    by 0x61235E: object_property_set_qobject (qom-qobject.c:27)
==30598==    by 0x61000F: object_property_set_bool (object.c:1162)
==30598==    by 0x4567C3: qdev_device_add (qdev-monitor.c:630)
==30598==    by 0x456D52: qmp_device_add (qdev-monitor.c:807)
==30598==    by 0x470A99: hmp_device_add (hmp.c:1933)
==30598==    by 0x3679C3: handle_hmp_command (monitor.c:3123)

The object_unparent() here is not necessary anymore since commit
69382d8b3e ("qdev: Fix object reference leak in case device.realize()
fails"), so let's remove it now.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1506526106-30971-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-09-29 12:23:12 +02:00
Alexey Perevalov
d4083f50e0 linux-headers: sync against v4.14-rc1
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Message-Id: <1506085187-24259-2-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29 10:58:31 +02:00
Eduardo Habkost
65072c157e iothread: Make iothread_stop() idempotent
Currently, iothread_stop_all() makes all iothread objects unsafe
to be destroyed, because qemu_thread_join() ends up being called
twice.

To fix this, make iothread_stop() idempotent by checking
thread->stopped.

Fixes the following crash:

  qemu-system-x86_64 -object iothread,id=iothread0 -monitor stdio -display none
  QEMU 2.10.50 monitor - type 'help' for more information
  (qemu) quit
  qemu: qemu_thread_join: No such process
  Aborted (core dumped)

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170926130028.12471-1-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29 10:56:56 +02:00
Eric Blake
cff3e8b8d6 scsi: Ignore executable for in-tree builds
The new qemu-pr-helper (commit b855f8d17) should not be checked in,
even when doing in-tree builds.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170926151421.14557-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-29 10:56:56 +02:00
Daniel P. Berrange
7364dbdabb ui: add tracing of VNC authentication process
Trace anything related to authentication in the VNC protocol
handshake

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170921121528.23935-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-09-29 10:36:34 +02:00
Daniel P. Berrange
ad6374c43e ui: add tracing of VNC operations related to QIOChannel
Trace anything which opens/closes/wraps a QIOChannel in the
VNC server.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170921121528.23935-2-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2017-09-29 10:36:33 +02:00
Gerd Hoffmann
f4924974c7 virtio-input: send rel-wheel events for wheel buttons
qemu uses wheel-up/down button events for mouse wheel input, however
linux applications typically want REL_WHEEL events.

This fixes wheel with linux guests. Tested with X11/wayland, and
windows virtio-input driver.

Based on a patch from Marc.
Added property to enable/disable wheel axis.

Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170926113243.26081-1-kraxel@redhat.com
2017-09-29 10:36:33 +02:00
Gerd Hoffmann
74083f9c01 egl: misc framebuffer helper improvements.
Rename the functions to to say "setup" instead of "create" because they
support being called multiple times on the same egl framebuffer.

Properly delete unused textures, update function interfaces to support
this.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170927115031.12063-1-kraxel@redhat.com
2017-09-29 10:36:33 +02:00
Gerd Hoffmann
e2f82e924d console: purge curses bits from console.h
Handle the translation from vga chars to curses chars in curses_update()
instead of console_write_ch().  Purge any curses support bits from
ui/console.h include file.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170927103811.19249-1-kraxel@redhat.com
2017-09-29 10:36:33 +02:00
Fam Zheng
36ac78e65a docker: Don't mount ccache db if NOUSER=1
With NOUSER=1 the container runs code as root, which may create
privileged files that will not be be accssible next time. Skip ccache
dir mount in this case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170925075458.18047-1-famz@redhat.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2017-09-29 11:14:15 +08:00
Fam Zheng
12b25a7d96 docker: test-block: Don't continue if build fails
Report error and exit upon compiling error, otherwise the iotests output
will be pure noise.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170926110134.2786-1-famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-29 11:14:15 +08:00
Alex Bennée
bcd7f06f57 tests/docker/run: don't source /etc/profile
The usual behaviour of /etc/profile is to set the default PATH for
users. This runs into problems when we have updated PATH in our
dockerfile e.g. to access a cross-compiler in a non-standard
location. It shouldn't be needed anyway as we inherit the env from the
image when it was setup.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170926133622.14991-1-alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-29 11:14:15 +08:00
Fam Zheng
299d296ea9 docker: Fix test-mingw
Feature "dtc" is explicitly required by test-mingw, but is not detected
by the run script since we switched to archive-source.sh in b7f404201e.
Since it isn't available in the Fedora image which runs this test on
patchew, the way we get dtc is still from submodule.

archive-source.sh takes care of bundling the submodule files already, so
what we need to do is just checking if files are there. Makefile is
chosen because it is one that is unlikely to get renamed in the future.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170925082913.22089-1-famz@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-29 11:14:14 +08:00
Paolo Bonzini
6283847857 docker: add installation to build tests
Basic test that "make install" works; this requires msgfmt so add
gettext to the packages.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1506095371-23160-1-git-send-email-pbonzini@redhat.com>
[Rebase to master. - Fam]
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-29 11:14:14 +08:00
Peter Maydell
ab16152926 Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170927a' into staging
Migration pull 2017-09-27

# gpg: Signature made Wed 27 Sep 2017 14:56:23 BST
# gpg:                using RSA key 0x0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20170927a:
  migration: Route more error paths
  migration: Route errors up through vmstate_save
  migration: wire vmstate_save_state errors up to vmstate_subsection_save
  migration: Check field save returns
  migration: check pre_save return in vmstate_save_state
  migration: pre_save return int
  migration: disable auto-converge during bulk block migration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-27 22:44:51 +01:00
Peter Maydell
1d89344081 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20170927' into staging
ppc patch queue 2017-09-27

Contains
 * a number of Mac machine type fixes
 * a number of embedded machine type fixes (preliminary to adding the
   Sam460ex board)
 * a important fix for handling of migration with KVM PR
 * assorted other minor fixes and cleanups

# gpg: Signature made Wed 27 Sep 2017 08:40:48 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20170927: (26 commits)
  macio: use object link between MACIO_IDE and MAC_DBDMA object
  macio: pass channel into MACIOIDEState via qdev property
  mac_dbdma: remove DBDMA_init() function
  mac_dbdma: QOMify
  mac_dbdma: remove unused IO fields from DBDMAState
  spapr: fix the value of SDR1 in kvmppc_put_books_sregs()
  ppc/pnv: check for OPAL firmware file presence
  ppc: remove all unused CPU definitions
  ppc: remove unused CPU definitions
  spapr_pci: make index property mandatory
  macio: convert pmac_ide_ops from old_mmio
  ppc/pnv: Improve macro parenthesization
  spapr: introduce helpers to migrate HPT chunks and the end marker
  ppc/kvm: generalize the use of kvmppc_get_htab_fd()
  ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error
  ppc: Fix OpenPIC model
  ppc/ide/macio: Add missing registers
  ppc/mac: More rework of the DBDMA emulation
  ppc/mac: Advertise a high clock frequency for NewWorld Macs
  ppc: QOMify g3beige machine
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-27 18:20:31 +01:00
Peter Maydell
cfe4cade05 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Tue 26 Sep 2017 14:52:32 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (24 commits)
  block/qcow2-bitmap: fix use of uninitialized pointer
  qemu-iotests: add shrinking image test
  qcow2: add shrink image support
  qcow2: add qcow2_cache_discard
  qemu-img: add --shrink flag for resize
  iotests: fix 181: enable postcopy-ram capability on target
  qemu-iotests: Test change-backing-file command
  block: Fix permissions after bdrv_reopen()
  block: reopen: Queue children after their parents
  block: Base permissions on rw state after reopen
  block: Add reopen queue to bdrv_check_perm()
  block: Add reopen_queue to bdrv_child_perm()
  qemu-io: Drop write permissions before read-only reopen
  block: Clean up some bad code in the vvfat driver
  block/throttle-groups.c: allocate RestartData on the heap
  throttle: Assert that bkt->max is valid in throttle_compute_wait()
  iotests: Print full path of bad output if mismatch
  iotests: use virtio aliases for 067
  iotests: use -ccw on s390x for 051
  iotests: use -ccw on s390x for 040, 139, and 182
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-27 16:48:39 +01:00
Peter Maydell
d666cacaea Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170927' into staging
Another s390x compat fix that should make it into 2.10.1.

# gpg: Signature made Wed 27 Sep 2017 10:30:16 BST
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>"
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20170927:
  s390x/cpumodel: remove ais from z14 default model-> also for 2.10.1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-27 15:59:35 +01:00
Dr. David Alan Gilbert
2f168d0708 migration: Route more error paths
vmstate_save_state is called in lots of places.
Route error returns from the easier cases back up;  there are lots
of more complex cases where their own error paths need fixing.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-7-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Commit message fix up as Peter's review
2017-09-27 11:44:18 +01:00
Dr. David Alan Gilbert
687433f611 migration: Route errors up through vmstate_save
Route the errors from vsmtate_save_state back up through
vmstate_save and out to the normal device state path.
That's the normal error path done.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-6-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:41:03 +01:00
Dr. David Alan Gilbert
f3cadd39c4 migration: wire vmstate_save_state errors up to vmstate_subsection_save
Route the errors from vmstate_save_state up through
vmstate_subsection_save (and back down, all rather recursive).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-5-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Commit message fixed up as per Peter's review
2017-09-27 11:38:21 +01:00
Dr. David Alan Gilbert
88b0faf185 migration: Check field save returns
Check the return values from vmstate_save_state for fields and also the
return values from 'put' for fields that use that.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-4-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:37:11 +01:00
Dr. David Alan Gilbert
551dbd0846 migration: check pre_save return in vmstate_save_state
Check the return value of pre_save state and fail vmstate_save_state
if the pre_save failed.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:36:31 +01:00
Dr. David Alan Gilbert
44b1ff319c migration: pre_save return int
Modify the pre_save method on VMStateDescription to return an int
rather than void so that it potentially can fail.

Changed zillions of devices to make them return 0; the only
case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already
had an error_report/return case.

Note: If you add an error exit in your pre_save you must emit
an error_report to say why.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20170925112917.21340-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:35:59 +01:00
Peter Lieven
9ac78b6171 migration: disable auto-converge during bulk block migration
auto-converge and block migration currently do not play well together.
During block migration the auto-converge logic detects that ram
migration makes no progress and thus throttles down the vm until
it nearly stalls completely. Avoid this by disabling the throttling
logic during the bulk phase of the block migration.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-27 11:27:14 +01:00
Christian Borntraeger
9dacc90846 s390x/cpumodel: remove ais from z14 default model-> also for 2.10.1
We disabled ais for 2.10, so let's also remove it from the z14
default model.

Fixes: 3f2d07b3b0 ("s390x/ais: for 2.10 stable: disable ais facility")
CC: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20170927072030.35737-2-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-27 11:13:32 +02:00
Mark Cave-Ayland
e451b85f1b macio: use object link between MACIO_IDE and MAC_DBDMA object
Using a standard QOM object link we can pass a reference to the MAC_DBDMA
controller to the MACIO_IDE object which removes the last external parameter
to macio_ide_register_dma().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
0fc84331d6 macio: pass channel into MACIOIDEState via qdev property
One of the reasons macio_ide_register_dma() needs to exist is because the
channel id isn't passed into the MACIO_IDE object. Pass in the channel id
using a qdev property to remove this requirement.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
ecba28dbf2 mac_dbdma: remove DBDMA_init() function
Instead we can now instantiate the MAC_DBDMA object directly within the
macio device. We also add the DBDMA device as a child property so that
it is possible to retrieve later.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
1d27f351af mac_dbdma: QOMify
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
2bb4a98f90 mac_dbdma: remove unused IO fields from DBDMAState
These fields were used to manually handle IO requests that weren't aligned
to a sector boundary before this feature was supported by the block API.

Once the block API changed to support byte-aligned IO requests, the macio
controller was switched over to use it in commit be1e343 but these fields
were accidentally left behind. Remove them, including the initialisation
in DBDMA_init().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
1ec26c757d spapr: fix the value of SDR1 in kvmppc_put_books_sregs()
When running with KVM PR, if a new HPT is allocated we need to inform
KVM about the HPT address and size. This is currently done by hacking
the value of SDR1 and pushing it to KVM in several places.

Also, migration breaks the guest since it is very unlikely the HPT has
the same address in source and destination, but we push the incoming
value of SDR1 to KVM anyway.

This patch introduces a new virtual hypervisor hook so that the spapr
code can provide the correct value of SDR1 to be pushed to KVM each
time kvmppc_put_books_sregs() is called.

It allows to get rid of all the hacking in the spapr/kvmppc code and
it fixes migration of nested KVM PR.

Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Cédric Le Goater
15fcedb26f ppc/pnv: check for OPAL firmware file presence
and exit before uselessly trying to load it if the file does not
exists.

Issue discovered by Coverity Scan.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
John Snow
5aec066c41 ppc: remove all unused CPU definitions
Remove *all* unused CPU definitions as indicated by compile-time
`#if 0` constructs.

Signed-off-by: John Snow <jsnow@redhat.com>
[dwg: Removed some additional now-useless comments]
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
John Snow
53a04e8e79 ppc: remove unused CPU definitions
Following commit aef77960, remove now-unused definitions from
cpu-models.h.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
30b3bc5aa9 spapr_pci: make index property mandatory
PHBs can be created with an index property, in which case the machine
code automatically sets all the MMIO windows at addresses derived from
the index. Alternatively, they can be manually created without index,
but the user has to provide addresses for all MMIO windows.

The non-index way happens to be more trouble than it's worth: it's
difficult to use, keeps requiring (potentially incompatible) changes
when some new parameter needs adding, and is awkward to check for
collisions. It currently even has a bug that prevents to use two
non-index PHBs because their child DRCs are all derived from the
same index == -1 value, and, thus, collide.

This patch hence makes the index property mandatory. As a consequence,
the PHB's memory regions and BUID are now always configured according
to the index, and it is no longer possible to set them from the command
line.

This DOES BREAK backwards compat, but we don't think the non-index
PHB feature was used in practice (at least libvirt doesn't) and the
simplification is worth it.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
5abdf67009 macio: convert pmac_ide_ops from old_mmio
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Eric Blake
5261158d21 ppc/pnv: Improve macro parenthesization
Although none of the existing macro call-sites were broken,
it's always better to write macros that properly parenthesize
arguments that can be complex expressions, so that the intended
order of operations is not broken.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
332f7721cb spapr: introduce helpers to migrate HPT chunks and the end marker
This consolidates some duplicated code in a dedicated helpers.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
14b0d74887 ppc/kvm: generalize the use of kvmppc_get_htab_fd()
The use of KVM_PPC_GET_HTAB_FD is open-coded in kvmppc_read_hptes()
and kvmppc_write_hpte().

This patch modifies kvmppc_get_htab_fd() so that it can be used
everywhere we need to access the in-kernel htab:
- add an index argument
  => only kvmppc_read_hptes() passes an actual index, all other users
     pass 0
- add an errp argument to propagate error messages to the caller.
  => spapr migration code prints the error
  => hpte helpers pass &error_abort to keep the current behavior
     of hw_error()

While here, this also fixes a bug in kvmppc_write_hpte() so that it
opens the htab fd for writing instead of reading as it currently does.
This never broke anything because we currently never call this code,
as explained in the changelog of commit c138593380:

"This support updating htab managed by the hypervisor. Currently
 we don't have any user for this feature. This actually bring the
 store_hpte interface in-line with the load_hpte one. We may want
 to use this when we want to emulate henter hcall in qemu for HV
 kvm."

The above is still true today.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
82be8e7394 ppc/kvm: change kvmppc_get_htab_fd() to return -errno on error
When kvmppc_get_htab_fd() fails, its return value is propagated up to
qemu_savevm_state_iterate() or to qemu_savevm_state_complete_precopy().
All savevm handlers expect to receive a negative errno on error.

Let's patch kvmppc_get_htab_fd() accordingly.

While here, let's change htab_load() in the spapr code to also
propagate the error, since it doesn't make sense to abort() if
we couldn't get the htab fd from KVM.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
58b6283586 ppc: Fix OpenPIC model
Apple uses an IBM MPIC2A without timers, it has 64 sources.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
4f7265ff17 ppc/ide/macio: Add missing registers
The timing register exists on all variants of MacIO IDE, we just
store and return its value.

The interrupts register only exists on KeyLargo but it doesn't
hurt to have it. The lack of this register causes MacOS X to
hangs under some circumstances.

Both are 32-bit only. The HW might support smaller access sizes
but no known OS uses them.

Because the core IDE subsystem doesn't provide us with a way
to query the main (level) interrupt state, nor do we have a way
to know that DBDMA issued a (edge) interrupt, we reflect both
through a private pair of qirq's in order to maintain the
register state.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
7745388249 ppc/mac: More rework of the DBDMA emulation
This completely reworks the handling of the control register
according to my understanding of the HW and the spec.

It should (hopefully ... still testing) fix a number of issues
most notably cases of MacOS hanging.

Also update dbdma_unassigned_rw() and dbdma_unassigned_flush() to
have the expected behaviour now that flush is handled slightly
differently.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Benjamin Herrenschmidt
3c0622897e ppc/mac: Advertise a high clock frequency for NewWorld Macs
We use 900Mhz, otherwise MacOS X 10.5 refuses to install.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Mark Cave-Ayland
c8bd35260d ppc: QOMify g3beige machine
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
BALATON Zoltan
4c46f372b0 ppc4xx: Add more PLB registers
These registers are present in 440 SoCs (and maybe in others too) and
U-Boot accesses them when printing register info. We don't emulate
these but add them to avoid crashing when they are read or written.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
BALATON Zoltan
81bb29ace5 ppc: Add 460EX embedded CPU
Despite its name it is a 440 core CPU

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
BALATON Zoltan
9ffe4ce56b ehci: Add ppc4xx-ehci for the USB 2.0 controller in embedded PPC SoCs
Some PPC SoCs have an EHCI with OHCI companion USB controller. Add a
new type for this similar to types used for other embedded SoCs.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
BALATON Zoltan
d7145b66c6 ohci: Allow sysbus version to be used as a companion
Some PPC SoCs have an EHCI with OHCI companion USB controller. To
emulate this allow the sysbus version of OHCI to be used as a companion.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
712b25c4cb ppc/kvm: drop kvmppc_has_cap_htab_fd()
It never got used since its introduction (commit 7c43bca004).

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Greg Kurz
6977afda16 ppc/kvm: check some capabilities with kvm_vm_check_extension()
The following capabilities are VM specific:
- KVM_CAP_PPC_SMT_POSSIBLE
- KVM_CAP_PPC_HTAB_FD
- KVM_CAP_PPC_ALLOC_HTAB

If both KVM HV and KVM PR are present, checking them always return
the HV value, even if we explicitely requested to use PR.

This has no visible effect for KVM_CAP_PPC_ALLOC_HTAB, because we also
try the KVM_PPC_ALLOCATE_HTAB ioctl which is only suppored by HV. As
a consequence, the spapr code doesn't even check KVM_CAP_PPC_HTAB_FD.

However, this will cause kvmppc_hint_smt_possible(), introduced by
commit fa98fbfcdf, to report several VSMT modes (eg, Available
VSMT modes: 8 4 2 1) whereas PR only support mode 1.

This patch fixes all three anyway to use kvm_vm_check_extension(). It
is okay since the VM is already created at the time kvm_arch_init() or
kvmppc_reset_htab() is called.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-27 13:05:41 +10:00
Peter Maydell
08df7e5577 Merge remote-tracking branch 'remotes/kraxel/tags/fw-20170926-pull-request' into staging
add --firmwarepath to configure

# gpg: Signature made Tue 26 Sep 2017 12:06:07 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/fw-20170926-pull-request:
  Add --firmwarepath to configure
  add qemu_add_data_dir()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-26 22:07:02 +01:00
Peter Maydell
31bc1d8481 Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into staging
trivial patches for 2017-09-26

# gpg: Signature made Tue 26 Sep 2017 07:13:16 BST
# gpg:                using RSA key 0x701B4F6B1A693E59
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D  4324 457C E0A0 8044 65C5
#      Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931  4B22 701B 4F6B 1A69 3E59

* remotes/mjt/tags/trivial-patches-fetch: (29 commits)
  hw/isa/pc87312: Mark the device with user_creatable = false
  Drop gld linker usage on SunOS
  tests/boot-sector: Increase timeout to 600 seconds
  nbd-client: Use correct macro parenthesization
  hw/display/virtio-gpu: Put the virtio-gpu-device into the display category
  osdep: Fix ROUND_UP(64-bit, 32-bit)
  target/xtensa: Use the pre-defined MEMTXATTRS_UNSPECIFIED macro
  trivial: Add missing "-m" parameter in docs/memory-hotplug.txt
  chardev/baum: fix baum that releases brlapi twice
  remove trailing whitespace from qemu-options.hx
  hw/display/xenfb.c: Add trace_xenfb_key_event
  aux-to-i2c-bridge: don't allow user to create one
  util/qemu-thread-posix.c: Replace OS ifdefs with CONFIG_HAVE_SEM_TIMEDWAIT
  MAINTAINERS: update docs/interop/ entries
  MAINTAINERS: update docs/devel/ entries
  MAINTAINERS: add missing Cryptography entry
  MAINTAINERS: add missing entry for Generic Loader
  MAINTAINERS: add missing AIO entry
  MAINTAINERS: add missing entries for throttling infra
  MAINTAINERS: add missing SSI entries
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-26 19:49:08 +01:00
Peter Maydell
2509dda283 Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170925' into staging
BQL bug fix

# gpg: Signature made Mon 25 Sep 2017 23:14:48 BST
# gpg:                using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20170925:
  accel/tcg/cputlb: avoid recursive BQL (fixes #1706296)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-26 19:08:49 +01:00
Kevin Wolf
b156d51b62 Merge remote-tracking branch 'mreitz/tags/pull-block-2017-09-26' into queue-block
Block patches

# gpg: Signature made Tue Sep 26 15:01:00 2017 CEST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-09-26:
  block/qcow2-bitmap: fix use of uninitialized pointer
  qemu-iotests: add shrinking image test
  qcow2: add shrink image support
  qcow2: add qcow2_cache_discard
  qemu-img: add --shrink flag for resize

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 15:03:02 +02:00
Vladimir Sementsov-Ogievskiy
5330f32b71 block/qcow2-bitmap: fix use of uninitialized pointer
Without initialization to zero dirty_bitmap field may be not zero
for a bitmap which should not be stored and
qcow2_store_persistent_dirty_bitmaps will erroneously call
store_bitmap for it which leads to SIGSEGV on bdrv_dirty_bitmap_name.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170922144353.4220-1-vsementsov@virtuozzo.com
Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin
fefac70d2a qemu-iotests: add shrinking image test
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-5-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin
46b732cdf3 qcow2: add shrink image support
This patch add shrinking of the image file for qcow2. As a result, this allows
us to reduce the virtual image size and free up space on the disk without
copying the image. Image can be fragmented and shrink is done by punching holes
in the image file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin
f71c08ea8e qcow2: add qcow2_cache_discard
Whenever l2/refcount table clusters are discarded from the file we can
automatically drop unnecessary content of the cache tables. This reduces
the chance of eviction useful cache data and eliminates inconsistent data
in the cache with the data in the file.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Pavel Butsykin
4ffca8904a qemu-img: add --shrink flag for resize
The flag is additional precaution against data loss. Perhaps in the future the
operation shrink without this flag will be blocked for all formats, but for now
we need to maintain compatibility with raw.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170918124230.8152-2-pbutsykin@virtuozzo.com
[mreitz: Added a missing space to a warning]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2017-09-26 15:00:32 +02:00
Vladimir Sementsov-Ogievskiy
69ff158b67 iotests: fix 181: enable postcopy-ram capability on target
Migration capabilities should be enabled on both source and
destination qemu processes.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
3fb23e0751 qemu-iotests: Test change-backing-file command
This involves a temporary read-write reopen if the backing file link in
the middle of a backing file chain should be changed and is therefore a
good test for the latest bdrv_reopen() vs. op blockers fixes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
3045025991 block: Fix permissions after bdrv_reopen()
If we switch between read-only and read-write, the permissions that
image format drivers need on bs->file change, too. Make sure to update
the permissions during bdrv_reopen().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
1857c97b76 block: reopen: Queue children after their parents
We will calculate the required new permissions in the prepare stage of a
reopen. Required permissions of children can be influenced by the
changes made to their parents, but parents are independent from their
children. This means that permissions need to be calculated top-down. In
order to achieve this, queue parents before their children rather than
queuing the children first.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
148eb13c84 block: Base permissions on rw state after reopen
When new permissions are calculated during bdrv_reopen(), they need to
be based on the state of the graph as it will be after the reopen has
completed, not on the current state of the involved nodes.

This patch makes bdrv_is_writable() optionally accept a BlockReopenQueue
from which the new flags are taken. This is then used for determining
the new bs->file permissions of format drivers as soon as we add the
code to actually pass a non-NULL reopen queue to the .bdrv_child_perm
callbacks.

While moving bdrv_is_writable(), make it static. It isn't used outside
block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
3121fb45b0 block: Add reopen queue to bdrv_check_perm()
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
e0995dc3da block: Add reopen_queue to bdrv_child_perm()
In the context of bdrv_reopen(), we'll have to look at the state of the
graph as it will be after the reopen. This interface addition is in
preparation for the change.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
f3adefb2ce qemu-io: Drop write permissions before read-only reopen
qemu-io provides a 'reopen' command that allows switching from writable
to read-only access. We need to make sure that we don't try to keep
write permissions to a BlockBackend that becomes read-only, otherwise
things are going to fail.

This requires a bdrv_drain() call because otherwise in-flight AIO
write requests could issue new internal requests while the permission
has already gone away, which would cause assertion failures. Draining
the queue doesn't break AIO requests in any new way, bdrv_reopen() would
drain it anyway only a few lines later.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2017-09-26 14:46:23 +02:00
Thomas Huth
7a6ab45e19 block: Clean up some bad code in the vvfat driver
Remove the unnecessary home-grown redefinition of the assert() macro here,
and remove the unusable debug code at the end of the checkpoint() function.
The code there uses assert() with side-effects (assignment to the "mapping"
variable), which should be avoided. Looking more closely, it seems as it is
apparently also only usable for one certain directory layout (with a file
named USB.H in it) and thus is of no use for the rest of the world.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Manos Pitsidianakis
43a5dc02fd block/throttle-groups.c: allocate RestartData on the heap
RestartData is the opaque data of the throttle_group_restart_queue_entry
coroutine. By being stack allocated, it isn't available anymore if
aio_co_enter schedules the coroutine with a bottom half and runs after
throttle_group_restart_queue returns.

Cc: qemu-stable@nongnu.org
Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Alberto Garcia
b5806108d2 throttle: Assert that bkt->max is valid in throttle_compute_wait()
If bkt->max == 0 and bkt->burst_length > 1 then we could have a
division by 0 in throttle_do_compute_wait(). That configuration is
however not permitted and is already detected by throttle_is_valid(),
but let's assert it in throttle_compute_wait() to make it explicit.

Found by Coverity (CID: 1381016).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Fam Zheng
93e53fb695 iotests: Print full path of bad output if mismatch
So it is easier to copy paste the path.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Cornelia Huck
b1149c1a2a iotests: use virtio aliases for 067
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x.

Using virtio-scsi will implicitly pick the right device, so just
switch to that for simplicity.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Cornelia Huck
75f02ed53a iotests: use -ccw on s390x for 051
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Cornelia Huck
f1d5516ab5 iotests: use -ccw on s390x for 040, 139, and 182
The default cpu model on s390x does not provide zPCI, which is
not yet wired up on tcg. Moreover, virtio-ccw is the standard
on s390x, so use the -ccw instead of the -pci versions of virtio
devices on s390x.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Stefan Hajnoczi
78aa8aa019 docs: add qemu-block-drivers(7) man page
Block driver documentation is available in qemu-doc.html.  It would be
convenient to have documentation for formats, protocols, and filter
drivers in a man page.

Extract the relevant part of qemu-doc.html into a new file called
docs/qemu-block-drivers.texi.  This file can also be built as a
stand-alone document (man, html, etc).

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Fam Zheng
97ec9117c3 file-posix: Clear out first sector in hdev_create
People get surprised when, after "qemu-img create -f raw /dev/sdX", they
still see qcow2 with "qemu-img info", if previously the bdev had a qcow2
header. While this is natural because raw doesn't need to write any
magic bytes during creation, hdev_create is free to clear out the first
sector to make sure the stale qcow2 header doesn't cause such confusion.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Fam Zheng
a16efd5340 qemu-img: Clarify about relative backing file options
It's not too surprising when a user specifies the backing file relative
to the current working directory instead of the top layer image. This
causes error when they differ. Though the error message has enough
information to infer the fact about the misunderstanding, it is better
if we document this explicitly, so that users don't have to learn from
mistakes.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-09-26 14:46:23 +02:00
Kevin Wolf
05b4cd5d3c qemu-iotests: Add missing -machine accel=qtest
A basic set of qemu options is initialised in ./common:

    export QEMU_OPTIONS="-nodefaults -machine accel=qtest"

However, two test cases (172 and 186) overwrite QEMU_OPTIONS and neglect
to manually set '-machine accel=qtest'. Add the missing option for 172.
186 probably only copied the code from 172, it doesn't actually need to
overwrite QEMU_OPTIONS, so remove that in 186.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2017-09-26 14:46:23 +02:00
Gerd Hoffmann
3d5eecab4a Add --firmwarepath to configure
Add a firmware path config option to configure.  Multiple directories
are accepted, with the usual colon as separator.  Default value is
${prefix}/share/qemu-firmware.  The path is searched in addition to the
current search path (typically ${prefix}/share/qemu).

This prepares qemu for the planned split of the prebuilt firmware blobs
into a separate project.

Distributions can also use this to get rid of the firmware symlink farm
and add -- for example -- /usr/share/seabios to the firmware path
instead.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170914114236.25343-3-kraxel@redhat.com
2017-09-26 13:05:32 +02:00
Gerd Hoffmann
2a1cce9058 add qemu_add_data_dir()
Add helper function to add a directory to the qemu search path, so we
don't duplicate the checks.  Add a check for duplicate entries, so we
stop trying to open files twice.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170914114236.25343-2-kraxel@redhat.com
2017-09-26 13:05:28 +02:00
Thomas Huth
35deebb232 hw/isa/pc87312: Mark the device with user_creatable = false
QEMU currently aborts if you try to use the device at the command
line:

$ ppc64-softmmu/qemu-system-ppc64 -S -machine prep -device pc87312
Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
qemu-system-ppc64: -device pc87312: Device 'parallel0' is in use
Aborted (core dumped)

It uses parallel_hds in its realize function, so I can not be
instantiated by the user again.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:23 +03:00
Kamil Rytarowski
a9b16ab368 Drop gld linker usage on SunOS
This is required to be removed on SmartOS (Illumos).
As of now there are no alternative supported SunOS distributions.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:23 +03:00
Thomas Huth
0789700019 tests/boot-sector: Increase timeout to 600 seconds
If QEMU has been compiled with the flags --enable-tcg-interpreter and
--enable-debug, the guest is running incredibly slow. The pxe boot test
can take up to 400 seconds when testing the pseries ppc64 machine. While
we should still look for ways to speed up the test on the pseries machine,
it's better to increase the timeout in this test to 600 seconds anyway to
allow the test to pass successfully now with this unusal configuration
already.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:23 +03:00
Eric Blake
af5eeb2c3b nbd-client: Use correct macro parenthesization
If 'bs' is a complex expression, we were only casting the front half
rather than the full expression.  Luckily, none of the callers were
passing bad arguments, but it's better to be robust up front.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:22 +03:00
Thomas Huth
e837acfda1 hw/display/virtio-gpu: Put the virtio-gpu-device into the display category
The virtio-gpu-pci device is already in the display category, so the
virtio-gpu-device should be there, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:22 +03:00
Eric Blake
2098b073f3 osdep: Fix ROUND_UP(64-bit, 32-bit)
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.

Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).

While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.

CC: qemu-trivial@nongnu.org
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:22 +03:00
Alistair Francis
2c5b1d2a47 target/xtensa: Use the pre-defined MEMTXATTRS_UNSPECIFIED macro
Instead of using the hardcoded (MemTxAttrs){0} for no memory attributes
let's use the already defined MEMTXATTRS_UNSPECIFIED macro instead.

This is technically a change of behaviour as MEMTXATTRS_UNSPECIFIED sets
the unspecified field to 1, but it doesn't look like anything is
checking this field.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:22 +03:00
Thomas Huth
77fc026cdf trivial: Add missing "-m" parameter in docs/memory-hotplug.txt
The example obviously lacks the "-m" parameter.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:22 +03:00
Liang Yan
98e8790326 chardev/baum: fix baum that releases brlapi twice
Error process of baum_chr_open needs to set brlapi null, so it won't
get released twice in char_braille_finalize, which will cause
"/usr/bin/qemu-system-x86_64: double free or corruption (!prev)"

Signed-off-by: Liang Yan <lyan@suse.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:11:22 +03:00
Michael Tokarev
a295d244e5 remove trailing whitespace from qemu-options.hx
Remove trailing whitespace in qemu-options documentation, as it causes
reproducibility issues depending on the echo implementation used by
the Makefile.

Reported-By: Vagrant Cascadian <vagrant@debian.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-09-26 09:06:02 +03:00
Liang Yan
6ec83befe1 hw/display/xenfb.c: Add trace_xenfb_key_event
It may be better to add a trace event to monitor the last moment of
a key event from QEMU to guest VM

Signed-off-by: Liang Yan <lyan@suse.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
KONRAD Frederic
b9710bc911 aux-to-i2c-bridge: don't allow user to create one
This device is private and is created once per aux-bus.
So don't allow the user to create one from command-line.

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Peter Maydell
401bc051d7 util/qemu-thread-posix.c: Replace OS ifdefs with CONFIG_HAVE_SEM_TIMEDWAIT
In qemu-thread-posix.c we have two implementations of the
various qemu_sem_* functions, one of which uses native POSIX
sem_* and the other of which emulates them with pthread conditions.
This is necessary because not all our host OSes support
sem_timedwait().

Instead of a hard-coded list of OSes which don't implement
sem_timedwait(), which gets out of date, make configure
test for the presence of the function and set a new
CONFIG_HAVE_SEM_TIMEDWAIT appropriately.

In particular, newer NetBSDs have sem_timedwait(), so this
commit will switch them over to using it. OSX still does
not have an implementation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
5746c1cd15 MAINTAINERS: update docs/interop/ entries
moved in commit 7746cf8aab

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Fam Zheng <famz@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
c39cdbf6f6 MAINTAINERS: update docs/devel/ entries
moved in commit ac06724a71

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
3947ecfc0a MAINTAINERS: add missing Cryptography entry
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
c5e2ac7e5e MAINTAINERS: add missing entry for Generic Loader
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
0a4f9ad1eb MAINTAINERS: add missing AIO entry
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
8960393847 MAINTAINERS: add missing entries for throttling infra
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
982d009a18 MAINTAINERS: add missing SSI entries
Alistair Francis volunteered :)

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
68179923a1 MAINTAINERS: add missing PCI entries
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
b24f9882cc MAINTAINERS: add missing qcow2 entry
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
ab7f9f7d78 MAINTAINERS: add missing Guest Agent entries
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
5a49c1b34e MAINTAINERS: add missing VMWare entry
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
37f8043def MAINTAINERS: add missing entry for vhost
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
0e0d345b4f MAINTAINERS: add missing STM32 entry
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Philippe Mathieu-Daudé
c6427ff7a0 MAINTAINERS: add missing ARM entries
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Kamil Rytarowski
39d96847c9 Replace round_page() with TARGET_PAGE_ALIGN()
This change fixes conflict with the DragonFly BSD headers.

Signed-off-by: Kamil Rytarowski <n54@gmx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Stefan Weil
0f9f39d491 configure: Remove unused code (found by shellcheck)
smartcard_cflags is no longer needed since commit
0b22ef0f57.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2017-09-26 09:06:02 +03:00
Peter Maydell
2b521a654c Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-09-25' into staging
nbd patches for 2017-09-25

- Eric Blake: nbd-client: Use correct macro parenthesization
- Vladimir Sementsov-Ogievskiy: 0/3 nbd client refactoring and fixing

# gpg: Signature made Mon 25 Sep 2017 14:39:21 BST
# gpg:                using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2017-09-25:
  block/nbd-client: nbd_co_send_request: fix return code
  block/nbd-client: simplify check in nbd_co_receive_reply
  block/nbd-client: refactor nbd_co_receive_reply
  nbd-client: Use correct macro parenthesization

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-26 00:24:15 +01:00
Peter Maydell
1e3ee83408 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates

# gpg: Signature made Sun 24 Sep 2017 19:07:51 BST
# gpg:                using RSA key 0x9E511E01C737F075
# gpg: Good signature from "Samuel Thibault <samuel.thibault@aquilenet.fr>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@u-bordeaux.fr>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: 9A37 3D36 64A8 DC62 DA0A  34FD 9E51 1E01 C737 F075

* remotes/thibault/tags/samuel-thibault:
  slirp: Add a special case for the NULL socket
  slirp: Fix intermittent send queue hangs on a socket
  slirp: Add explanation for hostfwd parsing failure

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-25 20:31:24 +01:00
Alex Bennée
8b81253332 accel/tcg/cputlb: avoid recursive BQL (fixes #1706296)
The mmio path (see exec.c:prepare_mmio_access) already protects itself
against recursive locking and it makes sense to do the same for
io_readx/writex. Otherwise any helper running in the BQL context will
assert when it attempts to write to device memory as in the case of
the bug report.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
CC: Richard Jones <rjones@redhat.com>
CC: Paolo Bonzini <bonzini@gnu.org>
CC: qemu-stable@nongnu.org
Message-Id: <20170921110625.9500-1-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-25 11:23:30 -07:00
Vladimir Sementsov-Ogievskiy
a693437037 block/nbd-client: nbd_co_send_request: fix return code
It's incorrect to return success rc >= 0 if we skip qio_channel_writev_all()
call due to s->quit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170920124507.18841-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25 08:21:26 -05:00
Vladimir Sementsov-Ogievskiy
9397067221 block/nbd-client: simplify check in nbd_co_receive_reply
If we are woken up from while() loop in nbd_read_reply_entry
handles must be equal. If we are woken up from
nbd_recv_coroutines_wake_all s->quit must be true, so we do
not need checking handles equality.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170920124507.18841-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25 08:21:26 -05:00
Vladimir Sementsov-Ogievskiy
319a56cde7 block/nbd-client: refactor nbd_co_receive_reply
"NBDReply *reply" parameter of nbd_co_receive_reply is used only
to pass return value for nbd_co_request (reply.error). Remove it
and use function return value instead.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170920124507.18841-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2017-09-25 08:21:25 -05:00
Eric Blake
cfa3ad635c nbd-client: Use correct macro parenthesization
If 'bs' is a complex expression, we were only casting the front half
rather than the full expression.  Luckily, none of the callers were
passing bad arguments, but it's better to be robust up front.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170918214649.17550-1-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-09-25 08:21:25 -05:00
Kevin Cernekee
13146a8395 slirp: Add a special case for the NULL socket
NULL sockets are used for NDP, BOOTP, and other critical operations.
If the topmost mbuf in a NULL session is blocked pending resolution,
it may cause problems if it blocks other packets with a NULL socket.
So do not add mbufs with a NULL socket field to the same session.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2017-09-24 20:04:09 +02:00
Kevin Cernekee
e2aad34d73 slirp: Fix intermittent send queue hangs on a socket
if_output() originally sent one mbuf per call and used the slirp->next_m
variable to keep track of where it left off.  But nowadays it tries to
send all of the mbufs from the fastq, and one mbuf from each session on
the batchq.  The next_m variable is both redundant and harmful: there is
a case[0] involving delayed packets in which next_m ends up pointing
to &slirp->if_batchq when an active session still exists, and this
blocks all traffic for that session until qemu is restarted.

The test case was created to reproduce a problem that was seen on
long-running Chromium OS VM tests[1] which rapidly create and
destroy ssh connections through hostfwd.

[0] https://pastebin.com/NNy6LreF
[1] https://bugs.chromium.org/p/chromium/issues/detail?id=766323

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2017-09-24 20:04:09 +02:00
Dr. David Alan Gilbert
0e7e4fb0a6 slirp: Add explanation for hostfwd parsing failure
e.g.
./x86_64-softmmu/qemu-system-x86_64 -nographic -netdev 'user,id=vnet,hostfwd=:555.0.0.0:0-:22'
qemu-system-x86_64: -netdev user,id=vnet,hostfwd=:555.0.0.0:0-:22: Invalid host forwarding rule ':555.0.0.0:0-:22' (Bad host address)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2017-09-24 20:04:09 +02:00
Peter Maydell
460b6c8e58 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Speed up AddressSpaceDispatch creation (Alexey)
* Fix kvm.c assert (David)
* Memory fixes and further speedup (me)
* Persistent reservation manager infrastructure (me)
* virtio-serial: add enable_backend callback (Pavel)
* chardev GMainContext fixes (Peter)

# gpg: Signature made Fri 22 Sep 2017 20:07:33 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (32 commits)
  chardev: remove context in chr_update_read_handler
  chardev: use per-dev context for io_add_watch_poll
  chardev: add Chardev.gcontext field
  chardev: new qemu_chr_be_update_read_handlers()
  scsi: add persistent reservation manager using qemu-pr-helper
  scsi: add multipath support to qemu-pr-helper
  scsi: build qemu-pr-helper
  scsi, file-posix: add support for persistent reservation management
  memory: Share special empty FlatView
  memory: seek FlatView sharing candidates among children subregions
  memory: trace FlatView creation and destruction
  memory: Create FlatView directly
  memory: Get rid of address_space_init_shareable
  memory: Rework "info mtree" to print flat views and dispatch trees
  memory: Do not allocate FlatView in address_space_init
  memory: Share FlatView's and dispatch trees between address spaces
  memory: Move address_space_update_ioeventfds
  memory: Alloc dispatch tree where topology is generared
  memory: Store physical root MR in FlatView
  memory: Rename mem_begin/mem_commit/mem_add helpers
  ...

# Conflicts:
#	configure
2017-09-23 12:55:40 +01:00
Peter Xu
bb86d05f4a chardev: remove context in chr_update_read_handler
We had a per-chardev cache for context, then we don't need this
parameter to be passed in every time when chr_update_read_handler()
called.  As long as we are calling chr_update_read_handler() using
qemu_chr_be_update_read_handlers() we'll be fine.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-5-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:27 +02:00
Peter Xu
6bbb6c0644 chardev: use per-dev context for io_add_watch_poll
It was only passed in by chr_update_read_handlers().  However when
reconnect, we'll lose that context information.  So if a chardev was
running on another context (rather than the default context, the NULL
pointer), it'll switch back to the default context if reconnection
happens.  But, it should really stick to the old context.

Convert all the callers of io_add_watch_poll() to use the internally
cached gcontext.  Then the context should be able to survive even after
reconnections.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-4-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:27 +02:00
Peter Xu
95eeeba669 chardev: add Chardev.gcontext field
It caches the gcontext that is used to poll the chardev IO.  Before this
patch, we only passed it in via chr_update_read_handlers().  However
that may not be enough if the char backend is disconnected and
reconnected afterward.  There are chardev codes that still assumed the
context be NULL (which is the main context).  Will fix that up in
following up patches.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-3-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:27 +02:00
Peter Xu
07241c205c chardev: new qemu_chr_be_update_read_handlers()
Add a wrapper for the chr_update_read_handler().

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1505975754-21555-2-git-send-email-peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:27 +02:00
Paolo Bonzini
9bad2a6b9d scsi: add persistent reservation manager using qemu-pr-helper
This adds a concrete subclass of pr-manager that talks to qemu-pr-helper.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:27 +02:00
Paolo Bonzini
fe8fc5ae5c scsi: add multipath support to qemu-pr-helper
Proper support of persistent reservation for multipath devices requires
communication with the multipath daemon, so that the reservation is
registered and applied when a path comes up.  The device mapper
utilities provide a library to do so; this patch makes qemu-pr-helper.c
detect multipath devices and, when one is found, delegate the operation
to libmpathpersist.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:27 +02:00
Paolo Bonzini
b855f8d175 scsi: build qemu-pr-helper
Introduce a privileged helper to run persistent reservation commands.
This lets virtual machines send persistent reservations without using
CAP_SYS_RAWIO or out-of-tree patches.  The helper uses Unix permissions
and SCM_RIGHTS to restrict access to processes that can access its socket
and prove that they have an open file descriptor for a raw SCSI device.

The next patch will also correct the usage of persistent reservations
with multipath devices.

It would also be possible to support for Linux's IOC_PR_* ioctls in
the future, to support NVMe devices.  For now, however, only SCSI is
supported.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 21:07:24 +02:00
Peter Maydell
c348b54ab5 Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging
Python queue, 2017-09-22

* MAINTAINERS update
* Fix logging issue on test scripts using qemu.py

# gpg: Signature made Fri 22 Sep 2017 15:41:43 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/python-next-pull-request:
  MAINTAINERS: Add Python scripts
  qemu.py: Call logging.basicConfig() automatically

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 16:15:23 +01:00
Peter Maydell
bef81b3eb5 Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170922-1' into staging
migration/next for 20170922

# gpg: Signature made Fri 22 Sep 2017 13:15:06 BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20170922-1:
  migration: split ufd_version_check onto receive/request features part
  migration: fix hardcoded function name in error report
  migration: pass MigrationIncomingState* into migration check functions
  migration: split common postcopy out of ram postcopy
  migration: fix ram_save_pending
  migration: add has_postcopy savevm handler
  bitmap: provide to_le/from_le helpers
  bitmap: introduce bitmap_count_one()
  bitmap: remove BITOP_WORD()
  migration: Split migration_fd_process_incoming
  migration: Create multifd migration threads
  migration: Create x-multifd-page-count parameter
  migration: Create x-multifd-channels parameter
  migration: Add multifd capability
  migration: Create migration_has_all_channels
  migration: Add comments to channel functions
  migration: Teach it about G_SOURCE_REMOVE
  migration: Create migration_ioc_process_incoming()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 14:04:10 +01:00
John Snow
159a9df021 ide: fix enum comparison for gcc 4.7
Apparently GCC gets bent over comparing enum values against zero.
Replace the conditional with something less readable.

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170921013821.1673-1-jsnow@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 13:23:53 +01:00
Alexey Perevalov
54ae0886b1 migration: split ufd_version_check onto receive/request features part
This modification is necessary for userfault fd features which are
required to be requested from userspace.
UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will
be introduced in the next patch.

QEMU have to use separate userfault file descriptor, due to
userfault context has internal state, and after first call of
ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of
success), but kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API.
So only one ioctl with UFFD_API is possible per ufd.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:29 +02:00
Alexey Perevalov
5553499f04 migration: fix hardcoded function name in error report
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:28 +02:00
Alexey Perevalov
d7651f150d migration: pass MigrationIncomingState* into migration check functions
That tiny refactoring is necessary to be able to set
UFFD_FEATURE_THREAD_ID while requesting features, and then
to create downtime context in case when kernel supports it.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:27 +02:00
Vladimir Sementsov-Ogievskiy
58110f0acb migration: split common postcopy out of ram postcopy
Split common postcopy staff from ram postcopy staff.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:27 +02:00
Vladimir Sementsov-Ogievskiy
86e1167e9a migration: fix ram_save_pending
Fill postcopy-able pending only if ram postcopy is enabled.
It is necessary because of there will be other postcopy-able states and
when ram postcopy is disabled, it should not spoil common postcopy
related pending.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:26 +02:00
Vladimir Sementsov-Ogievskiy
c646762736 migration: add has_postcopy savevm handler
Now postcopy-able states are recognized by not NULL
save_live_complete_postcopy handler. But when we have several different
postcopy-able states, it is not convenient. Ram postcopy may be
disabled, while some other postcopy enabled, in this case Ram state
should behave as it is not postcopy-able.

This patch add separate has_postcopy handler to specify behaviour of
savevm state.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:25 +02:00
Peter Xu
d7788151a0 bitmap: provide to_le/from_le helpers
Provide helpers to convert bitmaps to little endian format. It can be
used when we want to send one bitmap via network to some other hosts.

One thing to mention is that, these helpers only solve the problem of
endianess, but it does not solve the problem of different word size on
machines (the bitmaps managing same count of bits may contains different
size when malloced). So we need to take care of the size alignment issue
on the callers for now.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:25 +02:00
Peter Xu
fc7deeea26 bitmap: introduce bitmap_count_one()
Count how many bits set in the bitmap.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:24 +02:00
Peter Xu
ab089e058e bitmap: remove BITOP_WORD()
We have BIT_WORD(). It's the same.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22 14:11:23 +02:00
Juan Quintela
e595a01ab6 migration: Split migration_fd_process_incoming
We need that on later patches.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-09-22 14:11:23 +02:00
Juan Quintela
f986c3d256 migration: Create multifd migration threads
Creation of the threads, nothing inside yet.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

--

Use pointers instead of long array names
Move to use semaphores instead of conditions as paolo suggestion

Put all the state inside one struct.
Use a counter for the number of threads created.  Needed during cancellation.

Add error return to thread creation

Add id field

Rename functions to multifd_save/load_setup/cleanup
Change recv parameters to a pointer to struct
Change back to a struct
Use Error * for _cleanup
2017-09-22 14:11:22 +02:00
Juan Quintela
0fb86605ea migration: Create x-multifd-page-count parameter
Indicates how many pages we are going to send in each batch to a multifd
thread.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

--

Be consistent with defaults and documentation
Use new DEFINE_PROP_*
Rename x-multifd-group to x-multifd-page-count
2017-09-22 14:11:21 +02:00
Juan Quintela
4075fb1ca4 migration: Create x-multifd-channels parameter
Indicates the number of channels that we will create.  By default we
create 2 channels.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

--

Catch inconsistent defaults (eric).
Improve comment stating that number of threads is the same than number
of sockets
Use new DEFIN_PROP_*
Rename x-multifd-threads to x-multifd-threads
2017-09-22 14:11:21 +02:00
Juan Quintela
30126bbf1f migration: Add multifd capability
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>

--

Use new DEFINE_PROP
2017-09-22 14:11:20 +02:00
Juan Quintela
428d89084c migration: Create migration_has_all_channels
This function allows us to decide when to close the listener socket.
For now, we only need one connection.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-09-22 14:11:19 +02:00
Juan Quintela
8e1a1931ca migration: Add comments to channel functions
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
2017-09-22 14:11:18 +02:00
Juan Quintela
2a543bfdfa migration: Teach it about G_SOURCE_REMOVE
As this is defined on glib 2.32, add compatibility macros for older glibs.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-09-22 14:11:18 +02:00
Juan Quintela
4f0fae7f2b migration: Create migration_ioc_process_incoming()
We pass the ioc instead of the fd.  This will allow us to have more
than one channel open.  We also make sure that we set the
from_src_file sooner, so we don't need to pass it as a parameter.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>

--

Do not assing mis->from_src_file (peterxu)
2017-09-22 14:11:17 +02:00
Peter Maydell
a664607440 Merge remote-tracking branch 'remotes/famz/tags/build-and-test-automation-pull-request' into staging
# gpg: Signature made Fri 22 Sep 2017 08:28:38 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/build-and-test-automation-pull-request: (36 commits)
  docker: Drop 'set -e' from run script
  docker: Use archive-source.py
  tests: Add README for vm tests
  MAINTAINERS: Add tests/vm entry
  Makefile: Add rules to run vm tests
  tests: Add OpenBSD image
  tests: Add NetBSD image
  tests: Add FreeBSD image
  tests: Add ubuntu.i386 image
  tests: Add vm test lib
  tests: Add a test key pair
  scripts: Add archive-source.sh
  qemu.py: Add "wait()" method
  gitignore: Ignore vm test images
  MAINTAINERS: Fix subsystem name for "Build and test automation"
  buildsys: Move rdma libs to per object
  buildsys: Move brlapi libs to per object
  buildsys: Move usb redir cflags/libs to per object
  buildsys: Move libusb cflags/libs to per object
  buildsys: Move libcacard cflags/libs to per object
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 12:14:28 +01:00
Peter Maydell
3aaa8d4499 Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170922' into staging
Fix an s390x migration breakage up for 2.10 stable.
This will be fixed properly for 2.11.

# gpg: Signature made Fri 22 Sep 2017 08:28:22 BST
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>"
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20170922:
  s390x/ais: for 2.10 stable: disable ais facility

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 10:55:55 +01:00
Fam Zheng
a43415ebfd seccomp: Don't include libseccomp from QEMU header
The only prototype doesn't need anything from the lib header, and not
including it here allows files that include this header, for example
vl.c, to compile without the libseccomp cflags.

The breakage is since c3883e1f93 for environments where `pkg-config
--cflags libseccomp" is non-empty.

Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
Message-id: 20170920083647.14599-1-famz@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-22 09:48:33 +01:00
Christian Borntraeger
3f2d07b3b0 s390x/ais: for 2.10 stable: disable ais facility
The migration interface for ais was introduced with kernel 4.13
but the capability itself had been active since 4.12. As migration
support is considered necessary lets disable ais in the 2.10
stable version. A proper fix and re-enablement will be done
for qemu 2.11.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20170921140834.14233-2-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-22 09:25:21 +02:00
Fam Zheng
4f6afe41f2 docker: Drop 'set -e' from run script
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 14:51:43 +08:00
Fam Zheng
b7f404201e docker: Use archive-source.py
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2017-09-22 14:51:43 +08:00
Fam Zheng
d72c55c3a5 tests: Add README for vm tests
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 14:51:43 +08:00
Fam Zheng
18023821b6 MAINTAINERS: Add tests/vm entry
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-09-22 14:51:42 +08:00
Fam Zheng
b1fb9a63fc Makefile: Add rules to run vm tests
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 14:51:42 +08:00
Fam Zheng
fdfaa33291 tests: Add OpenBSD image
The image is prepared following instructions as in:

https://wiki.qemu.org/Hosts/BSD

Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 14:51:42 +08:00
Fam Zheng
5cd2b13851 tests: Add NetBSD image
The image is prepared following instructions as in:

https://wiki.qemu.org/Hosts/BSD

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kamil Rytarowski <n54@gmx.com>
2017-09-22 14:51:42 +08:00
Fam Zheng
111e30c0c4 tests: Add FreeBSD image
The image is prepared following instructions as in:

https://wiki.qemu.org/Hosts/BSD

Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-09-22 14:51:35 +08:00
Fam Zheng
fb15a57032 tests: Add ubuntu.i386 image
This adds a 32bit guest.

The official LTS cloud image is downloaded and initialized with
cloud-init.

Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:46:25 +08:00
Fam Zheng
ff2ebff079 tests: Add vm test lib
This is the common code to implement a "VM test" to

  1) Download and initialize a pre-defined VM that has necessary
  dependencies to build QEMU and SSH access.

  2) Archive $SRC_PATH to a .tar file.

  3) Boot the VM, and pass the source tar file to the guest.

  4) SSH into the VM, untar the source tarball, build from the source.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2017-09-22 10:46:25 +08:00
Fam Zheng
57446e32ac tests: Add a test key pair
This will be used by setup test user ssh.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-22 10:46:25 +08:00
Fam Zheng
6b560c76ca scripts: Add archive-source.sh
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-22 10:46:25 +08:00
Fam Zheng
22491a2f2e qemu.py: Add "wait()" method
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2017-09-22 10:46:25 +08:00
Fam Zheng
b8bd2f598b gitignore: Ignore vm test images
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-22 10:46:25 +08:00
Eduardo Habkost
0475a03eb8 MAINTAINERS: Fix subsystem name for "Build and test automation"
The subsystem name for the "Build test automation" section is
"-------------------------", because an actual subsystem name
line is missing:

  $ ./scripts/get_maintainer.pl -f tests/docker/docker.py
  "Alex Bennée" <alex.bennee@linaro.org> (maintainer:-----------------...)
  Fam Zheng <famz@redhat.com> (maintainer:-----------------...)
  "Philippe Mathieu-Daudé" <f4bug@amsat.org> (reviewer:-----------------...)
  qemu-devel@nongnu.org (open list:-----------------...)

Fix the issue by inserting a subsystem name line where
get_maintainer.pl expects it.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170921170209.9101-1-ehabkost@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:47 +08:00
Fam Zheng
392fb64351 buildsys: Move rdma libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907084230.26493-1-famz@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
8eca288989 buildsys: Move brlapi libs to per object
baum.o already receives the sdl cflags in its per object variable, do
the same for brlapi libs to avoid cluttering libs_softmmu.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907084700.952-1-famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
cc7923fc07 buildsys: Move usb redir cflags/libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907082918.7299-10-famz@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
b878b652df buildsys: Move libusb cflags/libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907082918.7299-9-famz@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
7b62bf5a70 buildsys: Move libcacard cflags/libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907082918.7299-8-famz@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
b11499117c buildsys: Move audio libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907082918.7299-5-famz@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
8ecc89f6e7 buildsys: Move sdl cflags/libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907082918.7299-3-famz@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
e2ad6f16a8 buildsys: Move vde libs to per object
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907083552.17725-3-famz@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
27ad39ba61 vl: Don't include vde header
Nothing in vl.c uses anything from the vde package, do remove the
unnecessary include.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907083552.17725-2-famz@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
f300ca63c7 docker: Add test-block
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170905025614.579-6-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Based-on: 20170905021201.25684-1-famz@redhat.com
2017-09-22 10:20:34 +08:00
Fam Zheng
18d4e35f93 docker: Add nettle-devel to fedora image
The LUKS cases in qemu-iotests requires this.

Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170905025614.579-5-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Based-on: 20170905021201.25684-1-famz@redhat.com
2017-09-22 10:20:34 +08:00
Fam Zheng
4470749186 docker: Use unconfined security profile
Some by default blocked syscalls are required to run tests for example
userfaultfd.

Reviewed-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170905025614.579-4-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Based-on: 20170905021201.25684-1-famz@redhat.com
2017-09-22 10:20:34 +08:00
Fam Zheng
82659e844a docker: Add test_fail and prep_fail
They both print a message and exit, but with different status code so
distinguish real test errors from env preparation failures.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170905025614.579-3-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Based-on: 20170905021201.25684-1-famz@redhat.com
2017-09-22 10:20:34 +08:00
Fam Zheng
d8a2f5116d docker: Fix return code of build_qemu()
Without "set -e", the "&&" makes sure that the return code reflects the
result status, and that make only runs if configure succeeds.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170905025614.579-2-famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Based-on: 20170905021201.25684-1-famz@redhat.com
2017-09-22 10:20:34 +08:00
Fam Zheng
05790dafef tests/docker: Clean up paths
The 'run' script already creats src, build and install directories under
$TEST_DIR, use it in common.rc.

Also the tests always run from $QEMU_SRC/tests/docker, so use a relative
$CMD string.

Message-Id: <20170817035721.11064-1-famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
5e8a7fe673 docker: Enable features explicitly in test-full
Also avoid "set -e".

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907141245.31946-3-famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2017-09-22 10:20:34 +08:00
Fam Zheng
7fc581c295 docker: Update ubuntu image
Base on the newer ubuntu-lts (16.06) and include more packages for
better build coverage.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170907141245.31946-2-famz@redhat.com>
2017-09-22 10:20:34 +08:00
Alex Bennée
3f2ff267af docker: reduce noise when building travis.docker
Set the DEBIAN_FRONTEND and locale env vars to stop apt complaining so
much as we build the image.

Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20170725133425.436-7-alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Alex Bennée
9b4154a570 docker: don't install device-tree-compiler build-deps in travis.docker
Installing the device-tree-compiler build-deps is a little extreme. We
only actually need the binary so include it with the other packages.

Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20170725133425.436-6-alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Alex Bennée
6fe3ae3f19 docker: docker.py make --no-cache skip checksum test
If you invoke with NOCACHE=1 we pass --no-cache in the argv to
docker.py but may still not force a rebuild if the dockerfile checksum
hasn't changed. By testing for its presence we can force builds
without having to manually remove the docker image.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20170725133425.436-5-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Alex Bennée
1fddbf7c5e docker: ensure NOUSER for travis images
While adding the current user is a useful default behaviour for
creating new images it is not appropriate for Travis which already has
a default user.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170725133425.436-2-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2017-09-22 10:20:34 +08:00
Paolo Bonzini
7c9e527659 scsi, file-posix: add support for persistent reservation management
It is a common requirement for virtual machine to send persistent
reservations, but this currently requires either running QEMU with
CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged
QEMU bypass Linux's filter on SG_IO commands.

As an alternative mechanism, the next patches will introduce a
privileged helper to run persistent reservation commands without
expanding QEMU's attack surface unnecessarily.

The helper is invoked through a "pr-manager" QOM object, to which
file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands.  For example:

  $ qemu-system-x86_64
      -device virtio-scsi \
      -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
      -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
      -device scsi-block,drive=hd

or:

  $ qemu-system-x86_64
      -device virtio-scsi \
      -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
      -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
      -device scsi-block,drive=hd

Multiple pr-manager implementations are conceivable and possible, though
only one is implemented right now.  For example, a pr-manager could:

- talk directly to the multipath daemon from a privileged QEMU
  (i.e. QEMU links to libmpathpersist); this makes reservation work
  properly with multipath, but still requires CAP_SYS_RAWIO

- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)

- more interestingly, implement reservations directly in QEMU
  through file system locks or a shared database (e.g. sqlite)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Alexey Kardashevskiy
092aa2fc65 memory: Share special empty FlatView
This shares an cached empty FlatView among address spaces. The empty
FV is used every time when a root MR renders into a FV without memory
sections which happens when MR or its children are not enabled or
zero-sized. The empty_view is not NULL to keep the rest of memory
API intact; it also has a dispatch tree for the same reason.

On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves
the amount of FlatView's in use (557 -> 260) and dispatch tables
(~800000 -> ~370000).  In an unrelated experiment with 112 non-virtio
devices on x86 ("-M pc"), only 4 FlatViews are alive, and about ~2000
are created at startup.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-16-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Paolo Bonzini
e673ba9af9 memory: seek FlatView sharing candidates among children subregions
A container can be used instead of an alias to allow switching between
multiple subregions.  In this case we cannot directly share the
subregions (since they only belong to a single parent), but if the
subregions are aliases we can in turn walk those.

This is not enough to remove all source of quadratic FlatView creation,
but it enables sharing of the PCI bus master FlatViews (and their
AddressSpaceDispatch structures) across all PCI devices.  For 112
virtio-net-pci devices, boot time is reduced from 25 to 10 seconds and
memory consumption from 1.4 to 1 G.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Paolo Bonzini
02d9651d6a memory: trace FlatView creation and destruction
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Alexey Kardashevskiy
202fc01b05 memory: Create FlatView directly
This avoids usual memory_region_transaction_commit() which rebuilds
all FVs.

On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this brings
down the boot time from 25s to 20s and reduces the amount of temporary FVs
allocated during machine constructon (~800000 -> ~640000) and amount of
temporary dispatch trees (~370000 -> ~300000), the total memory footprint
goes down (18G -> 17G).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-18-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Alexey Kardashevskiy
b516572f31 memory: Get rid of address_space_init_shareable
Since FlatViews are shared now and ASes not, this gets rid of
address_space_init_shareable().

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-17-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22 01:06:51 +02:00
Alexey Kardashevskiy
5e8fd947e2 memory: Rework "info mtree" to print flat views and dispatch trees
This adds a new "-d" switch to "info mtree" to print dispatch tree
internals.

This changes the way "-f" is handled - it prints now flat views and
associated address spaces.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-15-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:38 +02:00
Alexey Kardashevskiy
67ace39b25 memory: Do not allocate FlatView in address_space_init
This creates a new AS object without any FlatView as
memory_region_transaction_commit() may want to reuse the empty FV.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-14-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:38 +02:00
Alexey Kardashevskiy
967dc9b119 memory: Share FlatView's and dispatch trees between address spaces
This allows sharing flat views between address spaces (AS) when
the same root memory region is used when creating a new address space.
This is done by walking through all ASes and caching one FlatView per
a physical root MR (i.e. not aliased).

This removes search for duplicates from address_space_init_shareable() as
FlatViews are shared elsewhere and keeping as::ref_count correct seems
an unnecessary and useless complication.

This should cause no change and memory use or boot time yet.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-13-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:38 +02:00
Alexey Kardashevskiy
0221848764 memory: Move address_space_update_ioeventfds
So it is called (twice) from the same function. This is to make the next
patches a bit simpler.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-12-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:38 +02:00
Alexey Kardashevskiy
9bf561e36c memory: Alloc dispatch tree where topology is generared
This is to make next patches simpler.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-11-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:38 +02:00
Alexey Kardashevskiy
89c177bbdd memory: Store physical root MR in FlatView
Address spaces get to keep a root MR (alias or not) but FlatView stores
the actual MR as this is going to be used later on to decide whether to
share a particular FlatView or not.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-10-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
8629d3fcb7 memory: Rename mem_begin/mem_commit/mem_add helpers
This renames some helpers to reflect better what they do.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-9-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
9950322a59 memory: Cleanup after switching to FlatView
We store AddressSpaceDispatch* in FlatView anyway so there is no need
to carry it from mem_add() to register_subpage/register_multipage.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-8-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
166206845f memory: Switch memory from using AddressSpace to FlatView
FlatView's will be shared between AddressSpace's and subpage_t
and MemoryRegionSection cannot store AS anymore, hence this change.

In particular, for:

 typedef struct subpage_t {
     MemoryRegion iomem;
-    AddressSpace *as;
+    FlatView *fv;
     hwaddr base;
     uint16_t sub_section[];
 } subpage_t;

  struct MemoryRegionSection {
     MemoryRegion *mr;
-    AddressSpace *address_space;
+    FlatView *fv;
     hwaddr offset_within_region;
     Int128 size;
     hwaddr offset_within_address_space;
     bool readonly;
 };

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-7-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
c775252378 memory: Remove AddressSpace pointer from AddressSpaceDispatch
AS in ASD is only used to pass AS from mem_begin() to register_subpage()
to store it in MemoryRegionSection, we can do this directly now.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-6-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
66a6df1dc6 memory: Move AddressSpaceDispatch from AddressSpace to FlatView
As we are going to share FlatView's between AddressSpace's,
and AddressSpaceDispatch is a structure to perform quick lookup
in FlatView, this moves ASD to FlatView.

After previosly open coded ASD rendering, we can also remove
as->next_dispatch as the new FlatView pointer is stored
on a stack and set to an AS atomically.

flatview_destroy() is executed under RCU instead of
address_space_dispatch_free() now.

This makes mem_begin/mem_commit to work with ASD and mem_add with FV
as later on mem_add will be taking FV as an argument anyway.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-5-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
cc94cd6d36 memory: Move FlatView allocation to a helper
This moves a FlatView allocation and initialization to a helper.
While we are nere, replace g_new with g_new0 to not to bother if we add
new fields in the future.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-4-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
9a62e24f45 memory: Open code FlatView rendering
We are going to share FlatView's between AddressSpace's and per-AS
memory listeners won't suit the purpose anymore so open code
the dispatch tree rendering.

Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there is
no registered listeners; this should improve starting time.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-3-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Alexey Kardashevskiy
e76bb18f7e exec: Explicitly export target AS from address_space_translate_internal
This adds an AS** parameter to address_space_do_translate()
to make it easier for the next patch to share FlatViews.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-2-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Paolo Bonzini
447b0d0b9e memory: avoid "resurrection" of dead FlatViews
It's possible for address_space_get_flatview() as it currently stands
to cause a use-after-free for the returned FlatView, if the reference
count is incremented after the FlatView has been replaced by a writer:

   thread 1             thread 2             RCU thread
  -------------------------------------------------------------
   rcu_read_lock
   read as->current_map
                        set as->current_map
                        flatview_unref
                           '--> call_rcu
   flatview_ref
     [ref=1]
   rcu_read_unlock
                                             flatview_destroy
   <badness>

Since FlatViews are not updated very often, we can just detect the
situation using a new atomic op atomic_fetch_inc_nonzero, similar to
Linux's atomic_inc_not_zero, which performs the refcount increment only if
it hasn't already hit zero.  This is similar to Linux commit de09a9771a53
("CRED: Fix get_task_cred() and task_state() to not resurrect dead
credentials", 2010-07-29).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 23:19:37 +02:00
Peter Maydell
0a8066f0c0 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20170921' into staging
target-arm queue:
 * more preparatory work for v8M support
 * convert some omap devices away from old_mmio
 * remove out of date ARM ARM section references in comments
 * add the Smartfusion2 board

# gpg: Signature made Thu 21 Sep 2017 17:40:40 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20170921: (31 commits)
  msf2: Add Emcraft's Smartfusion2 SOM kit
  msf2: Add Smartfusion2 SoC
  msf2: Add Smartfusion2 SPI controller
  msf2: Microsemi Smartfusion2 System Register block
  msf2: Add Smartfusion2 System timer
  hw/arm/omap2.c: Don't use old_mmio
  hw/i2c/omap_i2c.c: Don't use old_mmio
  hw/timer/omap_gptimer: Don't use old_mmio
  hw/timer/omap_synctimer.c: Don't use old_mmio
  hw/gpio/omap_gpio.c: Don't use old_mmio
  hw/arm/palm.c: Don't use old_mmio for static_ops
  target/arm: Remove out of date ARM ARM section references in A64 decoder
  nvic: Support banked exceptions in acknowledge and complete
  nvic: Make SHCSR banked for v8M
  nvic: Make ICSR banked for v8M
  target/arm: Handle banking in negative-execution-priority check in cpu_mmu_index()
  nvic: Handle v8M changes in nvic_exec_prio()
  nvic: Disable the non-secure HardFault if AIRCR.BFHFNMINS is clear
  nvic: Implement v8M changes to fixed priority exceptions
  nvic: In escalation to HardFault, support HF not being priority -1
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 17:42:27 +01:00
Subbaraya Sundeep
6d262dcb7d msf2: Add Emcraft's Smartfusion2 SOM kit
Emulated Emcraft's Smartfusion2 System On Module starter
kit.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-6-f4bug@amsat.org
[PMD: drop cpu_model to directly use cpu type]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep
ebc1fbb4a1 msf2: Add Smartfusion2 SoC
Smartfusion2 SoC has hardened Microcontroller subsystem
and flash based FPGA fabric. This patch adds support for
Microcontroller subsystem in the SoC.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-5-f4bug@amsat.org
[PMD: drop cpu_model to directly use cpu type, check m3clk non null]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep
268ee7deb4 msf2: Add Smartfusion2 SPI controller
Modelled Microsemi's Smartfusion2 SPI controller.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-4-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep
0ee1e1f469 msf2: Microsemi Smartfusion2 System Register block
Added Sytem register block of Smartfusion2.
This block has PLL registers which are accessed by guest.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-3-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Subbaraya Sundeep
96401bad45 msf2: Add Smartfusion2 System timer
Modelled System Timer in Microsemi's Smartfusion2 Soc.
Timer has two 32bit down counters and two interrupts.

Signed-off-by: Subbaraya Sundeep <sundeep.lkml@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20170920201737.25723-2-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-21 16:36:56 +01:00
Peter Maydell
fc14cf0e95 hw/arm/omap2.c: Don't use old_mmio
Don't use old_mmio in the memory region ops struct.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505580378-9044-7-git-send-email-peter.maydell@linaro.org
2017-09-21 16:34:27 +01:00
Peter Maydell
28dc207f5f hw/i2c/omap_i2c.c: Don't use old_mmio
Don't use old_mmio in the memory region ops struct.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505580378-9044-6-git-send-email-peter.maydell@linaro.org
2017-09-21 16:34:27 +01:00
Peter Maydell
13dfde3320 hw/timer/omap_gptimer: Don't use old_mmio
Don't use the old_mmio struct in memory region ops.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505580378-9044-5-git-send-email-peter.maydell@linaro.org
2017-09-21 16:34:27 +01:00
Peter Maydell
27f5bab84d hw/timer/omap_synctimer.c: Don't use old_mmio
Don't use the old_mmio in the memory region ops struct.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505580378-9044-4-git-send-email-peter.maydell@linaro.org
2017-09-21 16:34:27 +01:00
Peter Maydell
940caf1f7e hw/gpio/omap_gpio.c: Don't use old_mmio
Drop the use of old_mmio in the omap2_gpio memory ops.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505580378-9044-3-git-send-email-peter.maydell@linaro.org
2017-09-21 16:34:27 +01:00
Peter Maydell
7b675f1f97 hw/arm/palm.c: Don't use old_mmio for static_ops
Update the static_ops functions to use new-style mmio
rather than the legacy old_mmio functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505580378-9044-2-git-send-email-peter.maydell@linaro.org
2017-09-21 16:34:27 +01:00
Peter Maydell
4ce31af4ae target/arm: Remove out of date ARM ARM section references in A64 decoder
In the A64 decoder, we have a lot of references to section numbers
from version A.a of the v8A ARM ARM (DDI0487). This version of the
document is now long obsolete (we are currently on revision B.a),
and various intervening versions renumbered all the sections.

The most recent B.a version of the document doesn't assign
section numbers at all to the individual instruction classes
in the way that the various A.x versions did. The simplest thing
to do is just to delete all the out of date C.x.x references.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170915150849.23557-1-peter.maydell@linaro.org
2017-09-21 16:32:25 +01:00
Peter Maydell
5cb18069d7 nvic: Support banked exceptions in acknowledge and complete
Update armv7m_nvic_acknowledge_irq() and armv7m_nvic_complete_irq()
to handle banked exceptions:
 * acknowledge needs to use the correct vector, which may be
   in sec_vectors[]
 * acknowledge needs to return to its caller whether the
   exception should be taken to secure or non-secure state
 * complete needs its caller to tell it whether the exception
   being completed is a secure one or not

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-20-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
437d59c17e nvic: Make SHCSR banked for v8M
Handle banking of SHCSR: some register bits are banked between
Secure and Non-Secure, and some are only accessible to Secure.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-19-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
3f1e0eb7c3 nvic: Make ICSR banked for v8M
The ICSR NVIC register is banked for v8M. This doesn't
require any new state, but it does mean that some bits
are controlled by BFHNFNMINS and some bits must work
with the correct banked exception. There is also a new
in v8M PENDNMICLR bit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-18-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
5d4791991d target/arm: Handle banking in negative-execution-priority check in cpu_mmu_index()
Now that we have a banked FAULTMASK register and banked exceptions,
we can implement the correct check in cpu_mmu_index() for whether
the MPU_CTRL.HFNMIENA bit's effect should apply. This bit causes
handlers which have requested a negative execution priority to run
with the MPU disabled. In v8M the test has to check this for the
current security state and so takes account of banking.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-17-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
49c80c380d nvic: Handle v8M changes in nvic_exec_prio()
Update nvic_exec_prio() to support the v8M changes:
 * BASEPRI, FAULTMASK and PRIMASK are all banked
 * AIRCR.PRIS can affect NS priorities
 * AIRCR.BFHFNMINS affects FAULTMASK behaviour

These changes mean that it's no longer possible to
definitely say that if FAULTMASK is set it overrides
PRIMASK, and if PRIMASK is set it overrides BASEPRI
(since if PRIMASK_NS is set and AIRCR.PRIS is set then
whether that 0x80 priority should take effect or the
priority in BASEPRI_S depends on the value of BASEPRI_S,
for instance). So we switch to the same approach used
by the pseudocode of working through BASEPRI, PRIMASK
and FAULTMASK and overriding the previous values if
needed.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-16-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
7208b426c7 nvic: Disable the non-secure HardFault if AIRCR.BFHFNMINS is clear
If AIRCR.BFHFNMINS is clear, then although NonSecure HardFault
can still be pended via SHCSR.HARDFAULTPENDED it mustn't actually
preempt execution. The simple way to achieve this is to clear the
enable bit for it, since the enable bit isn't guest visible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-15-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
331f4bae6c nvic: Implement v8M changes to fixed priority exceptions
In v7M, the fixed-priority exceptions are:
 Reset: -3
 NMI: -2
 HardFault: -1

In v8M, this changes because Secure HardFault may need
to be prioritised above NMI:
 Reset: -4
 Secure HardFault if AIRCR.BFHFNMINS == 1: -3
 NMI: -2
 Secure HardFault if AIRCR.BFHFNMINS == 0: -1
 NonSecure HardFault: -1

Make these changes, including support for changing the
priority of Secure HardFault as AIRCR.BFHFNMINS changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-14-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
94a34abe32 nvic: In escalation to HardFault, support HF not being priority -1
When escalating to HardFault, we must go into Lockup if we
can't take the synchronous HardFault because the current
execution priority is already at or below the priority of
HardFault. In v7M HF is always priority -1 so a simple < 0
comparison sufficed; in v8M the priority of HardFault can
vary depending on whether it is a Secure or NonSecure
HardFault, so we must check against the priority of the
HardFault exception vector we're about to use.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-13-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
80ac239035 nvic: Compare group priority for escalation to HF
In armv7m_nvic_set_pending() we have to compare the
priority of an exception against the execution priority
to decide whether it needs to be escalated to HardFault.
In the specification this is a comparison against the
exception's group priority; for v7M we implemented it
as a comparison against the raw exception priority
because the two comparisons will always give the
same answer. For v8M the existence of AIRCR.PRIS and
the possibility of different PRIGROUP values for secure
and nonsecure exceptions means we need to explicitly
calculate the vector's group priority for this check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-12-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
e6a0d3500d nvic: Make SHPR registers banked
Make the set_prio() function take a bool indicating
whether to pend the secure or non-secure version of a banked
interrupt, and use this to implement the correct banking
semantics for the SHPR registers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-11-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
2fb50a3340 nvic: Make set_pending and clear_pending take a secure parameter
Make the armv7m_nvic_set_pending() and armv7m_nvic_clear_pending()
functions take a bool indicating whether to pend the secure
or non-secure version of a banked interrupt, and update the
callsites accordingly.

In most callsites we can simply pass the correct security
state in; in a couple of cases we use TODO comments to indicate
that we will return the code in a subsequent commit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-10-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
ff96c64aec nvic: Handle banked exceptions in nvic_recompute_state()
Update the nvic_recompute_state() code to handle the security
extension and its associated banked registers.

Code that uses the resulting cached state (ie the irq
acknowledge and complete code) will be updated in a later
commit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-9-git-send-email-peter.maydell@linaro.org
2017-09-21 16:31:09 +01:00
Peter Maydell
e1be0a576b nvic: Implement NVIC_ITNS<n> registers
For v8M, the NVIC has a new set of registers per interrupt,
NVIC_ITNS<n>. These determine whether the interrupt targets Secure
or Non-secure state. Implement the register read/write code for
these, and make them cause NVIC_IABR, NVIC_ICER, NVIC_ISER,
NVIC_ICPR, NVIC_IPR and NVIC_ISPR to RAZ/WI for non-secure
accesses to fields corresponding to interrupts which are
configured to target secure state.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-8-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell
028b0da424 nvic: Make ICSR.RETTOBASE handle banked exceptions
Update the code in nvic_rettobase() so that it checks the
sec_vectors[] array as well as the vectors[] array if needed.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-7-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell
3b2e934463 nvic: Implement AIRCR changes for v8M
The Application Interrupt and Reset Control Register has some changes
for v8M:
 * new bits SYSRESETREQS, BFHFNMINS and PRIS: these all have
   real state if the security extension is implemented and otherwise
   are constant
 * the PRIGROUP field is banked between security states
 * non-secure code can be blocked from using the SYSRESET bit
   to reset the system if SYSRESETREQS is set

Implement the new state and the changes to register read and write.
For the moment we ignore the effects of the secure PRIGROUP.
We will implement the effects of PRIS and BFHFNMIS later.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-6-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell
5255fcf8e4 nvic: Add cached vectpending_prio state
Instead of looking up the pending priority
in nvic_pending_prio(), cache it in a new state struct
field. The calculation of the pending priority given
the interrupt number is more complicated in v8M with
the security extension, so the caching will be worthwhile.

This changes nvic_pending_prio() from returning a full
(group + subpriority) priority value to returning a group
priority. This doesn't require changes to its callsites
because we use it only in comparisons of the form
  execution_prio > nvic_pending_prio()
and execution priority is always a group priority, so
a test (exec prio > full prio) is true if and only if
(execprio > group_prio).

(Architecturally the expected comparison is with the
group priority for this sort of "would we preempt" test;
we were only doing a test with a full priority as an
optimisation to avoid the mask, which is possible
precisely because the two comparisons always give the
same answer.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-5-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:27 +01:00
Peter Maydell
e93bc2ac11 nvic: Add cached vectpending_is_s_banked state
With banked exceptions, just the exception number in
s->vectpending is no longer sufficient to uniquely identify
the pending exception. Add a vectpending_is_s_banked bool
which is true if the exception is using the sec_vectors[]
array.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1505240046-11454-4-git-send-email-peter.maydell@linaro.org
2017-09-21 16:29:23 +01:00
Peter Maydell
17906a162a nvic: Add banked exception states
For the v8M security extension, some exceptions must be banked
between security states. Add the new vecinfo array which holds
the state for the banked exceptions and migrate it if the
CPU the NVIC is attached to implements the security extension.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-21 16:28:59 +01:00
Peter Maydell
50f11062d4 target/arm: Implement MSR/MRS access to NS banked registers
In v8M the MSR and MRS instructions have extra register value
encodings to allow secure code to access the non-secure banked
version of various special registers.

(We don't implement the MSPLIM_NS or PSPLIM_NS aliases, because
we don't currently implement the stack limit registers at all.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1505240046-11454-2-git-send-email-peter.maydell@linaro.org
2017-09-21 16:28:23 +01:00
Paolo Bonzini
db81b99537 atomic: update documentation
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 14:47:42 +02:00
KONRAD Frederic
05e015f73c memory: avoid a name clash with access macro
This avoids a name clash with the access macro on windows 64:

make
	CHK version_gen.h
  CC      aarch64-softmmu/memory.o
/home/konrad/qemu/memory.c: In function 'access_with_adjusted_size':
/home/konrad/qemu/memory.c:591:73: error: macro "access" passed 7 arguments, \
                         but takes just 2
                         (size - access_size - i) * 8, access_mask, attrs);
                                                                         ^

Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Message-Id: <1505988260-8483-1-git-send-email-frederic.konrad@adacore.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 14:08:17 +02:00
David Hildenbrand
3110cdbd8a kvm: drop wrong assertion creating problems with pflash
pflash toggles mr->romd_mode. So this assert does not always hold.

1) a device was added with !mr->romd_mode, therefore effectively not
   creating a kvm slot as we want to trap every access (add = false).
2) mr->romd_mode was toggled on before remove it. There is now
   actually no slot to remove and the assert is wrong.

So let's just drop the assert.

Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170920145025.19403-1-david@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 12:40:08 +02:00
Pavel Butsykin
55289fb036 virtio-serial: add enable_backend callback
We should guarantee that RAM will not be modified while VM has a stopped
state, otherwise it can lead to negative consequences during post-copy
migration. In RUN_STATE_FINISH_MIGRATE step, it's expected that RAM on
source side will not be modified as this could lead to non-consistent vm state
on the destination side. Also RAM access during postcopy-ram migration with
enabled release-ram capability can lead to sad consequences.

Let's add enable_backend() callback to avoid undesirable virtioqueue changes
in the guest memory.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Message-Id: <20170919120733.22020-1-pbutsykin@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21 11:51:49 +02:00
407 changed files with 14776 additions and 4884 deletions

2
.gitignore vendored
View File

@@ -49,9 +49,11 @@
/qemu-version.h
/qemu-version.h.tmp
/module_block.h
/scsi/qemu-pr-helper
/vscclient
/vhost-user-scsi
/fsdev/virtfs-proxy-helper
*.tmp
*.[1-9]
*.a
*.aux

View File

@@ -299,6 +299,8 @@ M: Cornelia Huck <cohuck@redhat.com>
M: Alexander Graf <agraf@suse.de>
S: Maintained
F: target/s390x/kvm.c
F: target/s390x/kvm_s390x.h
F: target/s390x/kvm-stub.c
F: target/s390x/ioinst.[ch]
F: target/s390x/machine.c
F: hw/intc/s390_flic.c
@@ -380,6 +382,7 @@ M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/char/pl011.c
F: include/hw/char/pl011.h
F: hw/display/pl110*
F: hw/dma/pl080.c
F: hw/dma/pl330.c
@@ -403,13 +406,15 @@ F: hw/intc/gic_internal.h
F: hw/misc/a9scu.c
F: hw/misc/arm11scu.c
F: hw/timer/a9gtimer*
F: hw/timer/arm_*
F: include/hw/arm/arm.h
F: hw/timer/arm*
F: include/hw/arm/arm*.h
F: include/hw/intc/arm*
F: include/hw/misc/a9scu.h
F: include/hw/misc/arm11scu.h
F: include/hw/timer/a9gtimer.h
F: include/hw/timer/arm_mptimer.h
F: include/hw/timer/armv7m_systick.h
F: tests/test-arm-mptimer.c
Exynos
M: Igor Mitsyanko <i.mitsyanko@gmail.com>
@@ -512,6 +517,7 @@ M: Peter Maydell <peter.maydell@linaro.org>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/*/versatile*
F: hw/misc/arm_sysctl.c
Xilinx Zynq
M: Edgar E. Iglesias <edgar.iglesias@gmail.com>
@@ -548,6 +554,7 @@ F: hw/char/stm32f2xx_usart.c
F: hw/timer/stm32f2xx_timer.c
F: hw/adc/*
F: hw/ssi/stm32f2xx_spi.c
F: include/hw/*/stm32*.h
Netduino 2
M: Alistair Francis <alistair@alistair23.me>
@@ -925,6 +932,8 @@ F: include/hw/pci/*
F: hw/misc/pci-testdev.c
F: hw/pci/*
F: hw/pci-bridge/*
F: docs/pci*
F: docs/specs/*pci*
ACPI/SMBIOS
M: Michael S. Tsirkin <mst@redhat.com>
@@ -983,10 +992,13 @@ F: hw/scsi/lsi53c895a.c
SSI
M: Peter Crosthwaite <crosthwaite.peter@gmail.com>
M: Alistair Francis <alistair.francis@xilinx.com>
S: Maintained
F: hw/ssi/*
F: hw/block/m25p80.c
F: include/hw/ssi/ssi.h
X: hw/ssi/xilinx_*
F: tests/m25p80-test.c
Xilinx SPI
M: Alistair Francis <alistair.francis@xilinx.com>
@@ -1029,6 +1041,7 @@ vhost
M: Michael S. Tsirkin <mst@redhat.com>
S: Supported
F: hw/*/*vhost*
F: docs/interop/vhost-user.txt
virtio
M: Michael S. Tsirkin <mst@redhat.com>
@@ -1126,6 +1139,7 @@ M: Dmitry Fleytman <dmitry@daynix.com>
S: Maintained
F: hw/net/vmxnet*
F: hw/scsi/vmw_pvscsi*
F: tests/vmxnet3-test.c
Rocker
M: Jiri Pirko <jiri@resnulli.us>
@@ -1156,6 +1170,7 @@ M: Alistair Francis <alistair.francis@xilinx.com>
S: Maintained
F: hw/core/generic-loader.c
F: include/hw/core/generic-loader.h
F: docs/generic-loader.txt
CHRP NVRAM
M: Thomas Huth <thuth@redhat.com>
@@ -1217,6 +1232,7 @@ F: util/aio-*.c
F: block/io.c
F: migration/block*
F: include/block/aio.h
F: scripts/qemugdb/aio.py
T: git git://github.com/stefanha/qemu.git block
Block SCSI subsystem
@@ -1257,7 +1273,7 @@ F: block/dirty-bitmap.c
F: include/qemu/hbitmap.h
F: include/block/dirty-bitmap.h
F: tests/test-hbitmap.c
F: docs/bitmaps.md
F: docs/interop/bitmaps.rst
T: git git://github.com/famz/qemu.git bitmaps
T: git git://github.com/jnsnow/qemu.git bitmaps
@@ -1426,7 +1442,7 @@ F: tests/test-qapi-*.c
F: tests/test-qmp-*.c
F: tests/test-visitor-serialization.c
F: scripts/qapi*
F: docs/qapi*
F: docs/devel/qapi*
T: git git://repo.or.cz/qemu/armbru.git qapi-next
QAPI Schema
@@ -1455,6 +1471,10 @@ QEMU Guest Agent
M: Michael Roth <mdroth@linux.vnet.ibm.com>
S: Maintained
F: qga/
F: qemu-ga.texi
F: scripts/qemu-guest-agent/
F: tests/test-qga.c
F: docs/interop/qemu-ga-ref.texi
T: git git://github.com/mdroth/qemu.git qga
QOM
@@ -1474,7 +1494,7 @@ M: Markus Armbruster <armbru@redhat.com>
S: Supported
F: qmp.c
F: monitor.c
F: docs/*qmp-*
F: docs/devel/*qmp-*
F: scripts/qmp/
F: tests/qmp-test.c
T: git git://repo.or.cz/qemu/armbru.git qapi-next
@@ -1505,7 +1525,7 @@ S: Maintained
F: trace/
F: scripts/tracetool.py
F: scripts/tracetool/
F: docs/tracing.txt
F: docs/devel/tracing.txt
T: git git://github.com/stefanha/qemu.git tracing
TPM
@@ -1528,7 +1548,7 @@ F: include/migration/
F: migration/
F: scripts/vmstate-static-checker.py
F: tests/vmstate-static-checker-data/
F: docs/migration.txt
F: docs/devel/migration.txt
F: qapi/migration.json
Seccomp
@@ -1543,6 +1563,7 @@ S: Maintained
F: crypto/
F: include/crypto/
F: tests/test-crypto-*
F: tests/benchmark-crypto-*
F: qemu.sasl
Coroutines
@@ -1579,8 +1600,10 @@ M: Alberto Garcia <berto@igalia.com>
S: Supported
F: block/throttle-groups.c
F: include/block/throttle-groups.h
F: include/qemu/throttle.h
F: include/qemu/throttle*.h
F: util/throttle.c
F: docs/throttle.txt
F: tests/test-throttle.c
L: qemu-block@nongnu.org
UUID
@@ -1836,7 +1859,7 @@ M: Denis V. Lunev <den@openvz.org>
L: qemu-block@nongnu.org
S: Supported
F: block/parallels.c
F: docs/specs/parallels.txt
F: docs/interop/parallels.txt
qed
M: Stefan Hajnoczi <stefanha@redhat.com>
@@ -1861,6 +1884,7 @@ M: Max Reitz <mreitz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/qcow2*
F: docs/interop/qcow2.txt
qcow
M: Kevin Wolf <kwolf@redhat.com>
@@ -1904,6 +1928,7 @@ F: docs/block-replication.txt
Build and test automation
-------------------------
Build and test automation
M: Alex Bennée <alex.bennee@linaro.org>
M: Fam Zheng <famz@redhat.com>
R: Philippe Mathieu-Daudé <f4bug@amsat.org>
@@ -1912,6 +1937,7 @@ S: Maintained
F: .travis.yml
F: .shippable.yml
F: tests/docker/
F: tests/vm/
W: https://travis-ci.org/qemu/qemu
W: https://app.shippable.com/github/qemu/qemu
W: http://patchew.org/QEMU/
@@ -1921,5 +1947,5 @@ Documentation
Build system architecture
M: Daniel P. Berrange <berrange@redhat.com>
S: Odd Fixes
F: docs/build-system.txt
F: docs/devel/build-system.txt

View File

@@ -209,6 +209,7 @@ ifdef BUILD_DOCS
DOCS=qemu-doc.html qemu-doc.txt qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8
DOCS+=docs/interop/qemu-qmp-ref.html docs/interop/qemu-qmp-ref.txt docs/interop/qemu-qmp-ref.7
DOCS+=docs/interop/qemu-ga-ref.html docs/interop/qemu-ga-ref.txt docs/interop/qemu-ga-ref.7
DOCS+=docs/qemu-block-drivers.7
ifdef CONFIG_VIRTFS
DOCS+=fsdev/virtfs-proxy-helper.1
endif
@@ -372,6 +373,11 @@ qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o fsdev/9p-marshal.o fsdev/9p-iov-marshal.o $(COMMON_LDADDS)
fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
scsi/qemu-pr-helper$(EXESUF): scsi/qemu-pr-helper.o scsi/utils.o $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
ifdef CONFIG_MPATH
scsi/qemu-pr-helper$(EXESUF): LIBS += -ludev -lmultipath -lmpathpersist
endif
qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$@")
@@ -488,7 +494,7 @@ clean:
rm -f *.msi
find . \( -name '*.so' -o -name '*.dll' -o -name '*.mo' -o -name '*.[oda]' \) -type f -exec rm {} +
rm -f $(filter-out %.tlb,$(TOOLS)) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~
rm -f fsdev/*.pod
rm -f fsdev/*.pod scsi/*.pod
rm -f qemu-img-cmds.h
rm -f ui/shader/*-vert.h ui/shader/*-frag.h
@# May not be present in GENERATED_FILES
@@ -527,6 +533,7 @@ distclean: clean
rm -f docs/interop/qemu-qmp-ref.txt docs/interop/qemu-ga-ref.txt
rm -f docs/interop/qemu-qmp-ref.pdf docs/interop/qemu-ga-ref.pdf
rm -f docs/interop/qemu-qmp-ref.html docs/interop/qemu-ga-ref.html
rm -f docs/qemu-block-drivers.7
for d in $(TARGET_DIRS); do \
rm -rf $$d || exit 1 ; \
done
@@ -571,6 +578,7 @@ ifdef CONFIG_POSIX
$(INSTALL_DATA) qemu.1 "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man7"
$(INSTALL_DATA) docs/interop/qemu-qmp-ref.7 "$(DESTDIR)$(mandir)/man7"
$(INSTALL_DATA) docs/qemu-block-drivers.7 "$(DESTDIR)$(mandir)/man7"
ifneq ($(TOOLS),)
$(INSTALL_DATA) qemu-img.1 "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man8"
@@ -716,6 +724,7 @@ qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi
fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
qemu-ga.8: qemu-ga.texi
docs/qemu-block-drivers.7: docs/qemu-block-drivers.texi
html: qemu-doc.html docs/interop/qemu-qmp-ref.html docs/interop/qemu-ga-ref.html
info: qemu-doc.info docs/interop/qemu-qmp-ref.info docs/interop/qemu-ga-ref.info
@@ -725,7 +734,7 @@ txt: qemu-doc.txt docs/interop/qemu-qmp-ref.txt docs/interop/qemu-ga-ref.txt
qemu-doc.html qemu-doc.info qemu-doc.pdf qemu-doc.txt: \
qemu-img.texi qemu-nbd.texi qemu-options.texi qemu-option-trace.texi \
qemu-monitor.texi qemu-img-cmds.texi qemu-ga.texi \
qemu-monitor-info.texi
qemu-monitor-info.texi docs/qemu-block-drivers.texi
docs/interop/qemu-ga-ref.dvi docs/interop/qemu-ga-ref.html \
docs/interop/qemu-ga-ref.info docs/interop/qemu-ga-ref.pdf \
@@ -811,6 +820,7 @@ endif
-include $(wildcard *.d tests/*.d)
include $(SRC_PATH)/tests/docker/Makefile.include
include $(SRC_PATH)/tests/vm/Makefile.include
.PHONY: help
help:
@@ -834,6 +844,7 @@ help:
@echo 'Test targets:'
@echo ' check - Run all tests (check-help for details)'
@echo ' docker - Help about targets running tests inside Docker containers'
@echo ' vm-test - Help about targets running tests inside VM'
@echo ''
@echo 'Documentation targets:'
@echo ' html info pdf txt'

View File

@@ -171,6 +171,7 @@ trace-events-subdirs += qapi
trace-events-subdirs += accel/tcg
trace-events-subdirs += accel/kvm
trace-events-subdirs += nbd
trace-events-subdirs += scsi
trace-events-files = $(SRC_PATH)/trace-events $(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)

View File

@@ -87,6 +87,7 @@ struct KVMState
#endif
int many_ioeventfds;
int intx_set_mask;
bool sync_mmu;
/* The man page (and posix) say ioctl numbers are signed int, but
* they're not. Linux, glibc and *BSD all treat ioctl numbers as
* unsigned, and treating them as signed here can break things */
@@ -722,7 +723,6 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
mem = kvm_lookup_matching_slot(kml, start_addr, size);
if (!add) {
if (!mem) {
g_assert(!memory_region_is_ram(mr) && !writeable && !mr->romd_mode);
return;
}
if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
@@ -1440,7 +1440,7 @@ static void kvm_irqchip_create(MachineState *machine, KVMState *s)
*/
static int kvm_recommended_vcpus(KVMState *s)
{
int ret = kvm_check_extension(s, KVM_CAP_NR_VCPUS);
int ret = kvm_vm_check_extension(s, KVM_CAP_NR_VCPUS);
return (ret) ? ret : 4;
}
@@ -1530,26 +1530,6 @@ static int kvm_init(MachineState *ms)
s->nr_slots = 32;
}
/* check the vcpu limits */
soft_vcpus_limit = kvm_recommended_vcpus(s);
hard_vcpus_limit = kvm_max_vcpus(s);
while (nc->name) {
if (nc->num > soft_vcpus_limit) {
warn_report("Number of %s cpus requested (%d) exceeds "
"the recommended cpus supported by KVM (%d)",
nc->name, nc->num, soft_vcpus_limit);
if (nc->num > hard_vcpus_limit) {
fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
"the maximum cpus supported by KVM (%d)\n",
nc->name, nc->num, hard_vcpus_limit);
exit(1);
}
}
nc++;
}
kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type");
if (mc->kvm_type) {
type = mc->kvm_type(kvm_type);
@@ -1584,6 +1564,27 @@ static int kvm_init(MachineState *ms)
}
s->vmfd = ret;
/* check the vcpu limits */
soft_vcpus_limit = kvm_recommended_vcpus(s);
hard_vcpus_limit = kvm_max_vcpus(s);
while (nc->name) {
if (nc->num > soft_vcpus_limit) {
warn_report("Number of %s cpus requested (%d) exceeds "
"the recommended cpus supported by KVM (%d)",
nc->name, nc->num, soft_vcpus_limit);
if (nc->num > hard_vcpus_limit) {
fprintf(stderr, "Number of %s cpus requested (%d) exceeds "
"the maximum cpus supported by KVM (%d)\n",
nc->name, nc->num, hard_vcpus_limit);
exit(1);
}
}
nc++;
}
missing_cap = kvm_check_extension_list(s, kvm_required_capabilites);
if (!missing_cap) {
missing_cap =
@@ -1665,6 +1666,8 @@ static int kvm_init(MachineState *ms)
s->many_ioeventfds = kvm_check_many_ioeventfds();
s->sync_mmu = !!kvm_vm_check_extension(kvm_state, KVM_CAP_SYNC_MMU);
return 0;
err:
@@ -2131,10 +2134,9 @@ int kvm_device_access(int fd, int group, uint64_t attr,
return err;
}
/* Return 1 on success, 0 on failure */
int kvm_has_sync_mmu(void)
bool kvm_has_sync_mmu(void)
{
return kvm_check_extension(kvm_state, KVM_CAP_SYNC_MMU);
return kvm_state->sync_mmu;
}
int kvm_has_vcpu_events(void)

View File

@@ -64,9 +64,9 @@ int kvm_cpu_exec(CPUState *cpu)
abort();
}
int kvm_has_sync_mmu(void)
bool kvm_has_sync_mmu(void)
{
return 0;
return false;
}
int kvm_has_many_ioeventfds(void)

View File

@@ -765,7 +765,7 @@ static uint64_t io_readx(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
cpu->mem_io_vaddr = addr;
if (mr->global_locking) {
if (mr->global_locking && !qemu_mutex_iothread_locked()) {
qemu_mutex_lock_iothread();
locked = true;
}
@@ -800,7 +800,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
cpu->mem_io_vaddr = addr;
cpu->mem_io_pc = retaddr;
if (mr->global_locking) {
if (mr->global_locking && !qemu_mutex_iothread_locked()) {
qemu_mutex_lock_iothread();
locked = true;
}

View File

@@ -11,3 +11,9 @@ common-obj-$(CONFIG_AUDIO_WIN_INT) += audio_win_int.o
common-obj-y += wavcapture.o
sdlaudio.o-cflags := $(SDL_CFLAGS)
sdlaudio.o-libs := $(SDL_LIBS)
alsaaudio.o-libs := $(ALSA_LIBS)
paaudio.o-libs := $(PULSE_LIBS)
coreaudio.o-libs := $(COREAUDIO_LIBS)
dsoundaudio.o-libs := $(DSOUND_LIBS)
ossaudio.o-libs := $(OSS_LIBS)

300
block.c
View File

@@ -239,12 +239,6 @@ bool bdrv_is_read_only(BlockDriverState *bs)
return bs->read_only;
}
/* Returns whether the image file can be written to right now */
bool bdrv_is_writable(BlockDriverState *bs)
{
return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
}
int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
bool ignore_allow_rdw, Error **errp)
{
@@ -987,6 +981,33 @@ static void bdrv_backing_options(int *child_flags, QDict *child_options,
*child_flags = flags;
}
static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
const char *filename, Error **errp)
{
BlockDriverState *parent = c->opaque;
int orig_flags = bdrv_get_flags(parent);
int ret;
if (!(orig_flags & BDRV_O_RDWR)) {
ret = bdrv_reopen(parent, orig_flags | BDRV_O_RDWR, errp);
if (ret < 0) {
return ret;
}
}
ret = bdrv_change_backing_file(parent, filename,
base->drv ? base->drv->format_name : "");
if (ret < 0) {
error_setg_errno(errp, ret, "Could not update backing file link");
}
if (!(orig_flags & BDRV_O_RDWR)) {
bdrv_reopen(parent, orig_flags, NULL);
}
return ret;
}
const BdrvChildRole child_backing = {
.get_parent_desc = bdrv_child_get_parent_desc,
.attach = bdrv_backing_attach,
@@ -995,6 +1016,7 @@ const BdrvChildRole child_backing = {
.drained_begin = bdrv_child_cb_drained_begin,
.drained_end = bdrv_child_cb_drained_end,
.inactivate = bdrv_child_cb_inactivate,
.update_filename = bdrv_backing_update_filename,
};
static int bdrv_open_flags(BlockDriverState *bs, int flags)
@@ -1531,22 +1553,59 @@ static int bdrv_fill_options(QDict **options, const char *filename,
return 0;
}
static int bdrv_child_check_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
static int bdrv_child_check_perm(BdrvChild *c, BlockReopenQueue *q,
uint64_t perm, uint64_t shared,
GSList *ignore_children, Error **errp);
static void bdrv_child_abort_perm_update(BdrvChild *c);
static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared);
typedef struct BlockReopenQueueEntry {
bool prepared;
BDRVReopenState state;
QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
} BlockReopenQueueEntry;
/*
* Return the flags that @bs will have after the reopens in @q have
* successfully completed. If @q is NULL (or @bs is not contained in @q),
* return the current flags.
*/
static int bdrv_reopen_get_flags(BlockReopenQueue *q, BlockDriverState *bs)
{
BlockReopenQueueEntry *entry;
if (q != NULL) {
QSIMPLEQ_FOREACH(entry, q, entry) {
if (entry->state.bs == bs) {
return entry->state.flags;
}
}
}
return bs->open_flags;
}
/* Returns whether the image file can be written to after the reopen queue @q
* has been successfully applied, or right now if @q is NULL. */
static bool bdrv_is_writable(BlockDriverState *bs, BlockReopenQueue *q)
{
int flags = bdrv_reopen_get_flags(q, bs);
return (flags & (BDRV_O_RDWR | BDRV_O_INACTIVE)) == BDRV_O_RDWR;
}
static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
BdrvChild *c,
const BdrvChildRole *role,
BdrvChild *c, const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t parent_perm, uint64_t parent_shared,
uint64_t *nperm, uint64_t *nshared)
{
if (bs->drv && bs->drv->bdrv_child_perm) {
bs->drv->bdrv_child_perm(bs, c, role,
bs->drv->bdrv_child_perm(bs, c, role, reopen_queue,
parent_perm, parent_shared,
nperm, nshared);
}
/* TODO Take force_share from reopen_queue */
if (child_bs && child_bs->force_share) {
*nshared = BLK_PERM_ALL;
}
@@ -1561,7 +1620,8 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
* A call to this function must always be followed by a call to bdrv_set_perm()
* or bdrv_abort_perm_update().
*/
static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
uint64_t cumulative_perms,
uint64_t cumulative_shared_perms,
GSList *ignore_children, Error **errp)
{
@@ -1571,7 +1631,7 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
/* Write permissions never work with read-only images */
if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
!bdrv_is_writable(bs))
!bdrv_is_writable(bs, q))
{
error_setg(errp, "Block node is read-only");
return -EPERM;
@@ -1596,11 +1656,11 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
/* Check all children */
QLIST_FOREACH(c, &bs->children, next) {
uint64_t cur_perm, cur_shared;
bdrv_child_perm(bs, c->bs, c, c->role,
bdrv_child_perm(bs, c->bs, c, c->role, q,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
ret = bdrv_child_check_perm(c, cur_perm, cur_shared, ignore_children,
errp);
ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared,
ignore_children, errp);
if (ret < 0) {
return ret;
}
@@ -1658,7 +1718,7 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t cumulative_perms,
/* Update all children */
QLIST_FOREACH(c, &bs->children, next) {
uint64_t cur_perm, cur_shared;
bdrv_child_perm(bs, c->bs, c, c->role,
bdrv_child_perm(bs, c->bs, c, c->role, NULL,
cumulative_perms, cumulative_shared_perms,
&cur_perm, &cur_shared);
bdrv_child_set_perm(c, cur_perm, cur_shared);
@@ -1726,7 +1786,8 @@ char *bdrv_perm_names(uint64_t perm)
*
* Needs to be followed by a call to either bdrv_set_perm() or
* bdrv_abort_perm_update(). */
static int bdrv_check_update_perm(BlockDriverState *bs, uint64_t new_used_perm,
static int bdrv_check_update_perm(BlockDriverState *bs, BlockReopenQueue *q,
uint64_t new_used_perm,
uint64_t new_shared_perm,
GSList *ignore_children, Error **errp)
{
@@ -1768,19 +1829,20 @@ static int bdrv_check_update_perm(BlockDriverState *bs, uint64_t new_used_perm,
cumulative_shared_perms &= c->shared_perm;
}
return bdrv_check_perm(bs, cumulative_perms, cumulative_shared_perms,
return bdrv_check_perm(bs, q, cumulative_perms, cumulative_shared_perms,
ignore_children, errp);
}
/* Needs to be followed by a call to either bdrv_child_set_perm() or
* bdrv_child_abort_perm_update(). */
static int bdrv_child_check_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
static int bdrv_child_check_perm(BdrvChild *c, BlockReopenQueue *q,
uint64_t perm, uint64_t shared,
GSList *ignore_children, Error **errp)
{
int ret;
ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
ret = bdrv_check_update_perm(c->bs, perm, shared, ignore_children, errp);
ret = bdrv_check_update_perm(c->bs, q, perm, shared, ignore_children, errp);
g_slist_free(ignore_children);
return ret;
@@ -1808,7 +1870,7 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
{
int ret;
ret = bdrv_child_check_perm(c, perm, shared, NULL, errp);
ret = bdrv_child_check_perm(c, NULL, perm, shared, NULL, errp);
if (ret < 0) {
bdrv_child_abort_perm_update(c);
return ret;
@@ -1827,6 +1889,7 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
@@ -1844,6 +1907,7 @@ void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
@@ -1853,10 +1917,11 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
if (!backing) {
/* Apart from the modifications below, the same permissions are
* forwarded and left alone as for filters */
bdrv_filter_default_perms(bs, c, role, perm, shared, &perm, &shared);
bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared,
&perm, &shared);
/* Format drivers may touch metadata even if the guest doesn't write */
if (bdrv_is_writable(bs)) {
if (bdrv_is_writable(bs, reopen_queue)) {
perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
}
@@ -1945,7 +2010,7 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
* because we're just taking a parent away, so we're loosening
* restrictions. */
bdrv_get_cumulative_perm(old_bs, &perm, &shared_perm);
bdrv_check_perm(old_bs, perm, shared_perm, NULL, &error_abort);
bdrv_check_perm(old_bs, NULL, perm, shared_perm, NULL, &error_abort);
bdrv_set_perm(old_bs, perm, shared_perm);
}
@@ -1964,7 +2029,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
BdrvChild *child;
int ret;
ret = bdrv_check_update_perm(child_bs, perm, shared_perm, NULL, errp);
ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, errp);
if (ret < 0) {
bdrv_abort_perm_update(child_bs);
return NULL;
@@ -1999,7 +2064,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
assert(parent_bs->drv);
assert(bdrv_get_aio_context(parent_bs) == bdrv_get_aio_context(child_bs));
bdrv_child_perm(parent_bs, child_bs, NULL, child_role,
bdrv_child_perm(parent_bs, child_bs, NULL, child_role, NULL,
perm, shared_perm, &perm, &shared_perm);
child = bdrv_root_attach_child(child_bs, child_name, child_role,
@@ -2633,12 +2698,6 @@ BlockDriverState *bdrv_open(const char *filename, const char *reference,
NULL, errp);
}
typedef struct BlockReopenQueueEntry {
bool prepared;
BDRVReopenState state;
QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
} BlockReopenQueueEntry;
/*
* Adds a BlockDriverState to a simple queue for an atomic, transactional
* reopen of multiple devices.
@@ -2737,6 +2796,23 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
flags |= BDRV_O_ALLOW_RDWR;
}
if (!bs_entry) {
bs_entry = g_new0(BlockReopenQueueEntry, 1);
QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
} else {
QDECREF(bs_entry->state.options);
QDECREF(bs_entry->state.explicit_options);
}
bs_entry->state.bs = bs;
bs_entry->state.options = options;
bs_entry->state.explicit_options = explicit_options;
bs_entry->state.flags = flags;
/* This needs to be overwritten in bdrv_reopen_prepare() */
bs_entry->state.perm = UINT64_MAX;
bs_entry->state.shared_perm = 0;
QLIST_FOREACH(child, &bs->children, next) {
QDict *new_child_options;
char *child_key_dot;
@@ -2756,19 +2832,6 @@ static BlockReopenQueue *bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
child->role, options, flags);
}
if (!bs_entry) {
bs_entry = g_new0(BlockReopenQueueEntry, 1);
QSIMPLEQ_INSERT_TAIL(bs_queue, bs_entry, entry);
} else {
QDECREF(bs_entry->state.options);
QDECREF(bs_entry->state.explicit_options);
}
bs_entry->state.bs = bs;
bs_entry->state.options = options;
bs_entry->state.explicit_options = explicit_options;
bs_entry->state.flags = flags;
return bs_queue;
}
@@ -2856,6 +2919,52 @@ int bdrv_reopen(BlockDriverState *bs, int bdrv_flags, Error **errp)
return ret;
}
static BlockReopenQueueEntry *find_parent_in_reopen_queue(BlockReopenQueue *q,
BdrvChild *c)
{
BlockReopenQueueEntry *entry;
QSIMPLEQ_FOREACH(entry, q, entry) {
BlockDriverState *bs = entry->state.bs;
BdrvChild *child;
QLIST_FOREACH(child, &bs->children, next) {
if (child == c) {
return entry;
}
}
}
return NULL;
}
static void bdrv_reopen_perm(BlockReopenQueue *q, BlockDriverState *bs,
uint64_t *perm, uint64_t *shared)
{
BdrvChild *c;
BlockReopenQueueEntry *parent;
uint64_t cumulative_perms = 0;
uint64_t cumulative_shared_perms = BLK_PERM_ALL;
QLIST_FOREACH(c, &bs->parents, next_parent) {
parent = find_parent_in_reopen_queue(q, c);
if (!parent) {
cumulative_perms |= c->perm;
cumulative_shared_perms &= c->shared_perm;
} else {
uint64_t nperm, nshared;
bdrv_child_perm(parent->state.bs, bs, c, c->role, q,
parent->state.perm, parent->state.shared_perm,
&nperm, &nshared);
cumulative_perms |= nperm;
cumulative_shared_perms &= nshared;
}
}
*perm = cumulative_perms;
*shared = cumulative_shared_perms;
}
/*
* Prepares a BlockDriverState for reopen. All changes are staged in the
@@ -2921,6 +3030,9 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
goto error;
}
/* Calculate required permissions after reopening */
bdrv_reopen_perm(queue, reopen_state->bs,
&reopen_state->perm, &reopen_state->shared_perm);
ret = bdrv_flush(reopen_state->bs);
if (ret) {
@@ -2976,6 +3088,12 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
} while ((entry = qdict_next(reopen_state->options, entry)));
}
ret = bdrv_check_perm(reopen_state->bs, queue, reopen_state->perm,
reopen_state->shared_perm, NULL, errp);
if (ret < 0) {
goto error;
}
ret = 0;
error:
@@ -3016,6 +3134,9 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
bdrv_refresh_limits(bs, NULL);
bdrv_set_perm(reopen_state->bs, reopen_state->perm,
reopen_state->shared_perm);
new_can_write =
!bdrv_is_read_only(bs) && !(bdrv_get_flags(bs) & BDRV_O_INACTIVE);
if (!old_can_write && new_can_write && drv->bdrv_reopen_bitmaps_rw) {
@@ -3049,6 +3170,8 @@ void bdrv_reopen_abort(BDRVReopenState *reopen_state)
}
QDECREF(reopen_state->explicit_options);
bdrv_abort_perm_update(reopen_state->bs);
}
@@ -3179,7 +3302,7 @@ void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
/* Check whether the required permissions can be granted on @to, ignoring
* all BdrvChild in @list so that they can't block themselves. */
ret = bdrv_check_update_perm(to, perm, shared, list, errp);
ret = bdrv_check_update_perm(to, NULL, perm, shared, list, errp);
if (ret < 0) {
bdrv_abort_perm_update(to);
goto out;
@@ -3368,53 +3491,62 @@ BlockDriverState *bdrv_find_base(BlockDriverState *bs)
* if active == top, that is considered an error
*
*/
int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
BlockDriverState *base, const char *backing_file_str)
int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
const char *backing_file_str)
{
BlockDriverState *new_top_bs = NULL;
BdrvChild *c, *next;
Error *local_err = NULL;
int ret = -EIO;
bdrv_ref(top);
if (!top->drv || !base->drv) {
goto exit;
}
new_top_bs = bdrv_find_overlay(active, top);
if (new_top_bs == NULL) {
/* we could not find the image above 'top', this is an error */
goto exit;
}
/* special case of new_top_bs->backing->bs already pointing to base - nothing
* to do, no intermediate images */
if (backing_bs(new_top_bs) == base) {
ret = 0;
goto exit;
}
/* Make sure that base is in the backing chain of top */
if (!bdrv_chain_contains(top, base)) {
goto exit;
}
/* success - we can delete the intermediate states, and link top->base */
/* TODO Check graph modification op blockers (BLK_PERM_GRAPH_MOD) once
* we've figured out how they should work. */
backing_file_str = backing_file_str ? backing_file_str : base->filename;
ret = bdrv_change_backing_file(new_top_bs, backing_file_str,
base->drv ? base->drv->format_name : "");
if (ret) {
goto exit;
}
bdrv_set_backing_hd(new_top_bs, base, &local_err);
if (local_err) {
ret = -EPERM;
error_report_err(local_err);
goto exit;
QLIST_FOREACH_SAFE(c, &top->parents, next_parent, next) {
/* Check whether we are allowed to switch c from top to base */
GSList *ignore_children = g_slist_prepend(NULL, c);
bdrv_check_update_perm(base, NULL, c->perm, c->shared_perm,
ignore_children, &local_err);
if (local_err) {
ret = -EPERM;
error_report_err(local_err);
goto exit;
}
g_slist_free(ignore_children);
/* If so, update the backing file path in the image file */
if (c->role->update_filename) {
ret = c->role->update_filename(c, base, backing_file_str,
&local_err);
if (ret < 0) {
bdrv_abort_perm_update(base);
error_report_err(local_err);
goto exit;
}
}
/* Do the actual switch in the in-memory graph.
* Completes bdrv_check_update_perm() transaction internally. */
bdrv_ref(base);
bdrv_replace_child(c, base);
bdrv_unref(top);
}
ret = 0;
exit:
bdrv_unref(top);
return ret;
}
@@ -3450,12 +3582,18 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, PreallocMode prealloc,
assert(!(bs->open_flags & BDRV_O_INACTIVE));
ret = drv->bdrv_truncate(bs, offset, prealloc, errp);
if (ret == 0) {
ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
bdrv_dirty_bitmap_truncate(bs);
bdrv_parent_cb_resize(bs);
atomic_inc(&bs->write_gen);
if (ret < 0) {
return ret;
}
ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not refresh total sector count");
} else {
offset = bs->total_sectors * BDRV_SECTOR_SIZE;
}
bdrv_dirty_bitmap_truncate(bs, offset);
bdrv_parent_cb_resize(bs);
atomic_inc(&bs->write_gen);
return ret;
}
@@ -4049,7 +4187,7 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
/* Update permissions, they may differ for inactive nodes */
bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
ret = bdrv_check_perm(bs, perm, shared_perm, NULL, &local_err);
ret = bdrv_check_perm(bs, NULL, perm, shared_perm, NULL, &local_err);
if (ret < 0) {
bs->open_flags |= BDRV_O_INACTIVE;
error_propagate(errp, local_err);
@@ -4116,7 +4254,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
/* Update permissions, they may differ for inactive nodes */
bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
bdrv_check_perm(bs, perm, shared_perm, NULL, &error_abort);
bdrv_check_perm(bs, NULL, perm, shared_perm, NULL, &error_abort);
bdrv_set_perm(bs, perm, shared_perm);
}
@@ -4393,7 +4531,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
/* The size for the image must always be specified, unless we have a backing
* file and we have not been forbidden from opening it. */
size = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, 0);
size = qemu_opt_get_size(opts, BLOCK_OPT_SIZE, img_size);
if (backing_file && !(flags & BDRV_O_NO_BACKING)) {
BlockDriverState *bs;
char *full_backing = g_new0(char, PATH_MAX);

View File

@@ -372,10 +372,10 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
clusters_per_iter = MAX((granularity / job->cluster_size), 1);
dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
dbi = bdrv_dirty_iter_new(job->sync_bitmap);
/* Find the next dirty sector(s) */
while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
cluster = offset / job->cluster_size;
/* Fake progress updates for any clusters we skipped */
@@ -403,8 +403,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
/* If the bitmap granularity is smaller than the backup granularity,
* we need to advance the iterator pointer to the next cluster. */
if (granularity < job->cluster_size) {
bdrv_set_dirty_iter(dbi,
cluster * job->cluster_size / BDRV_SECTOR_SIZE);
bdrv_set_dirty_iter(dbi, cluster * job->cluster_size);
}
last_cluster = cluster - 1;

View File

@@ -244,7 +244,6 @@ static int read_config(BDRVBlkdebugState *s, const char *filename,
ret = qemu_config_parse(f, config_groups, filename);
if (ret < 0) {
error_setg(errp, "Could not parse blkdebug config file");
ret = -EINVAL;
goto fail;
}
}

View File

@@ -36,13 +36,11 @@ enum {
typedef struct CommitBlockJob {
BlockJob common;
RateLimit limit;
BlockDriverState *active;
BlockDriverState *commit_top_bs;
BlockBackend *top;
BlockBackend *base;
BlockdevOnError on_error;
int base_flags;
int orig_overlay_flags;
char *backing_file_str;
} CommitBlockJob;
@@ -81,18 +79,15 @@ static void commit_complete(BlockJob *job, void *opaque)
{
CommitBlockJob *s = container_of(job, CommitBlockJob, common);
CommitCompleteData *data = opaque;
BlockDriverState *active = s->active;
BlockDriverState *top = blk_bs(s->top);
BlockDriverState *base = blk_bs(s->base);
BlockDriverState *overlay_bs = bdrv_find_overlay(active, s->commit_top_bs);
BlockDriverState *commit_top_bs = s->commit_top_bs;
int ret = data->ret;
bool remove_commit_top_bs = false;
/* Make sure overlay_bs and top stay around until bdrv_set_backing_hd() */
/* Make sure commit_top_bs and top stay around until bdrv_replace_node() */
bdrv_ref(top);
if (overlay_bs) {
bdrv_ref(overlay_bs);
}
bdrv_ref(commit_top_bs);
/* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
* the normal backing chain can be restored. */
@@ -100,9 +95,9 @@ static void commit_complete(BlockJob *job, void *opaque)
if (!block_job_is_cancelled(&s->common) && ret == 0) {
/* success */
ret = bdrv_drop_intermediate(active, s->commit_top_bs, base,
ret = bdrv_drop_intermediate(s->commit_top_bs, base,
s->backing_file_str);
} else if (overlay_bs) {
} else {
/* XXX Can (or should) we somehow keep 'consistent read' blocked even
* after the failed/cancelled commit job is gone? If we already wrote
* something to base, the intermediate images aren't valid any more. */
@@ -115,9 +110,6 @@ static void commit_complete(BlockJob *job, void *opaque)
if (s->base_flags != bdrv_get_flags(base)) {
bdrv_reopen(base, s->base_flags, NULL);
}
if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
}
g_free(s->backing_file_str);
blk_unref(s->top);
@@ -134,10 +126,13 @@ static void commit_complete(BlockJob *job, void *opaque)
* filter driver from the backing chain. Do this as the final step so that
* the 'consistent read' permission can be granted. */
if (remove_commit_top_bs) {
bdrv_set_backing_hd(overlay_bs, top, &error_abort);
bdrv_child_try_set_perm(commit_top_bs->backing, 0, BLK_PERM_ALL,
&error_abort);
bdrv_replace_node(commit_top_bs, backing_bs(commit_top_bs),
&error_abort);
}
bdrv_unref(overlay_bs);
bdrv_unref(commit_top_bs);
bdrv_unref(top);
}
@@ -257,6 +252,7 @@ static void bdrv_commit_top_close(BlockDriverState *bs)
static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
@@ -282,10 +278,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
{
CommitBlockJob *s;
BlockReopenQueue *reopen_queue = NULL;
int orig_overlay_flags;
int orig_base_flags;
BlockDriverState *iter;
BlockDriverState *overlay_bs;
BlockDriverState *commit_top_bs = NULL;
Error *local_err = NULL;
int ret;
@@ -296,31 +290,19 @@ void commit_start(const char *job_id, BlockDriverState *bs,
return;
}
overlay_bs = bdrv_find_overlay(bs, top);
if (overlay_bs == NULL) {
error_setg(errp, "Could not find overlay image for %s:", top->filename);
return;
}
s = block_job_create(job_id, &commit_job_driver, bs, 0, BLK_PERM_ALL,
speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
if (!s) {
return;
}
orig_base_flags = bdrv_get_flags(base);
orig_overlay_flags = bdrv_get_flags(overlay_bs);
/* convert base & overlay_bs to r/w, if necessary */
/* convert base to r/w, if necessary */
orig_base_flags = bdrv_get_flags(base);
if (!(orig_base_flags & BDRV_O_RDWR)) {
reopen_queue = bdrv_reopen_queue(reopen_queue, base, NULL,
orig_base_flags | BDRV_O_RDWR);
}
if (!(orig_overlay_flags & BDRV_O_RDWR)) {
reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs, NULL,
orig_overlay_flags | BDRV_O_RDWR);
}
if (reopen_queue) {
bdrv_reopen_multiple(bdrv_get_aio_context(bs), reopen_queue, &local_err);
if (local_err != NULL) {
@@ -349,7 +331,7 @@ void commit_start(const char *job_id, BlockDriverState *bs,
error_propagate(errp, local_err);
goto fail;
}
bdrv_set_backing_hd(overlay_bs, commit_top_bs, &local_err);
bdrv_replace_node(top, commit_top_bs, &local_err);
if (local_err) {
bdrv_unref(commit_top_bs);
commit_top_bs = NULL;
@@ -381,14 +363,6 @@ void commit_start(const char *job_id, BlockDriverState *bs,
goto fail;
}
/* overlay_bs must be blocked because it needs to be modified to
* update the backing image string. */
ret = block_job_add_bdrv(&s->common, "overlay of top", overlay_bs,
BLK_PERM_GRAPH_MOD, BLK_PERM_ALL, errp);
if (ret < 0) {
goto fail;
}
s->base = blk_new(BLK_PERM_CONSISTENT_READ
| BLK_PERM_WRITE
| BLK_PERM_RESIZE,
@@ -407,13 +381,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
goto fail;
}
s->active = bs;
s->base_flags = orig_base_flags;
s->orig_overlay_flags = orig_overlay_flags;
s->base_flags = orig_base_flags;
s->backing_file_str = g_strdup(backing_file_str);
s->on_error = on_error;
trace_commit_start(bs, base, top, s);
@@ -428,7 +397,7 @@ fail:
blk_unref(s->top);
}
if (commit_top_bs) {
bdrv_set_backing_hd(overlay_bs, top, &error_abort);
bdrv_replace_node(commit_top_bs, top, &error_abort);
}
block_job_early_fail(&s->common);
}

View File

@@ -279,6 +279,9 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
return -EINVAL;
}
bs->supported_write_flags = BDRV_REQ_FUA &
bs->file->bs->supported_write_flags;
opts = qemu_opts_create(opts_spec, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
@@ -364,8 +367,9 @@ static int block_crypto_truncate(BlockDriverState *bs, int64_t offset,
PreallocMode prealloc, Error **errp)
{
BlockCrypto *crypto = bs->opaque;
size_t payload_offset =
uint64_t payload_offset =
qcrypto_block_get_payload_offset(crypto->block);
assert(payload_offset < (INT64_MAX - offset));
offset += payload_offset;
@@ -379,66 +383,65 @@ static void block_crypto_close(BlockDriverState *bs)
}
#define BLOCK_CRYPTO_MAX_SECTORS 32
/*
* 1 MB bounce buffer gives good performance / memory tradeoff
* when using cache=none|directsync.
*/
#define BLOCK_CRYPTO_MAX_IO_SIZE (1024 * 1024)
static coroutine_fn int
block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
int remaining_sectors, QEMUIOVector *qiov)
block_crypto_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BlockCrypto *crypto = bs->opaque;
int cur_nr_sectors; /* number of sectors in current iteration */
uint64_t cur_bytes; /* number of bytes in current iteration */
uint64_t bytes_done = 0;
uint8_t *cipher_data = NULL;
QEMUIOVector hd_qiov;
int ret = 0;
size_t payload_offset =
qcrypto_block_get_payload_offset(crypto->block) / 512;
uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
assert(!flags);
assert(payload_offset < INT64_MAX);
assert(QEMU_IS_ALIGNED(offset, sector_size));
assert(QEMU_IS_ALIGNED(bytes, sector_size));
qemu_iovec_init(&hd_qiov, qiov->niov);
/* Bounce buffer so we have a linear mem region for
* entire sector. XXX optimize so we avoid bounce
* buffer in case that qiov->niov == 1
/* Bounce buffer because we don't wish to expose cipher text
* in qiov which points to guest memory.
*/
cipher_data =
qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE,
qiov->size));
if (cipher_data == NULL) {
ret = -ENOMEM;
goto cleanup;
}
while (remaining_sectors) {
cur_nr_sectors = remaining_sectors;
if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) {
cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS;
}
while (bytes) {
cur_bytes = MIN(bytes, BLOCK_CRYPTO_MAX_IO_SIZE);
qemu_iovec_reset(&hd_qiov);
qemu_iovec_add(&hd_qiov, cipher_data, cur_nr_sectors * 512);
qemu_iovec_add(&hd_qiov, cipher_data, cur_bytes);
ret = bdrv_co_readv(bs->file,
payload_offset + sector_num,
cur_nr_sectors, &hd_qiov);
ret = bdrv_co_preadv(bs->file, payload_offset + offset + bytes_done,
cur_bytes, &hd_qiov, 0);
if (ret < 0) {
goto cleanup;
}
if (qcrypto_block_decrypt(crypto->block,
sector_num,
cipher_data, cur_nr_sectors * 512,
NULL) < 0) {
if (qcrypto_block_decrypt(crypto->block, offset + bytes_done,
cipher_data, cur_bytes, NULL) < 0) {
ret = -EIO;
goto cleanup;
}
qemu_iovec_from_buf(qiov, bytes_done,
cipher_data, cur_nr_sectors * 512);
qemu_iovec_from_buf(qiov, bytes_done, cipher_data, cur_bytes);
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
bytes_done += cur_nr_sectors * 512;
bytes -= cur_bytes;
bytes_done += cur_bytes;
}
cleanup:
@@ -450,63 +453,58 @@ block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
static coroutine_fn int
block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num,
int remaining_sectors, QEMUIOVector *qiov)
block_crypto_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BlockCrypto *crypto = bs->opaque;
int cur_nr_sectors; /* number of sectors in current iteration */
uint64_t cur_bytes; /* number of bytes in current iteration */
uint64_t bytes_done = 0;
uint8_t *cipher_data = NULL;
QEMUIOVector hd_qiov;
int ret = 0;
size_t payload_offset =
qcrypto_block_get_payload_offset(crypto->block) / 512;
uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block);
assert(!(flags & ~BDRV_REQ_FUA));
assert(payload_offset < INT64_MAX);
assert(QEMU_IS_ALIGNED(offset, sector_size));
assert(QEMU_IS_ALIGNED(bytes, sector_size));
qemu_iovec_init(&hd_qiov, qiov->niov);
/* Bounce buffer so we have a linear mem region for
* entire sector. XXX optimize so we avoid bounce
* buffer in case that qiov->niov == 1
/* Bounce buffer because we're not permitted to touch
* contents of qiov - it points to guest memory.
*/
cipher_data =
qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_IO_SIZE,
qiov->size));
if (cipher_data == NULL) {
ret = -ENOMEM;
goto cleanup;
}
while (remaining_sectors) {
cur_nr_sectors = remaining_sectors;
while (bytes) {
cur_bytes = MIN(bytes, BLOCK_CRYPTO_MAX_IO_SIZE);
if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) {
cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS;
}
qemu_iovec_to_buf(qiov, bytes_done, cipher_data, cur_bytes);
qemu_iovec_to_buf(qiov, bytes_done,
cipher_data, cur_nr_sectors * 512);
if (qcrypto_block_encrypt(crypto->block,
sector_num,
cipher_data, cur_nr_sectors * 512,
NULL) < 0) {
if (qcrypto_block_encrypt(crypto->block, offset + bytes_done,
cipher_data, cur_bytes, NULL) < 0) {
ret = -EIO;
goto cleanup;
}
qemu_iovec_reset(&hd_qiov);
qemu_iovec_add(&hd_qiov, cipher_data, cur_nr_sectors * 512);
qemu_iovec_add(&hd_qiov, cipher_data, cur_bytes);
ret = bdrv_co_writev(bs->file,
payload_offset + sector_num,
cur_nr_sectors, &hd_qiov);
ret = bdrv_co_pwritev(bs->file, payload_offset + offset + bytes_done,
cur_bytes, &hd_qiov, flags);
if (ret < 0) {
goto cleanup;
}
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
bytes_done += cur_nr_sectors * 512;
bytes -= cur_bytes;
bytes_done += cur_bytes;
}
cleanup:
@@ -516,13 +514,22 @@ block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num,
return ret;
}
static void block_crypto_refresh_limits(BlockDriverState *bs, Error **errp)
{
BlockCrypto *crypto = bs->opaque;
uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block);
bs->bl.request_alignment = sector_size; /* No sub-sector I/O */
}
static int64_t block_crypto_getlength(BlockDriverState *bs)
{
BlockCrypto *crypto = bs->opaque;
int64_t len = bdrv_getlength(bs->file->bs);
ssize_t offset = qcrypto_block_get_payload_offset(crypto->block);
uint64_t offset = qcrypto_block_get_payload_offset(crypto->block);
assert(offset < INT64_MAX);
assert(offset < len);
len -= offset;
@@ -613,8 +620,9 @@ BlockDriver bdrv_crypto_luks = {
.bdrv_truncate = block_crypto_truncate,
.create_opts = &block_crypto_create_opts_luks,
.bdrv_co_readv = block_crypto_co_readv,
.bdrv_co_writev = block_crypto_co_writev,
.bdrv_refresh_limits = block_crypto_refresh_limits,
.bdrv_co_preadv = block_crypto_co_preadv,
.bdrv_co_pwritev = block_crypto_co_pwritev,
.bdrv_getlength = block_crypto_getlength,
.bdrv_get_info = block_crypto_get_info_luks,
.bdrv_get_specific_info = block_crypto_get_specific_info_luks,

View File

@@ -1,7 +1,7 @@
/*
* Block Dirty Bitmap
*
* Copyright (c) 2016 Red Hat. Inc
* Copyright (c) 2016-2017 Red Hat. Inc
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -38,11 +38,11 @@
*/
struct BdrvDirtyBitmap {
QemuMutex *mutex;
HBitmap *bitmap; /* Dirty sector bitmap implementation */
HBitmap *bitmap; /* Dirty bitmap implementation */
HBitmap *meta; /* Meta dirty bitmap */
BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
char *name; /* Optional non-empty unique ID */
int64_t size; /* Size of the bitmap (Number of sectors) */
int64_t size; /* Size of the bitmap, in bytes */
bool disabled; /* Bitmap is disabled. It ignores all writes to
the device */
int active_iterators; /* How many iterators are active */
@@ -115,17 +115,14 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
{
int64_t bitmap_size;
BdrvDirtyBitmap *bitmap;
uint32_t sector_granularity;
assert((granularity & (granularity - 1)) == 0);
assert(is_power_of_2(granularity) && granularity >= BDRV_SECTOR_SIZE);
if (name && bdrv_find_dirty_bitmap(bs, name)) {
error_setg(errp, "Bitmap already exists: %s", name);
return NULL;
}
sector_granularity = granularity >> BDRV_SECTOR_BITS;
assert(sector_granularity);
bitmap_size = bdrv_nb_sectors(bs);
bitmap_size = bdrv_getlength(bs);
if (bitmap_size < 0) {
error_setg_errno(errp, -bitmap_size, "could not get length of device");
errno = -bitmap_size;
@@ -133,7 +130,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
}
bitmap = g_new0(BdrvDirtyBitmap, 1);
bitmap->mutex = &bs->dirty_bitmap_mutex;
bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(granularity));
bitmap->size = bitmap_size;
bitmap->name = g_strdup(name);
bitmap->disabled = false;
@@ -173,45 +170,6 @@ void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
qemu_mutex_unlock(bitmap->mutex);
}
int bdrv_dirty_bitmap_get_meta_locked(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, int64_t sector,
int nb_sectors)
{
uint64_t i;
int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
/* To optimize: we can make hbitmap to internally check the range in a
* coarse level, or at least do it word by word. */
for (i = sector; i < sector + nb_sectors; i += sectors_per_bit) {
if (hbitmap_get(bitmap->meta, i)) {
return true;
}
}
return false;
}
int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, int64_t sector,
int nb_sectors)
{
bool dirty;
qemu_mutex_lock(bitmap->mutex);
dirty = bdrv_dirty_bitmap_get_meta_locked(bs, bitmap, sector, nb_sectors);
qemu_mutex_unlock(bitmap->mutex);
return dirty;
}
void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, int64_t sector,
int nb_sectors)
{
qemu_mutex_lock(bitmap->mutex);
hbitmap_reset(bitmap->meta, sector, nb_sectors);
qemu_mutex_unlock(bitmap->mutex);
}
int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
{
return bitmap->size;
@@ -341,17 +299,16 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
* Truncates _all_ bitmaps attached to a BDS.
* Called with BQL taken.
*/
void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
void bdrv_dirty_bitmap_truncate(BlockDriverState *bs, int64_t bytes)
{
BdrvDirtyBitmap *bitmap;
uint64_t size = bdrv_nb_sectors(bs);
bdrv_dirty_bitmaps_lock(bs);
QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
assert(!bdrv_dirty_bitmap_frozen(bitmap));
assert(!bitmap->active_iterators);
hbitmap_truncate(bitmap->bitmap, size);
bitmap->size = size;
hbitmap_truncate(bitmap->bitmap, bytes);
bitmap->size = bytes;
}
bdrv_dirty_bitmaps_unlock(bs);
}
@@ -461,7 +418,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
info->count = bdrv_get_dirty_count(bm);
info->granularity = bdrv_dirty_bitmap_granularity(bm);
info->has_name = !!bm->name;
info->name = g_strdup(bm->name);
@@ -476,13 +433,13 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
}
/* Called within bdrv_dirty_bitmap_lock..unlock */
int bdrv_get_dirty_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
int64_t sector)
bool bdrv_get_dirty_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
int64_t offset)
{
if (bitmap) {
return hbitmap_get(bitmap->bitmap, sector);
return hbitmap_get(bitmap->bitmap, offset);
} else {
return 0;
return false;
}
}
@@ -508,19 +465,13 @@ uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs)
uint32_t bdrv_dirty_bitmap_granularity(const BdrvDirtyBitmap *bitmap)
{
return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
return 1U << hbitmap_granularity(bitmap->bitmap);
}
uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
{
return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
}
BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
uint64_t first_sector)
BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap)
{
BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
hbitmap_iter_init(&iter->hbi, bitmap->bitmap, first_sector);
hbitmap_iter_init(&iter->hbi, bitmap->bitmap, 0);
iter->bitmap = bitmap;
bitmap->active_iterators++;
return iter;
@@ -552,35 +503,35 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
/* Called within bdrv_dirty_bitmap_lock..unlock */
void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors)
int64_t offset, int64_t bytes)
{
assert(bdrv_dirty_bitmap_enabled(bitmap));
assert(!bdrv_dirty_bitmap_readonly(bitmap));
hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
hbitmap_set(bitmap->bitmap, offset, bytes);
}
void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors)
int64_t offset, int64_t bytes)
{
bdrv_dirty_bitmap_lock(bitmap);
bdrv_set_dirty_bitmap_locked(bitmap, cur_sector, nr_sectors);
bdrv_set_dirty_bitmap_locked(bitmap, offset, bytes);
bdrv_dirty_bitmap_unlock(bitmap);
}
/* Called within bdrv_dirty_bitmap_lock..unlock */
void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors)
int64_t offset, int64_t bytes)
{
assert(bdrv_dirty_bitmap_enabled(bitmap));
assert(!bdrv_dirty_bitmap_readonly(bitmap));
hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
hbitmap_reset(bitmap->bitmap, offset, bytes);
}
void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int64_t nr_sectors)
int64_t offset, int64_t bytes)
{
bdrv_dirty_bitmap_lock(bitmap);
bdrv_reset_dirty_bitmap_locked(bitmap, cur_sector, nr_sectors);
bdrv_reset_dirty_bitmap_locked(bitmap, offset, bytes);
bdrv_dirty_bitmap_unlock(bitmap);
}
@@ -610,42 +561,42 @@ void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
}
uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
uint64_t start, uint64_t count)
uint64_t offset, uint64_t bytes)
{
return hbitmap_serialization_size(bitmap->bitmap, start, count);
return hbitmap_serialization_size(bitmap->bitmap, offset, bytes);
}
uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap)
{
return hbitmap_serialization_granularity(bitmap->bitmap);
return hbitmap_serialization_align(bitmap->bitmap);
}
void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
uint8_t *buf, uint64_t start,
uint64_t count)
uint8_t *buf, uint64_t offset,
uint64_t bytes)
{
hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
hbitmap_serialize_part(bitmap->bitmap, buf, offset, bytes);
}
void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
uint8_t *buf, uint64_t start,
uint64_t count, bool finish)
uint8_t *buf, uint64_t offset,
uint64_t bytes, bool finish)
{
hbitmap_deserialize_part(bitmap->bitmap, buf, start, count, finish);
hbitmap_deserialize_part(bitmap->bitmap, buf, offset, bytes, finish);
}
void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
uint64_t start, uint64_t count,
uint64_t offset, uint64_t bytes,
bool finish)
{
hbitmap_deserialize_zeroes(bitmap->bitmap, start, count, finish);
hbitmap_deserialize_zeroes(bitmap->bitmap, offset, bytes, finish);
}
void bdrv_dirty_bitmap_deserialize_ones(BdrvDirtyBitmap *bitmap,
uint64_t start, uint64_t count,
uint64_t offset, uint64_t bytes,
bool finish)
{
hbitmap_deserialize_ones(bitmap->bitmap, start, count, finish);
hbitmap_deserialize_ones(bitmap->bitmap, offset, bytes, finish);
}
void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
@@ -653,8 +604,7 @@ void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
hbitmap_deserialize_finish(bitmap->bitmap);
}
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
int64_t nr_sectors)
void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
BdrvDirtyBitmap *bitmap;
@@ -668,7 +618,7 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
continue;
}
assert(!bdrv_dirty_bitmap_readonly(bitmap));
hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
hbitmap_set(bitmap->bitmap, offset, bytes);
}
bdrv_dirty_bitmaps_unlock(bs);
}
@@ -676,9 +626,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
/**
* Advance a BdrvDirtyBitmapIter to an arbitrary offset.
*/
void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t sector_num)
void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t offset)
{
hbitmap_iter_init(&iter->hbi, iter->hbi.hb, sector_num);
hbitmap_iter_init(&iter->hbi, iter->hbi.hb, offset);
}
int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)

View File

@@ -33,6 +33,9 @@
#include "block/raw-aio.h"
#include "qapi/qmp/qstring.h"
#include "scsi/pr-manager.h"
#include "scsi/constants.h"
#if defined(__APPLE__) && (__MACH__)
#include <paths.h>
#include <sys/param.h>
@@ -155,6 +158,8 @@ typedef struct BDRVRawState {
bool page_cache_inconsistent:1;
bool has_fallocate;
bool needs_alignment;
PRManager *pr_mgr;
} BDRVRawState;
typedef struct BDRVRawReopenState {
@@ -402,6 +407,11 @@ static QemuOptsList raw_runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "file locking mode (on/off/auto, default: auto)",
},
{
.name = "pr-manager",
.type = QEMU_OPT_STRING,
.help = "id of persistent reservation manager object (default: none)",
},
{ /* end of list */ }
},
};
@@ -413,6 +423,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
QemuOpts *opts;
Error *local_err = NULL;
const char *filename = NULL;
const char *str;
BlockdevAioOptions aio, aio_default;
int fd, ret;
struct stat st;
@@ -476,6 +487,16 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
abort();
}
str = qemu_opt_get(opts, "pr-manager");
if (str) {
s->pr_mgr = pr_manager_lookup(str, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
}
s->open_flags = open_flags;
raw_parse_flags(bdrv_flags, &s->open_flags);
@@ -2597,6 +2618,15 @@ static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
if (fd_open(bs) < 0)
return NULL;
if (req == SG_IO && s->pr_mgr) {
struct sg_io_hdr *io_hdr = buf;
if (io_hdr->cmdp[0] == PERSISTENT_RESERVE_OUT ||
io_hdr->cmdp[0] == PERSISTENT_RESERVE_IN) {
return pr_manager_execute(s->pr_mgr, bdrv_get_aio_context(bs),
s->fd, io_hdr, cb, opaque);
}
}
acb = g_new(RawPosixAIOData, 1);
acb->bs = bs;
acb->aio_type = QEMU_AIO_IOCTL;
@@ -2700,6 +2730,16 @@ static int hdev_create(const char *filename, QemuOpts *opts,
ret = -ENOSPC;
}
if (!ret && total_size) {
uint8_t buf[BDRV_SECTOR_SIZE] = { 0 };
int64_t zero_size = MIN(BDRV_SECTOR_SIZE, total_size);
if (lseek(fd, 0, SEEK_SET) == -1) {
ret = -errno;
} else {
ret = qemu_write_full(fd, buf, zero_size);
ret = ret == zero_size ? 0 : -errno;
}
}
qemu_close(fd);
return ret;
}

View File

@@ -34,6 +34,9 @@
#define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
/* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
#define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int bytes, BdrvRequestFlags flags);
@@ -945,68 +948,114 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child,
BlockDriver *drv = bs->drv;
struct iovec iov;
QEMUIOVector bounce_qiov;
QEMUIOVector local_qiov;
int64_t cluster_offset;
unsigned int cluster_bytes;
size_t skip_bytes;
int ret;
int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
BDRV_REQUEST_MAX_BYTES);
unsigned int progress = 0;
/* FIXME We cannot require callers to have write permissions when all they
* are doing is a read request. If we did things right, write permissions
* would be obtained anyway, but internally by the copy-on-read code. As
* long as it is implemented here rather than in a separat filter driver,
* long as it is implemented here rather than in a separate filter driver,
* the copy-on-read code doesn't have its own BdrvChild, however, for which
* it could request permissions. Therefore we have to bypass the permission
* system for the moment. */
// assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE));
/* Cover entire cluster so no additional backing file I/O is required when
* allocating cluster in the image file.
* allocating cluster in the image file. Note that this value may exceed
* BDRV_REQUEST_MAX_BYTES (even when the original read did not), which
* is one reason we loop rather than doing it all at once.
*/
bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
skip_bytes = offset - cluster_offset;
trace_bdrv_co_do_copy_on_readv(bs, offset, bytes,
cluster_offset, cluster_bytes);
iov.iov_len = cluster_bytes;
iov.iov_base = bounce_buffer = qemu_try_blockalign(bs, iov.iov_len);
bounce_buffer = qemu_try_blockalign(bs,
MIN(MIN(max_transfer, cluster_bytes),
MAX_BOUNCE_BUFFER));
if (bounce_buffer == NULL) {
ret = -ENOMEM;
goto err;
}
qemu_iovec_init_external(&bounce_qiov, &iov, 1);
while (cluster_bytes) {
int64_t pnum;
ret = bdrv_driver_preadv(bs, cluster_offset, cluster_bytes,
&bounce_qiov, 0);
if (ret < 0) {
goto err;
ret = bdrv_is_allocated(bs, cluster_offset,
MIN(cluster_bytes, max_transfer), &pnum);
if (ret < 0) {
/* Safe to treat errors in querying allocation as if
* unallocated; we'll probably fail again soon on the
* read, but at least that will set a decent errno.
*/
pnum = MIN(cluster_bytes, max_transfer);
}
assert(skip_bytes < pnum);
if (ret <= 0) {
/* Must copy-on-read; use the bounce buffer */
iov.iov_base = bounce_buffer;
iov.iov_len = pnum = MIN(pnum, MAX_BOUNCE_BUFFER);
qemu_iovec_init_external(&local_qiov, &iov, 1);
ret = bdrv_driver_preadv(bs, cluster_offset, pnum,
&local_qiov, 0);
if (ret < 0) {
goto err;
}
bdrv_debug_event(bs, BLKDBG_COR_WRITE);
if (drv->bdrv_co_pwrite_zeroes &&
buffer_is_zero(bounce_buffer, pnum)) {
/* FIXME: Should we (perhaps conditionally) be setting
* BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
* that still correctly reads as zero? */
ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, pnum, 0);
} else {
/* This does not change the data on the disk, it is not
* necessary to flush even in cache=writethrough mode.
*/
ret = bdrv_driver_pwritev(bs, cluster_offset, pnum,
&local_qiov, 0);
}
if (ret < 0) {
/* It might be okay to ignore write errors for guest
* requests. If this is a deliberate copy-on-read
* then we don't want to ignore the error. Simply
* report it in all cases.
*/
goto err;
}
qemu_iovec_from_buf(qiov, progress, bounce_buffer + skip_bytes,
pnum - skip_bytes);
} else {
/* Read directly into the destination */
qemu_iovec_init(&local_qiov, qiov->niov);
qemu_iovec_concat(&local_qiov, qiov, progress, pnum - skip_bytes);
ret = bdrv_driver_preadv(bs, offset + progress, local_qiov.size,
&local_qiov, 0);
qemu_iovec_destroy(&local_qiov);
if (ret < 0) {
goto err;
}
}
cluster_offset += pnum;
cluster_bytes -= pnum;
progress += pnum - skip_bytes;
skip_bytes = 0;
}
if (drv->bdrv_co_pwrite_zeroes &&
buffer_is_zero(bounce_buffer, iov.iov_len)) {
/* FIXME: Should we (perhaps conditionally) be setting
* BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy
* that still correctly reads as zero? */
ret = bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes, 0);
} else {
/* This does not change the data on the disk, it is not necessary
* to flush even in cache=writethrough mode.
*/
ret = bdrv_driver_pwritev(bs, cluster_offset, cluster_bytes,
&bounce_qiov, 0);
}
if (ret < 0) {
/* It might be okay to ignore write errors for guest requests. If this
* is a deliberate copy-on-read then we don't want to ignore the error.
* Simply report it in all cases.
*/
goto err;
}
skip_bytes = offset - cluster_offset;
qemu_iovec_from_buf(qiov, 0, bounce_buffer + skip_bytes, bytes);
ret = 0;
err:
qemu_vfree(bounce_buffer);
@@ -1212,9 +1261,6 @@ int coroutine_fn bdrv_co_readv(BdrvChild *child, int64_t sector_num,
return bdrv_co_do_readv(child, sector_num, nb_sectors, qiov, 0);
}
/* Maximum buffer for write zeroes fallback, in bytes */
#define MAX_WRITE_ZEROES_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)
static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int bytes, BdrvRequestFlags flags)
{
@@ -1229,8 +1275,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
bs->bl.request_alignment);
int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
MAX_WRITE_ZEROES_BOUNCE_BUFFER);
int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
assert(alignment % bs->bl.request_alignment == 0);
head = offset % alignment;
@@ -1334,7 +1379,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
bool waited;
int ret;
int64_t start_sector = offset >> BDRV_SECTOR_BITS;
int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
uint64_t bytes_remaining = bytes;
int max_transfer;
@@ -1409,7 +1453,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
atomic_inc(&bs->write_gen);
bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
bdrv_set_dirty(bs, offset, bytes);
stat64_max(&bs->wr_highest_offset, offset + bytes);
@@ -1778,6 +1822,10 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
*pnum = 0;
return BDRV_BLOCK_EOF;
}
if (!nb_sectors) {
*pnum = 0;
return 0;
}
n = total_sectors - sector_num;
if (n < nb_sectors) {
@@ -2438,8 +2486,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset,
ret = 0;
out:
atomic_inc(&bs->write_gen);
bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
req.bytes >> BDRV_SECTOR_BITS);
bdrv_set_dirty(bs, req.offset, req.bytes);
tracked_request_end(&req);
bdrv_dec_in_flight(bs);
return ret;

View File

@@ -141,8 +141,7 @@ static void mirror_write_complete(void *opaque, int ret)
if (ret < 0) {
BlockErrorAction action;
bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
op->bytes >> BDRV_SECTOR_BITS);
bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
action = mirror_error_action(s, false, -ret);
if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
s->ret = ret;
@@ -161,8 +160,7 @@ static void mirror_read_complete(void *opaque, int ret)
if (ret < 0) {
BlockErrorAction action;
bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
op->bytes >> BDRV_SECTOR_BITS);
bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
action = mirror_error_action(s, true, -ret);
if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
s->ret = ret;
@@ -336,12 +334,11 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);
bdrv_dirty_bitmap_lock(s->dirty_bitmap);
offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
offset = bdrv_dirty_iter_next(s->dbi);
if (offset < 0) {
bdrv_set_dirty_iter(s->dbi, 0);
offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
BDRV_SECTOR_SIZE);
offset = bdrv_dirty_iter_next(s->dbi);
trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
assert(offset >= 0);
}
bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
@@ -362,19 +359,18 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
int64_t next_offset = offset + nb_chunks * s->granularity;
int64_t next_chunk = next_offset / s->granularity;
if (next_offset >= s->bdev_length ||
!bdrv_get_dirty_locked(source, s->dirty_bitmap,
next_offset >> BDRV_SECTOR_BITS)) {
!bdrv_get_dirty_locked(source, s->dirty_bitmap, next_offset)) {
break;
}
if (test_bit(next_chunk, s->in_flight_bitmap)) {
break;
}
next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
next_dirty = bdrv_dirty_iter_next(s->dbi);
if (next_dirty > next_offset || next_dirty < 0) {
/* The bitmap iterator's cache is stale, refresh it */
bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
bdrv_set_dirty_iter(s->dbi, next_offset);
next_dirty = bdrv_dirty_iter_next(s->dbi);
}
assert(next_dirty == next_offset);
nb_chunks++;
@@ -384,8 +380,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
* calling bdrv_get_block_status_above could yield - if some blocks are
* marked dirty in this window, we need to know.
*/
bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, offset >> BDRV_SECTOR_BITS,
nb_chunks * sectors_per_chunk);
bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, offset,
nb_chunks * s->granularity);
bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
@@ -616,25 +612,23 @@ static void mirror_throttle(MirrorBlockJob *s)
static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
{
int64_t sector_num, end;
int64_t offset;
BlockDriverState *base = s->base;
BlockDriverState *bs = s->source;
BlockDriverState *target_bs = blk_bs(s->target);
int ret, n;
int ret;
int64_t count;
end = s->bdev_length / BDRV_SECTOR_SIZE;
if (base == NULL && !bdrv_has_zero_init(target_bs)) {
if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, end);
bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
return 0;
}
s->initial_zeroing_ongoing = true;
for (sector_num = 0; sector_num < end; ) {
int nb_sectors = MIN(end - sector_num,
QEMU_ALIGN_DOWN(INT_MAX, s->granularity) >> BDRV_SECTOR_BITS);
for (offset = 0; offset < s->bdev_length; ) {
int bytes = MIN(s->bdev_length - offset,
QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
mirror_throttle(s);
@@ -650,9 +644,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
continue;
}
mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
nb_sectors * BDRV_SECTOR_SIZE, false);
sector_num += nb_sectors;
mirror_do_zero_or_discard(s, offset, bytes, false);
offset += bytes;
}
mirror_wait_for_all_io(s);
@@ -660,10 +653,10 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
}
/* First part, loop on the sectors and initialize the dirty bitmap. */
for (sector_num = 0; sector_num < end; ) {
for (offset = 0; offset < s->bdev_length; ) {
/* Just to make sure we are not exceeding int limit. */
int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
end - sector_num);
int bytes = MIN(s->bdev_length - offset,
QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
mirror_throttle(s);
@@ -671,21 +664,16 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
return 0;
}
ret = bdrv_is_allocated_above(bs, base, sector_num * BDRV_SECTOR_SIZE,
nb_sectors * BDRV_SECTOR_SIZE, &count);
ret = bdrv_is_allocated_above(bs, base, offset, bytes, &count);
if (ret < 0) {
return ret;
}
/* TODO: Relax this once bdrv_is_allocated_above and dirty
* bitmaps no longer require sector alignment. */
assert(QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE));
n = count >> BDRV_SECTOR_BITS;
assert(n > 0);
assert(count);
if (ret == 1) {
bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
bdrv_set_dirty_bitmap(s->dirty_bitmap, offset, count);
}
sector_num += n;
offset += count;
}
return 0;
}
@@ -796,7 +784,7 @@ static void coroutine_fn mirror_run(void *opaque)
}
assert(!s->dbi);
s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap, 0);
s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap);
for (;;) {
uint64_t delay_ns = 0;
int64_t cnt, delta;
@@ -811,11 +799,10 @@ static void coroutine_fn mirror_run(void *opaque)
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
/* s->common.offset contains the number of bytes already processed so
* far, cnt is the number of dirty sectors remaining and
* far, cnt is the number of dirty bytes remaining and
* s->bytes_in_flight is the number of bytes currently being
* processed; together those are the current total operation length */
s->common.len = s->common.offset + s->bytes_in_flight +
cnt * BDRV_SECTOR_SIZE;
s->common.len = s->common.offset + s->bytes_in_flight + cnt;
/* Note that even when no rate limit is applied we need to yield
* periodically with no pending I/O so that bdrv_drain_all() returns.
@@ -827,8 +814,7 @@ static void coroutine_fn mirror_run(void *opaque)
s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
(cnt == 0 && s->in_flight > 0)) {
trace_mirror_yield(s, cnt * BDRV_SECTOR_SIZE,
s->buf_free_count, s->in_flight);
trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
mirror_wait_for_io(s);
continue;
} else if (cnt != 0) {
@@ -869,7 +855,7 @@ static void coroutine_fn mirror_run(void *opaque)
* whether to switch to target check one last time if I/O has
* come in the meanwhile, and if not flush the data to disk.
*/
trace_mirror_before_drain(s, cnt * BDRV_SECTOR_SIZE);
trace_mirror_before_drain(s, cnt);
bdrv_drained_begin(bs);
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
@@ -888,8 +874,7 @@ static void coroutine_fn mirror_run(void *opaque)
}
ret = 0;
trace_mirror_before_sleep(s, cnt * BDRV_SECTOR_SIZE,
s->synced, delay_ns);
trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
if (!s->synced) {
block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
if (block_job_is_cancelled(&s->common)) {
@@ -1056,6 +1041,10 @@ static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
{
if (bs->backing == NULL) {
/* we can be here after failed bdrv_append in mirror_start_job */
return 0;
}
return bdrv_co_flush(bs->backing->bs);
}
@@ -1073,6 +1062,11 @@ static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
{
if (bs->backing == NULL) {
/* we can be here after failed bdrv_attach_child in
* bdrv_set_backing_hd */
return;
}
bdrv_refresh_filename(bs->backing->bs);
pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
bs->backing->bs->filename);
@@ -1084,6 +1078,7 @@ static void bdrv_mirror_top_close(BlockDriverState *bs)
static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{

View File

@@ -31,8 +31,8 @@
#include "qapi/error.h"
#include "nbd-client.h"
#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
#define INDEX_TO_HANDLE(bs, index) ((index) ^ ((uint64_t)(intptr_t)bs))
#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
#define INDEX_TO_HANDLE(bs, index) ((index) ^ (uint64_t)(intptr_t)(bs))
static void nbd_recv_coroutines_wake_all(NBDClientSession *s)
{
@@ -161,6 +161,8 @@ static int nbd_co_send_request(BlockDriverState *bs,
NULL) < 0) {
rc = -EIO;
}
} else if (rc >= 0) {
rc = -EIO;
}
qio_channel_set_cork(s->ioc, false);
} else {
@@ -178,26 +180,27 @@ err:
return rc;
}
static void nbd_co_receive_reply(NBDClientSession *s,
NBDRequest *request,
NBDReply *reply,
QEMUIOVector *qiov)
static int nbd_co_receive_reply(NBDClientSession *s,
NBDRequest *request,
QEMUIOVector *qiov)
{
int ret;
int i = HANDLE_TO_INDEX(s, request->handle);
/* Wait until we're woken up by nbd_read_reply_entry. */
s->requests[i].receiving = true;
qemu_coroutine_yield();
s->requests[i].receiving = false;
*reply = s->reply;
if (reply->handle != request->handle || !s->ioc || s->quit) {
reply->error = EIO;
if (!s->ioc || s->quit) {
ret = -EIO;
} else {
if (qiov && reply->error == 0) {
assert(s->reply.handle == request->handle);
ret = -s->reply.error;
if (qiov && s->reply.error == 0) {
assert(request->len == iov_size(qiov->iov, qiov->niov));
if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
NULL) < 0) {
reply->error = EIO;
ret = -EIO;
s->quit = true;
}
}
@@ -217,6 +220,8 @@ static void nbd_co_receive_reply(NBDClientSession *s,
s->in_flight--;
qemu_co_queue_next(&s->free_sema);
qemu_co_mutex_unlock(&s->send_mutex);
return ret;
}
static int nbd_co_request(BlockDriverState *bs,
@@ -224,7 +229,6 @@ static int nbd_co_request(BlockDriverState *bs,
QEMUIOVector *qiov)
{
NBDClientSession *client = nbd_get_client_session(bs);
NBDReply reply;
int ret;
assert(!qiov || request->type == NBD_CMD_WRITE ||
@@ -232,12 +236,11 @@ static int nbd_co_request(BlockDriverState *bs,
ret = nbd_co_send_request(bs, request,
request->type == NBD_CMD_WRITE ? qiov : NULL);
if (ret < 0) {
reply.error = -ret;
} else {
nbd_co_receive_reply(client, request, &reply,
request->type == NBD_CMD_READ ? qiov : NULL);
return ret;
}
return -reply.error;
return nbd_co_receive_reply(client, request,
request->type == NBD_CMD_READ ? qiov : NULL);
}
int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,

View File

@@ -478,7 +478,9 @@ static int get_cluster_offset(BlockDriverState *bs,
for(i = 0; i < s->cluster_sectors; i++) {
if (i < n_start || i >= n_end) {
memset(s->cluster_data, 0x00, 512);
if (qcrypto_block_encrypt(s->crypto, start_sect + i,
if (qcrypto_block_encrypt(s->crypto,
(start_sect + i) *
BDRV_SECTOR_SIZE,
s->cluster_data,
BDRV_SECTOR_SIZE,
NULL) < 0) {
@@ -668,7 +670,8 @@ static coroutine_fn int qcow_co_readv(BlockDriverState *bs, int64_t sector_num,
}
if (bs->encrypted) {
assert(s->crypto);
if (qcrypto_block_decrypt(s->crypto, sector_num, buf,
if (qcrypto_block_decrypt(s->crypto,
sector_num * BDRV_SECTOR_SIZE, buf,
n * BDRV_SECTOR_SIZE, NULL) < 0) {
ret = -EIO;
break;
@@ -740,8 +743,8 @@ static coroutine_fn int qcow_co_writev(BlockDriverState *bs, int64_t sector_num,
}
if (bs->encrypted) {
assert(s->crypto);
if (qcrypto_block_encrypt(s->crypto, sector_num, buf,
n * BDRV_SECTOR_SIZE, NULL) < 0) {
if (qcrypto_block_encrypt(s->crypto, sector_num * BDRV_SECTOR_SIZE,
buf, n * BDRV_SECTOR_SIZE, NULL) < 0) {
ret = -EIO;
break;
}

View File

@@ -269,15 +269,16 @@ static int free_bitmap_clusters(BlockDriverState *bs, Qcow2BitmapTable *tb)
return 0;
}
/* This function returns the number of disk sectors covered by a single qcow2
* cluster of bitmap data. */
static uint64_t sectors_covered_by_bitmap_cluster(const BDRVQcow2State *s,
const BdrvDirtyBitmap *bitmap)
/* Return the disk size covered by a single qcow2 cluster of bitmap data. */
static uint64_t bytes_covered_by_bitmap_cluster(const BDRVQcow2State *s,
const BdrvDirtyBitmap *bitmap)
{
uint32_t sector_granularity =
bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
uint64_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
uint64_t limit = granularity * (s->cluster_size << 3);
return (uint64_t)sector_granularity * (s->cluster_size << 3);
assert(QEMU_IS_ALIGNED(limit,
bdrv_dirty_bitmap_serialization_align(bitmap)));
return limit;
}
/* load_bitmap_data
@@ -290,7 +291,7 @@ static int load_bitmap_data(BlockDriverState *bs,
{
int ret = 0;
BDRVQcow2State *s = bs->opaque;
uint64_t sector, sbc;
uint64_t offset, limit;
uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
uint8_t *buf = NULL;
uint64_t i, tab_size =
@@ -302,28 +303,28 @@ static int load_bitmap_data(BlockDriverState *bs,
}
buf = g_malloc(s->cluster_size);
sbc = sectors_covered_by_bitmap_cluster(s, bitmap);
for (i = 0, sector = 0; i < tab_size; ++i, sector += sbc) {
uint64_t count = MIN(bm_size - sector, sbc);
limit = bytes_covered_by_bitmap_cluster(s, bitmap);
for (i = 0, offset = 0; i < tab_size; ++i, offset += limit) {
uint64_t count = MIN(bm_size - offset, limit);
uint64_t entry = bitmap_table[i];
uint64_t offset = entry & BME_TABLE_ENTRY_OFFSET_MASK;
uint64_t data_offset = entry & BME_TABLE_ENTRY_OFFSET_MASK;
assert(check_table_entry(entry, s->cluster_size) == 0);
if (offset == 0) {
if (data_offset == 0) {
if (entry & BME_TABLE_ENTRY_FLAG_ALL_ONES) {
bdrv_dirty_bitmap_deserialize_ones(bitmap, sector, count,
bdrv_dirty_bitmap_deserialize_ones(bitmap, offset, count,
false);
} else {
/* No need to deserialize zeros because the dirty bitmap is
* already cleared */
}
} else {
ret = bdrv_pread(bs->file, offset, buf, s->cluster_size);
ret = bdrv_pread(bs->file, data_offset, buf, s->cluster_size);
if (ret < 0) {
goto finish;
}
bdrv_dirty_bitmap_deserialize_part(bitmap, buf, sector, count,
bdrv_dirty_bitmap_deserialize_part(bitmap, buf, offset, count,
false);
}
}
@@ -602,7 +603,7 @@ static Qcow2BitmapList *bitmap_list_load(BlockDriverState *bs, uint64_t offset,
goto fail;
}
bm = g_new(Qcow2Bitmap, 1);
bm = g_new0(Qcow2Bitmap, 1);
bm->table.offset = e->bitmap_table_offset;
bm->table.size = e->bitmap_table_size;
bm->flags = e->flags;
@@ -1071,8 +1072,8 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
{
int ret;
BDRVQcow2State *s = bs->opaque;
int64_t sector;
uint64_t sbc;
int64_t offset;
uint64_t limit;
uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
uint8_t *buf = NULL;
@@ -1095,20 +1096,25 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
return NULL;
}
dbi = bdrv_dirty_iter_new(bitmap, 0);
dbi = bdrv_dirty_iter_new(bitmap);
buf = g_malloc(s->cluster_size);
sbc = sectors_covered_by_bitmap_cluster(s, bitmap);
assert(DIV_ROUND_UP(bm_size, sbc) == tb_size);
limit = bytes_covered_by_bitmap_cluster(s, bitmap);
assert(DIV_ROUND_UP(bm_size, limit) == tb_size);
while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
uint64_t cluster = sector / sbc;
while ((offset = bdrv_dirty_iter_next(dbi)) >= 0) {
uint64_t cluster = offset / limit;
uint64_t end, write_size;
int64_t off;
sector = cluster * sbc;
end = MIN(bm_size, sector + sbc);
write_size =
bdrv_dirty_bitmap_serialization_size(bitmap, sector, end - sector);
/*
* We found the first dirty offset, but want to write out the
* entire cluster of the bitmap that includes that offset,
* including any leading zero bits.
*/
offset = QEMU_ALIGN_DOWN(offset, limit);
end = MIN(bm_size, offset + limit);
write_size = bdrv_dirty_bitmap_serialization_size(bitmap, offset,
end - offset);
assert(write_size <= s->cluster_size);
off = qcow2_alloc_clusters(bs, s->cluster_size);
@@ -1120,7 +1126,7 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
}
tb[cluster] = off;
bdrv_dirty_bitmap_serialize_part(bitmap, buf, sector, end - sector);
bdrv_dirty_bitmap_serialize_part(bitmap, buf, offset, end - offset);
if (write_size < s->cluster_size) {
memset(buf + write_size, 0, s->cluster_size - write_size);
}

View File

@@ -411,3 +411,29 @@ void qcow2_cache_entry_mark_dirty(BlockDriverState *bs, Qcow2Cache *c,
assert(c->entries[i].offset != 0);
c->entries[i].dirty = true;
}
void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
uint64_t offset)
{
int i;
for (i = 0; i < c->size; i++) {
if (c->entries[i].offset == offset) {
return qcow2_cache_get_table_addr(bs, c, i);
}
}
return NULL;
}
void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table)
{
int i = qcow2_cache_get_table_idx(bs, c, table);
assert(c->entries[i].ref == 0);
c->entries[i].offset = 0;
c->entries[i].lru_counter = 0;
c->entries[i].dirty = false;
qcow2_cache_table_release(bs, c, i, 1);
}

View File

@@ -32,6 +32,56 @@
#include "qemu/bswap.h"
#include "trace.h"
int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
{
BDRVQcow2State *s = bs->opaque;
int new_l1_size, i, ret;
if (exact_size >= s->l1_size) {
return 0;
}
new_l1_size = exact_size;
#ifdef DEBUG_ALLOC2
fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
#endif
BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
new_l1_size * sizeof(uint64_t),
(s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
if (ret < 0) {
goto fail;
}
ret = bdrv_flush(bs->file->bs);
if (ret < 0) {
goto fail;
}
BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
continue;
}
qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
s->cluster_size, QCOW2_DISCARD_ALWAYS);
s->l1_table[i] = 0;
}
return 0;
fail:
/*
* If the write in the l1_table failed the image may contain a partially
* overwritten l1_table. In this case it would be better to clear the
* l1_table in memory to avoid possible image corruption.
*/
memset(s->l1_table + new_l1_size, 0,
(s->l1_size - new_l1_size) * sizeof(uint64_t));
return ret;
}
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
bool exact_size)
{
@@ -396,15 +446,13 @@ static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
{
if (bytes && bs->encrypted) {
BDRVQcow2State *s = bs->opaque;
int64_t sector = (s->crypt_physical_offset ?
int64_t offset = (s->crypt_physical_offset ?
(cluster_offset + offset_in_cluster) :
(src_cluster_offset + offset_in_cluster))
>> BDRV_SECTOR_BITS;
(src_cluster_offset + offset_in_cluster));
assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
assert((bytes & ~BDRV_SECTOR_MASK) == 0);
assert(s->crypto);
if (qcrypto_block_encrypt(s->crypto, sector, buffer,
bytes, NULL) < 0) {
if (qcrypto_block_encrypt(s->crypto, offset, buffer, bytes, NULL) < 0) {
return false;
}
}

View File

@@ -29,6 +29,7 @@
#include "block/qcow2.h"
#include "qemu/range.h"
#include "qemu/bswap.h"
#include "qemu/cutils.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -861,8 +862,24 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
s->set_refcount(refcount_block, block_index, refcount);
if (refcount == 0 && s->discard_passthrough[type]) {
update_refcount_discard(bs, cluster_offset, s->cluster_size);
if (refcount == 0) {
void *table;
table = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
offset);
if (table != NULL) {
qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
qcow2_cache_discard(bs, s->refcount_block_cache, table);
}
table = qcow2_cache_is_table_offset(bs, s->l2_table_cache, offset);
if (table != NULL) {
qcow2_cache_discard(bs, s->l2_table_cache, table);
}
if (s->discard_passthrough[type]) {
update_refcount_discard(bs, cluster_offset, s->cluster_size);
}
}
}
@@ -3045,3 +3062,144 @@ done:
qemu_vfree(new_refblock);
return ret;
}
static int qcow2_discard_refcount_block(BlockDriverState *bs,
uint64_t discard_block_offs)
{
BDRVQcow2State *s = bs->opaque;
uint64_t refblock_offs = get_refblock_offset(s, discard_block_offs);
uint64_t cluster_index = discard_block_offs >> s->cluster_bits;
uint32_t block_index = cluster_index & (s->refcount_block_size - 1);
void *refblock;
int ret;
assert(discard_block_offs != 0);
ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
&refblock);
if (ret < 0) {
return ret;
}
if (s->get_refcount(refblock, block_index) != 1) {
qcow2_signal_corruption(bs, true, -1, -1, "Invalid refcount:"
" refblock offset %#" PRIx64
", reftable index %u"
", block offset %#" PRIx64
", refcount %#" PRIx64,
refblock_offs,
offset_to_reftable_index(s, discard_block_offs),
discard_block_offs,
s->get_refcount(refblock, block_index));
qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
return -EINVAL;
}
s->set_refcount(refblock, block_index, 0);
qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, refblock);
qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
if (cluster_index < s->free_cluster_index) {
s->free_cluster_index = cluster_index;
}
refblock = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
discard_block_offs);
if (refblock) {
/* discard refblock from the cache if refblock is cached */
qcow2_cache_discard(bs, s->refcount_block_cache, refblock);
}
update_refcount_discard(bs, discard_block_offs, s->cluster_size);
return 0;
}
int qcow2_shrink_reftable(BlockDriverState *bs)
{
BDRVQcow2State *s = bs->opaque;
uint64_t *reftable_tmp =
g_malloc(s->refcount_table_size * sizeof(uint64_t));
int i, ret;
for (i = 0; i < s->refcount_table_size; i++) {
int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
void *refblock;
bool unused_block;
if (refblock_offs == 0) {
reftable_tmp[i] = 0;
continue;
}
ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
&refblock);
if (ret < 0) {
goto out;
}
/* the refblock has own reference */
if (i == offset_to_reftable_index(s, refblock_offs)) {
uint64_t block_index = (refblock_offs >> s->cluster_bits) &
(s->refcount_block_size - 1);
uint64_t refcount = s->get_refcount(refblock, block_index);
s->set_refcount(refblock, block_index, 0);
unused_block = buffer_is_zero(refblock, s->cluster_size);
s->set_refcount(refblock, block_index, refcount);
} else {
unused_block = buffer_is_zero(refblock, s->cluster_size);
}
qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
}
ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
s->refcount_table_size * sizeof(uint64_t));
/*
* If the write in the reftable failed the image may contain a partially
* overwritten reftable. In this case it would be better to clear the
* reftable in memory to avoid possible image corruption.
*/
for (i = 0; i < s->refcount_table_size; i++) {
if (s->refcount_table[i] && !reftable_tmp[i]) {
if (ret == 0) {
ret = qcow2_discard_refcount_block(bs, s->refcount_table[i] &
REFT_OFFSET_MASK);
}
s->refcount_table[i] = 0;
}
}
if (!s->cache_discards) {
qcow2_process_discards(bs, ret);
}
out:
g_free(reftable_tmp);
return ret;
}
int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
{
BDRVQcow2State *s = bs->opaque;
int64_t i;
for (i = size_to_clusters(s, size) - 1; i >= 0; i--) {
uint64_t refcount;
int ret = qcow2_get_refcount(bs, i, &refcount);
if (ret < 0) {
fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
i, strerror(-ret));
return ret;
}
if (refcount > 0) {
return i;
}
}
qcow2_signal_corruption(bs, true, -1, -1,
"There are no references in the refcount table.");
return -EIO;
}

View File

@@ -1811,7 +1811,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
if (qcrypto_block_decrypt(s->crypto,
(s->crypt_physical_offset ?
cluster_offset + offset_in_cluster :
offset) >> BDRV_SECTOR_BITS,
offset),
cluster_data,
cur_bytes,
NULL) < 0) {
@@ -1946,7 +1946,7 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
if (qcrypto_block_encrypt(s->crypto,
(s->crypt_physical_offset ?
cluster_offset + offset_in_cluster :
offset) >> BDRV_SECTOR_BITS,
offset),
cluster_data,
cur_bytes, NULL) < 0) {
ret = -EIO;
@@ -3104,18 +3104,66 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
}
old_length = bs->total_sectors * 512;
/* shrinking is currently not supported */
if (offset < old_length) {
error_setg(errp, "qcow2 doesn't support shrinking images yet");
return -ENOTSUP;
}
new_l1_size = size_to_l1(s, offset);
ret = qcow2_grow_l1_table(bs, new_l1_size, true);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to grow the L1 table");
return ret;
if (offset < old_length) {
int64_t last_cluster, old_file_size;
if (prealloc != PREALLOC_MODE_OFF) {
error_setg(errp,
"Preallocation can't be used for shrinking an image");
return -EINVAL;
}
ret = qcow2_cluster_discard(bs, ROUND_UP(offset, s->cluster_size),
old_length - ROUND_UP(offset,
s->cluster_size),
QCOW2_DISCARD_ALWAYS, true);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to discard cropped clusters");
return ret;
}
ret = qcow2_shrink_l1_table(bs, new_l1_size);
if (ret < 0) {
error_setg_errno(errp, -ret,
"Failed to reduce the number of L2 tables");
return ret;
}
ret = qcow2_shrink_reftable(bs);
if (ret < 0) {
error_setg_errno(errp, -ret,
"Failed to discard unused refblocks");
return ret;
}
old_file_size = bdrv_getlength(bs->file->bs);
if (old_file_size < 0) {
error_setg_errno(errp, -old_file_size,
"Failed to inquire current file length");
return old_file_size;
}
last_cluster = qcow2_get_last_cluster(bs, old_file_size);
if (last_cluster < 0) {
error_setg_errno(errp, -last_cluster,
"Failed to find the last cluster");
return last_cluster;
}
if ((last_cluster + 1) * s->cluster_size < old_file_size) {
ret = bdrv_truncate(bs->file, (last_cluster + 1) * s->cluster_size,
PREALLOC_MODE_OFF, NULL);
if (ret < 0) {
warn_report("Failed to truncate the tail of the image: %s",
strerror(-ret));
ret = 0;
}
}
} else {
ret = qcow2_grow_l1_table(bs, new_l1_size, true);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to grow the L1 table");
return ret;
}
}
switch (prealloc) {
@@ -3142,7 +3190,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
if (old_file_size < 0) {
error_setg_errno(errp, -old_file_size,
"Failed to inquire current file length");
return ret;
return old_file_size;
}
nb_new_data_clusters = DIV_ROUND_UP(offset - old_length,
@@ -3171,7 +3219,7 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
if (allocation_start < 0) {
error_setg_errno(errp, -allocation_start,
"Failed to resize refcount structures");
return -allocation_start;
return allocation_start;
}
clusters_allocated = qcow2_alloc_clusters_at(bs, allocation_start,
@@ -3648,20 +3696,19 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
*/
required = virtual_size;
} else {
int cluster_sectors = cluster_size / BDRV_SECTOR_SIZE;
int64_t sector_num;
int64_t offset;
int pnum = 0;
for (sector_num = 0;
sector_num < ssize / BDRV_SECTOR_SIZE;
sector_num += pnum) {
int nb_sectors = MIN(ssize / BDRV_SECTOR_SIZE - sector_num,
BDRV_REQUEST_MAX_SECTORS);
for (offset = 0; offset < ssize;
offset += pnum * BDRV_SECTOR_SIZE) {
int nb_sectors = MIN(ssize - offset,
BDRV_REQUEST_MAX_BYTES) / BDRV_SECTOR_SIZE;
BlockDriverState *file;
int64_t ret;
ret = bdrv_get_block_status_above(in_bs, NULL,
sector_num, nb_sectors,
offset >> BDRV_SECTOR_BITS,
nb_sectors,
&pnum, &file);
if (ret < 0) {
error_setg_errno(&local_err, -ret,
@@ -3674,12 +3721,11 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
} else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
(BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
/* Extend pnum to end of cluster for next iteration */
pnum = ROUND_UP(sector_num + pnum, cluster_sectors) -
sector_num;
pnum = (ROUND_UP(offset + pnum * BDRV_SECTOR_SIZE,
cluster_size) - offset) >> BDRV_SECTOR_BITS;
/* Count clusters we've seen */
required += (sector_num % cluster_sectors + pnum) *
BDRV_SECTOR_SIZE;
required += offset % cluster_size + pnum * BDRV_SECTOR_SIZE;
}
}
}

View File

@@ -521,6 +521,18 @@ static inline uint64_t refcount_diff(uint64_t r1, uint64_t r2)
return r1 > r2 ? r1 - r2 : r2 - r1;
}
static inline
uint32_t offset_to_reftable_index(BDRVQcow2State *s, uint64_t offset)
{
return offset >> (s->refcount_block_bits + s->cluster_bits);
}
static inline uint64_t get_refblock_offset(BDRVQcow2State *s, uint64_t offset)
{
uint32_t index = offset_to_reftable_index(s, offset);
return s->refcount_table[index] & REFT_OFFSET_MASK;
}
/* qcow2.c functions */
int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors);
@@ -584,10 +596,13 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
BlockDriverAmendStatusCB *status_cb,
void *cb_opaque, Error **errp);
int qcow2_shrink_reftable(BlockDriverState *bs);
int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
/* qcow2-cluster.c functions */
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
bool exact_size);
int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t max_size);
int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
@@ -649,6 +664,9 @@ int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
void **table);
void qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
void *qcow2_cache_is_table_offset(BlockDriverState *bs, Qcow2Cache *c,
uint64_t offset);
void qcow2_cache_discard(BlockDriverState *bs, Qcow2Cache *c, void *table);
/* qcow2-bitmap.c functions */
int qcow2_check_bitmaps_refcounts(BlockDriverState *bs, BdrvCheckResult *res,

View File

@@ -157,6 +157,7 @@ static void replication_close(BlockDriverState *bs)
static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{

View File

@@ -403,17 +403,19 @@ static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
schedule_next_request(tgm, is_write);
qemu_mutex_unlock(&tg->lock);
}
g_free(data);
}
static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write)
{
Coroutine *co;
RestartData rd = {
.tgm = tgm,
.is_write = is_write
};
RestartData *rd = g_new0(RestartData, 1);
co = qemu_coroutine_create(throttle_group_restart_queue_entry, &rd);
rd->tgm = tgm;
rd->is_write = is_write;
co = qemu_coroutine_create(throttle_group_restart_queue_entry, rd);
aio_co_enter(tgm->aio_context, co);
}

View File

@@ -57,15 +57,6 @@
static void checkpoint(void);
#ifdef __MINGW32__
void nonono(const char* file, int line, const char* msg) {
fprintf(stderr, "Nonono! %s:%d %s\n", file, line, msg);
exit(-5);
}
#undef assert
#define assert(a) do {if (!(a)) nonono(__FILE__, __LINE__, #a);}while(0)
#endif
#else
#define DLOG(a)
@@ -3211,6 +3202,7 @@ err:
static void vvfat_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
@@ -3270,24 +3262,11 @@ static void bdrv_vvfat_init(void)
block_init(bdrv_vvfat_init);
#ifdef DEBUG
static void checkpoint(void) {
static void checkpoint(void)
{
assert(((mapping_t*)array_get(&(vvv->mapping), 0))->end == 2);
check1(vvv);
check2(vvv);
assert(!vvv->current_mapping || vvv->current_fd || (vvv->current_mapping->mode & MODE_DIRECTORY));
#if 0
if (((direntry_t*)vvv->directory.pointer)[1].attributes != 0xf)
fprintf(stderr, "Nonono!\n");
mapping_t* mapping;
direntry_t* direntry;
assert(vvv->mapping.size >= vvv->mapping.item_size * vvv->mapping.next);
assert(vvv->directory.size >= vvv->directory.item_size * vvv->directory.next);
if (vvv->mapping.next<47)
return;
assert((mapping = array_get(&(vvv->mapping), 47)));
assert(mapping->dir_index < vvv->directory.next);
direntry = array_get(&(vvv->directory), mapping->dir_index);
assert(!memcmp(direntry->name, "USB H ", 11) || direntry->name[0]==0);
#endif
}
#endif

View File

@@ -20,5 +20,6 @@ chardev-obj-$(CONFIG_WIN32) += char-win-stdio.o
common-obj-y += msmouse.o wctablet.o testdev.o
common-obj-$(CONFIG_BRLAPI) += baum.o
baum.o-cflags := $(SDL_CFLAGS)
baum.o-libs := $(BRLAPI_LIBS)
common-obj-$(CONFIG_SPICE) += spice.o

View File

@@ -643,6 +643,7 @@ static void baum_chr_open(Chardev *chr,
error_setg(errp, "brlapi__openConnection: %s",
brlapi_strerror(brlapi_error_location()));
g_free(handle);
baum->brlapi = NULL;
return;
}
baum->deferred_init = 0;

View File

@@ -84,8 +84,7 @@ static GSource *fd_chr_add_watch(Chardev *chr, GIOCondition cond)
return qio_channel_create_watch(s->ioc_out, cond);
}
static void fd_chr_update_read_handler(Chardev *chr,
GMainContext *context)
static void fd_chr_update_read_handler(Chardev *chr)
{
FDChardev *s = FD_CHARDEV(chr);
@@ -94,7 +93,7 @@ static void fd_chr_update_read_handler(Chardev *chr,
chr->gsource = io_add_watch_poll(chr, s->ioc_in,
fd_chr_read_poll,
fd_chr_read, chr,
context);
chr->gcontext);
}
}

View File

@@ -253,7 +253,6 @@ void qemu_chr_fe_set_handlers(CharBackend *b,
bool set_open)
{
Chardev *s;
ChardevClass *cc;
int fe_open;
s = b->chr;
@@ -261,7 +260,6 @@ void qemu_chr_fe_set_handlers(CharBackend *b,
return;
}
cc = CHARDEV_GET_CLASS(s);
if (!opaque && !fd_can_read && !fd_read && !fd_event) {
fe_open = 0;
remove_fd_in_watch(s);
@@ -273,9 +271,8 @@ void qemu_chr_fe_set_handlers(CharBackend *b,
b->chr_event = fd_event;
b->chr_be_change = be_change;
b->opaque = opaque;
if (cc->chr_update_read_handler) {
cc->chr_update_read_handler(s, context);
}
qemu_chr_be_update_read_handlers(s, context);
if (set_open) {
qemu_chr_fe_set_open(b, fe_open);

View File

@@ -112,8 +112,7 @@ static void pty_chr_update_read_handler_locked(Chardev *chr)
}
}
static void pty_chr_update_read_handler(Chardev *chr,
GMainContext *context)
static void pty_chr_update_read_handler(Chardev *chr)
{
qemu_mutex_lock(&chr->chr_write_lock);
pty_chr_update_read_handler_locked(chr);
@@ -219,7 +218,7 @@ static void pty_chr_state(Chardev *chr, int connected)
chr->gsource = io_add_watch_poll(chr, s->ioc,
pty_chr_read_poll,
pty_chr_read,
chr, NULL);
chr, chr->gcontext);
}
}
}

View File

@@ -516,13 +516,12 @@ static void tcp_chr_connect(void *opaque)
chr->gsource = io_add_watch_poll(chr, s->ioc,
tcp_chr_read_poll,
tcp_chr_read,
chr, NULL);
chr, chr->gcontext);
}
qemu_chr_be_event(chr, CHR_EVENT_OPENED);
}
static void tcp_chr_update_read_handler(Chardev *chr,
GMainContext *context)
static void tcp_chr_update_read_handler(Chardev *chr)
{
SocketChardev *s = SOCKET_CHARDEV(chr);
@@ -535,7 +534,7 @@ static void tcp_chr_update_read_handler(Chardev *chr,
chr->gsource = io_add_watch_poll(chr, s->ioc,
tcp_chr_read_poll,
tcp_chr_read, chr,
context);
chr->gcontext);
}
}

View File

@@ -100,8 +100,7 @@ static gboolean udp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
return TRUE;
}
static void udp_chr_update_read_handler(Chardev *chr,
GMainContext *context)
static void udp_chr_update_read_handler(Chardev *chr)
{
UdpChardev *s = UDP_CHARDEV(chr);
@@ -110,7 +109,7 @@ static void udp_chr_update_read_handler(Chardev *chr,
chr->gsource = io_add_watch_poll(chr, s->ioc,
udp_chr_read_poll,
udp_chr_read, chr,
context);
chr->gcontext);
}
}

View File

@@ -180,6 +180,17 @@ void qemu_chr_be_write(Chardev *s, uint8_t *buf, int len)
}
}
void qemu_chr_be_update_read_handlers(Chardev *s,
GMainContext *context)
{
ChardevClass *cc = CHARDEV_GET_CLASS(s);
s->gcontext = context;
if (cc->chr_update_read_handler) {
cc->chr_update_read_handler(s);
}
}
int qemu_chr_add_client(Chardev *s, int fd)
{
return CHARDEV_GET_CLASS(s)->chr_add_client ?

121
configure vendored
View File

@@ -290,6 +290,7 @@ netmap="no"
sdl=""
sdlabi=""
virtfs=""
mpath=""
vnc="yes"
sparse="no"
vde=""
@@ -331,6 +332,7 @@ modules="no"
prefix="/usr/local"
mandir="\${prefix}/share/man"
datadir="\${prefix}/share"
firmwarepath="\${prefix}/share/qemu-firmware"
qemu_docdir="\${prefix}/share/doc/qemu"
bindir="\${prefix}/bin"
libdir="\${prefix}/lib"
@@ -745,7 +747,6 @@ SunOS)
solaris="yes"
make="${MAKE-gmake}"
install="${INSTALL-ginstall}"
ld="gld"
smbd="${SMBD-/usr/sfw/sbin/smbd}"
if test -f /usr/include/sys/soundcard.h ; then
audio_drv_list="oss"
@@ -914,6 +915,8 @@ for opt do
;;
--localstatedir=*) local_statedir="$optarg"
;;
--firmwarepath=*) firmwarepath="$optarg"
;;
--sbindir=*|--sharedstatedir=*|\
--oldincludedir=*|--datarootdir=*|--infodir=*|--localedir=*|\
--htmldir=*|--dvidir=*|--pdfdir=*|--psdir=*)
@@ -936,6 +939,10 @@ for opt do
;;
--enable-virtfs) virtfs="yes"
;;
--disable-mpath) mpath="no"
;;
--enable-mpath) mpath="yes"
;;
--disable-vnc) vnc="no"
;;
--enable-vnc) vnc="yes"
@@ -1411,6 +1418,7 @@ Advanced options (experts only):
--libdir=PATH install libraries in PATH
--sysconfdir=PATH install config in PATH$confsuffix
--localstatedir=PATH install local state in PATH (set at runtime on win32)
--firmwarepath=PATH search PATH for firmware files
--with-confsuffix=SUFFIX suffix for QEMU data inside datadir/libdir/sysconfdir [$confsuffix]
--enable-debug enable common debug build options
--disable-strip disable stripping binaries
@@ -1479,6 +1487,7 @@ disabled with --disable-FEATURE, default is enabled if available:
vnc-png PNG compression for VNC server
cocoa Cocoa UI (Mac OS X only)
virtfs VirtFS
mpath Multipath persistent reservation passthrough
xen xen backend driver support
xen-pci-passthrough
brlapi BrlAPI (Braile)
@@ -2788,7 +2797,6 @@ EOF
sdl_cflags="$sdl_cflags $x11_cflags"
sdl_libs="$sdl_libs $x11_libs"
fi
libs_softmmu="$sdl_libs $libs_softmmu"
fi
##########################################
@@ -2801,7 +2809,6 @@ EOF
rdma_libs="-lrdmacm -libverbs"
if compile_prog "" "$rdma_libs" ; then
rdma="yes"
libs_softmmu="$libs_softmmu $rdma_libs"
else
if test "$rdma" = "yes" ; then
error_exit \
@@ -2946,8 +2953,6 @@ int main(void)
EOF
if compile_prog "" "$vde_libs" ; then
vde=yes
libs_softmmu="$vde_libs $libs_softmmu"
libs_tools="$vde_libs $libs_tools"
else
if test "$vde" = "yes" ; then
feature_not_found "vde" "Install vde (Virtual Distributed Ethernet) devel"
@@ -3035,13 +3040,13 @@ for drv in $audio_drv_list; do
alsa)
audio_drv_probe $drv alsa/asoundlib.h -lasound \
"return snd_pcm_close((snd_pcm_t *)0);"
libs_softmmu="-lasound $libs_softmmu"
alsa_libs="-lasound"
;;
pa)
audio_drv_probe $drv pulse/pulseaudio.h "-lpulse" \
"pa_context_set_source_output_volume(NULL, 0, NULL, NULL, NULL); return 0;"
libs_softmmu="-lpulse $libs_softmmu"
pulse_libs="-lpulse"
audio_pt_int="yes"
;;
@@ -3052,16 +3057,16 @@ for drv in $audio_drv_list; do
;;
coreaudio)
libs_softmmu="-framework CoreAudio $libs_softmmu"
coreaudio_libs="-framework CoreAudio"
;;
dsound)
libs_softmmu="-lole32 -ldxguid $libs_softmmu"
dsound_libs="-lole32 -ldxguid"
audio_win_int="yes"
;;
oss)
libs_softmmu="$oss_lib $libs_softmmu"
oss_libs="$oss_lib"
;;
wav)
@@ -3089,7 +3094,6 @@ int main( void ) { return brlapi__openConnection (NULL, NULL, NULL); }
EOF
if compile_prog "" "$brlapi_libs" ; then
brlapi=yes
libs_softmmu="$brlapi_libs $libs_softmmu"
else
if test "$brlapi" = "yes" ; then
feature_not_found "brlapi" "Install brlapi devel"
@@ -3299,6 +3303,30 @@ else
"Please install the pixman devel package."
fi
##########################################
# libmpathpersist probe
if test "$mpath" != "no" ; then
cat > $TMPC <<EOF
#include <libudev.h>
#include <mpath_persist.h>
unsigned mpath_mx_alloc_len = 1024;
int logsink;
int main(void) {
struct udev *udev = udev_new();
mpath_lib_init(udev);
return 0;
}
EOF
if compile_prog "" "-ludev -lmultipath -lmpathpersist" ; then
mpathpersist=yes
else
mpathpersist=no
fi
else
mpathpersist=no
fi
##########################################
# libcap probe
@@ -4204,13 +4232,10 @@ EOF
fi
# check for smartcard support
smartcard_cflags=""
if test "$smartcard" != "no"; then
if $pkg_config libcacard; then
libcacard_cflags=$($pkg_config --cflags libcacard)
libcacard_libs=$($pkg_config --libs libcacard)
QEMU_CFLAGS="$QEMU_CFLAGS $libcacard_cflags"
libs_softmmu="$libs_softmmu $libcacard_libs"
smartcard="yes"
else
if test "$smartcard" = "yes"; then
@@ -4226,8 +4251,6 @@ if test "$libusb" != "no" ; then
libusb="yes"
libusb_cflags=$($pkg_config --cflags libusb-1.0)
libusb_libs=$($pkg_config --libs libusb-1.0)
QEMU_CFLAGS="$QEMU_CFLAGS $libusb_cflags"
libs_softmmu="$libs_softmmu $libusb_libs"
else
if test "$libusb" = "yes"; then
feature_not_found "libusb" "Install libusb devel >= 1.0.13"
@@ -4242,8 +4265,6 @@ if test "$usb_redir" != "no" ; then
usb_redir="yes"
usb_redir_cflags=$($pkg_config --cflags libusbredirparser-0.5)
usb_redir_libs=$($pkg_config --libs libusbredirparser-0.5)
QEMU_CFLAGS="$QEMU_CFLAGS $usb_redir_cflags"
libs_softmmu="$libs_softmmu $usb_redir_libs"
else
if test "$usb_redir" = "yes"; then
feature_not_found "usb-redir" "Install usbredir devel"
@@ -4406,6 +4427,18 @@ if compile_prog "" "" ; then
posix_syslog=yes
fi
##########################################
# check if we have sem_timedwait
sem_timedwait=no
cat > $TMPC << EOF
#include <semaphore.h>
int main(void) { return sem_timedwait(0, 0); }
EOF
if compile_prog "" "" ; then
sem_timedwait=yes
fi
##########################################
# check if trace backend exists
@@ -5034,16 +5067,34 @@ if test "$want_tools" = "yes" ; then
fi
fi
if test "$softmmu" = yes ; then
if test "$virtfs" != no ; then
if test "$cap" = yes && test "$linux" = yes && test "$attr" = yes ; then
if test "$linux" = yes; then
if test "$virtfs" != no && test "$cap" = yes && test "$attr" = yes ; then
virtfs=yes
tools="$tools fsdev/virtfs-proxy-helper\$(EXESUF)"
else
if test "$virtfs" = yes; then
error_exit "VirtFS is supported only on Linux and requires libcap devel and libattr devel"
error_exit "VirtFS requires libcap devel and libattr devel"
fi
virtfs=no
fi
if test "$mpath" != no && test "$mpathpersist" = yes ; then
mpath=yes
else
if test "$mpath" = yes; then
error_exit "Multipath requires libmpathpersist devel"
fi
mpath=no
fi
tools="$tools scsi/qemu-pr-helper\$(EXESUF)"
else
if test "$virtfs" = yes; then
error_exit "VirtFS is supported only on Linux"
fi
virtfs=no
if test "$mpath" = yes; then
error_exit "Multipath is supported only on Linux"
fi
mpath=no
fi
fi
@@ -5228,6 +5279,7 @@ libs_softmmu="$pixman_libs $libs_softmmu"
echo "Install prefix $prefix"
echo "BIOS directory $(eval echo $qemu_datadir)"
echo "firmware path $(eval echo $firmwarepath)"
echo "binary directory $(eval echo $bindir)"
echo "library directory $(eval echo $libdir)"
echo "module directory $(eval echo $qemu_moddir)"
@@ -5289,6 +5341,7 @@ echo "Audio drivers $audio_drv_list"
echo "Block whitelist (rw) $block_drv_rw_whitelist"
echo "Block whitelist (ro) $block_drv_ro_whitelist"
echo "VirtFS support $virtfs"
echo "Multipath support $mpath"
echo "VNC support $vnc"
if test "$vnc" = "yes" ; then
echo "VNC SASL support $vnc_sasl"
@@ -5418,6 +5471,7 @@ echo "mandir=$mandir" >> $config_host_mak
echo "sysconfdir=$sysconfdir" >> $config_host_mak
echo "qemu_confdir=$qemu_confdir" >> $config_host_mak
echo "qemu_datadir=$qemu_datadir" >> $config_host_mak
echo "qemu_firmwarepath=$firmwarepath" >> $config_host_mak
echo "qemu_docdir=$qemu_docdir" >> $config_host_mak
echo "qemu_moddir=$qemu_moddir" >> $config_host_mak
if test "$mingw32" = "no" ; then
@@ -5499,6 +5553,7 @@ if test "$slirp" = "yes" ; then
fi
if test "$vde" = "yes" ; then
echo "CONFIG_VDE=y" >> $config_host_mak
echo "VDE_LIBS=$vde_libs" >> $config_host_mak
fi
if test "$netmap" = "yes" ; then
echo "CONFIG_NETMAP=y" >> $config_host_mak
@@ -5514,6 +5569,11 @@ for drv in $audio_drv_list; do
def=CONFIG_$(echo $drv | LC_ALL=C tr '[a-z]' '[A-Z]')
echo "$def=y" >> $config_host_mak
done
echo "ALSA_LIBS=$alsa_libs" >> $config_host_mak
echo "PULSE_LIBS=$pulse_libs" >> $config_host_mak
echo "COREAUDIO_LIBS=$coreaudio_libs" >> $config_host_mak
echo "DSOUND_LIBS=$dsound_libs" >> $config_host_mak
echo "OSS_LIBS=$oss_libs" >> $config_host_mak
if test "$audio_pt_int" = "yes" ; then
echo "CONFIG_AUDIO_PT_INT=y" >> $config_host_mak
fi
@@ -5558,6 +5618,7 @@ if test "$sdl" = "yes" ; then
echo "CONFIG_SDL=y" >> $config_host_mak
echo "CONFIG_SDLABI=$sdlabi" >> $config_host_mak
echo "SDL_CFLAGS=$sdl_cflags" >> $config_host_mak
echo "SDL_LIBS=$sdl_libs" >> $config_host_mak
fi
if test "$cocoa" = "yes" ; then
echo "CONFIG_COCOA=y" >> $config_host_mak
@@ -5634,6 +5695,9 @@ fi
if test "$inotify1" = "yes" ; then
echo "CONFIG_INOTIFY1=y" >> $config_host_mak
fi
if test "$sem_timedwait" = "yes" ; then
echo "CONFIG_SEM_TIMEDWAIT=y" >> $config_host_mak
fi
if test "$byteswap_h" = "yes" ; then
echo "CONFIG_BYTESWAP_H=y" >> $config_host_mak
fi
@@ -5647,6 +5711,7 @@ if test "$curl" = "yes" ; then
fi
if test "$brlapi" = "yes" ; then
echo "CONFIG_BRLAPI=y" >> $config_host_mak
echo "BRLAPI_LIBS=$brlapi_libs" >> $config_host_mak
fi
if test "$bluez" = "yes" ; then
echo "CONFIG_BLUEZ=y" >> $config_host_mak
@@ -5732,6 +5797,9 @@ fi
if test "$virtfs" = "yes" ; then
echo "CONFIG_VIRTFS=y" >> $config_host_mak
fi
if test "$mpath" = "yes" ; then
echo "CONFIG_MPATH=y" >> $config_host_mak
fi
if test "$vhost_scsi" = "yes" ; then
echo "CONFIG_VHOST_SCSI=y" >> $config_host_mak
fi
@@ -5781,14 +5849,20 @@ fi
if test "$smartcard" = "yes" ; then
echo "CONFIG_SMARTCARD=y" >> $config_host_mak
echo "SMARTCARD_CFLAGS=$libcacard_cflags" >> $config_host_mak
echo "SMARTCARD_LIBS=$libcacard_libs" >> $config_host_mak
fi
if test "$libusb" = "yes" ; then
echo "CONFIG_USB_LIBUSB=y" >> $config_host_mak
echo "LIBUSB_CFLAGS=$libusb_cflags" >> $config_host_mak
echo "LIBUSB_LIBS=$libusb_libs" >> $config_host_mak
fi
if test "$usb_redir" = "yes" ; then
echo "CONFIG_USB_REDIR=y" >> $config_host_mak
echo "USB_REDIR_CFLAGS=$usb_redir_cflags" >> $config_host_mak
echo "USB_REDIR_LIBS=$usb_redir_libs" >> $config_host_mak
fi
if test "$opengl" = "yes" ; then
@@ -5984,6 +6058,7 @@ echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak
if test "$rdma" = "yes" ; then
echo "CONFIG_RDMA=y" >> $config_host_mak
echo "RDMA_LIBS=$rdma_libs" >> $config_host_mak
fi
if test "$have_rtnetlink" = "yes" ; then
@@ -6505,8 +6580,8 @@ if test "$ccache_cpp2" = "yes"; then
fi
# build tree in object directory in case the source is not in the current directory
DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests"
DIRS="$DIRS docs docs/interop fsdev"
DIRS="tests tests/tcg tests/tcg/cris tests/tcg/lm32 tests/libqos tests/qapi-schema tests/tcg/xtensa tests/qemu-iotests tests/vm"
DIRS="$DIRS docs docs/interop fsdev scsi"
DIRS="$DIRS pc-bios/optionrom pc-bios/spapr-rtas pc-bios/s390-ccw"
DIRS="$DIRS roms/seabios roms/vgabios"
DIRS="$DIRS qapi-generated"

5
cpus.c
View File

@@ -1764,8 +1764,9 @@ void qemu_init_vcpu(CPUState *cpu)
/* If the target cpu hasn't set up any address spaces itself,
* give it the default one.
*/
AddressSpace *as = address_space_init_shareable(cpu->memory,
"cpu-memory");
AddressSpace *as = g_new0(AddressSpace, 1);
address_space_init(as, cpu->memory, "cpu-memory");
cpu->num_ases = 1;
cpu_address_space_init(cpu, as, 0);
}

View File

@@ -846,8 +846,9 @@ qcrypto_block_luks_open(QCryptoBlock *block,
}
}
block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
block->payload_offset = luks->header.payload_offset *
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
block->sector_size;
luks->cipher_alg = cipheralg;
luks->cipher_mode = ciphermode;
@@ -1240,8 +1241,9 @@ qcrypto_block_luks_create(QCryptoBlock *block,
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE)) *
QCRYPTO_BLOCK_LUKS_NUM_KEY_SLOTS);
block->sector_size = QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
block->payload_offset = luks->header.payload_offset *
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE;
block->sector_size;
/* Reserve header space to match payload offset */
initfunc(block, block->payload_offset, opaque, &local_err);
@@ -1397,29 +1399,33 @@ static void qcrypto_block_luks_cleanup(QCryptoBlock *block)
static int
qcrypto_block_luks_decrypt(QCryptoBlock *block,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
assert(QEMU_IS_ALIGNED(offset, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE));
assert(QEMU_IS_ALIGNED(len, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE));
return qcrypto_block_decrypt_helper(block->cipher,
block->niv, block->ivgen,
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
startsector, buf, len, errp);
offset, buf, len, errp);
}
static int
qcrypto_block_luks_encrypt(QCryptoBlock *block,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
assert(QEMU_IS_ALIGNED(offset, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE));
assert(QEMU_IS_ALIGNED(len, QCRYPTO_BLOCK_LUKS_SECTOR_SIZE));
return qcrypto_block_encrypt_helper(block->cipher,
block->niv, block->ivgen,
QCRYPTO_BLOCK_LUKS_SECTOR_SIZE,
startsector, buf, len, errp);
offset, buf, len, errp);
}

View File

@@ -80,6 +80,7 @@ qcrypto_block_qcow_init(QCryptoBlock *block,
goto fail;
}
block->sector_size = QCRYPTO_BLOCK_QCOW_SECTOR_SIZE;
block->payload_offset = 0;
return 0;
@@ -142,29 +143,33 @@ qcrypto_block_qcow_cleanup(QCryptoBlock *block)
static int
qcrypto_block_qcow_decrypt(QCryptoBlock *block,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
assert(QEMU_IS_ALIGNED(offset, QCRYPTO_BLOCK_QCOW_SECTOR_SIZE));
assert(QEMU_IS_ALIGNED(len, QCRYPTO_BLOCK_QCOW_SECTOR_SIZE));
return qcrypto_block_decrypt_helper(block->cipher,
block->niv, block->ivgen,
QCRYPTO_BLOCK_QCOW_SECTOR_SIZE,
startsector, buf, len, errp);
offset, buf, len, errp);
}
static int
qcrypto_block_qcow_encrypt(QCryptoBlock *block,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
assert(QEMU_IS_ALIGNED(offset, QCRYPTO_BLOCK_QCOW_SECTOR_SIZE));
assert(QEMU_IS_ALIGNED(len, QCRYPTO_BLOCK_QCOW_SECTOR_SIZE));
return qcrypto_block_encrypt_helper(block->cipher,
block->niv, block->ivgen,
QCRYPTO_BLOCK_QCOW_SECTOR_SIZE,
startsector, buf, len, errp);
offset, buf, len, errp);
}

View File

@@ -127,22 +127,22 @@ QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
int qcrypto_block_decrypt(QCryptoBlock *block,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
return block->driver->decrypt(block, startsector, buf, len, errp);
return block->driver->decrypt(block, offset, buf, len, errp);
}
int qcrypto_block_encrypt(QCryptoBlock *block,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
return block->driver->encrypt(block, startsector, buf, len, errp);
return block->driver->encrypt(block, offset, buf, len, errp);
}
@@ -170,6 +170,12 @@ uint64_t qcrypto_block_get_payload_offset(QCryptoBlock *block)
}
uint64_t qcrypto_block_get_sector_size(QCryptoBlock *block)
{
return block->sector_size;
}
void qcrypto_block_free(QCryptoBlock *block)
{
if (!block) {
@@ -188,13 +194,17 @@ int qcrypto_block_decrypt_helper(QCryptoCipher *cipher,
size_t niv,
QCryptoIVGen *ivgen,
int sectorsize,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
uint8_t *iv;
int ret = -1;
uint64_t startsector = offset / sectorsize;
assert(QEMU_IS_ALIGNED(offset, sectorsize));
assert(QEMU_IS_ALIGNED(len, sectorsize));
iv = niv ? g_new0(uint8_t, niv) : NULL;
@@ -237,13 +247,17 @@ int qcrypto_block_encrypt_helper(QCryptoCipher *cipher,
size_t niv,
QCryptoIVGen *ivgen,
int sectorsize,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp)
{
uint8_t *iv;
int ret = -1;
uint64_t startsector = offset / sectorsize;
assert(QEMU_IS_ALIGNED(offset, sectorsize));
assert(QEMU_IS_ALIGNED(len, sectorsize));
iv = niv ? g_new0(uint8_t, niv) : NULL;

View File

@@ -36,6 +36,7 @@ struct QCryptoBlock {
QCryptoHashAlgorithm kdfhash;
size_t niv;
uint64_t payload_offset; /* In bytes */
uint64_t sector_size; /* In bytes */
};
struct QCryptoBlockDriver {
@@ -81,7 +82,7 @@ int qcrypto_block_decrypt_helper(QCryptoCipher *cipher,
size_t niv,
QCryptoIVGen *ivgen,
int sectorsize,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp);
@@ -90,7 +91,7 @@ int qcrypto_block_encrypt_helper(QCryptoCipher *cipher,
size_t niv,
QCryptoIVGen *ivgen,
int sectorsize,
uint64_t startsector,
uint64_t offset,
uint8_t *buf,
size_t len,
Error **errp);

View File

@@ -129,3 +129,4 @@ CONFIG_ACPI=y
CONFIG_SMBIOS=y
CONFIG_ASPEED_SOC=y
CONFIG_GPIO_KEY=y
CONFIG_MSF2=y

View File

@@ -63,11 +63,23 @@ operations:
typeof(*ptr) atomic_fetch_sub(ptr, val)
typeof(*ptr) atomic_fetch_and(ptr, val)
typeof(*ptr) atomic_fetch_or(ptr, val)
typeof(*ptr) atomic_fetch_xor(ptr, val)
typeof(*ptr) atomic_fetch_inc_nonzero(ptr)
typeof(*ptr) atomic_xchg(ptr, val)
typeof(*ptr) atomic_cmpxchg(ptr, old, new)
all of which return the old value of *ptr. These operations are
polymorphic; they operate on any type that is as wide as an int.
polymorphic; they operate on any type that is as wide as a pointer.
Similar operations return the new value of *ptr:
typeof(*ptr) atomic_inc_fetch(ptr)
typeof(*ptr) atomic_dec_fetch(ptr)
typeof(*ptr) atomic_add_fetch(ptr, val)
typeof(*ptr) atomic_sub_fetch(ptr, val)
typeof(*ptr) atomic_and_fetch(ptr, val)
typeof(*ptr) atomic_or_fetch(ptr, val)
typeof(*ptr) atomic_xor_fetch(ptr, val)
Sequentially consistent loads and stores can be done using:

View File

@@ -202,7 +202,7 @@ The functions to do that are inside a vmstate definition, and are called:
This function is called after we load the state of one device.
- void (*pre_save)(void *opaque);
- int (*pre_save)(void *opaque);
This function is called before we save the state of one device.

View File

@@ -0,0 +1,83 @@
..
======================================
Persistent reservation helper protocol
======================================
QEMU's SCSI passthrough devices, ``scsi-block`` and ``scsi-generic``,
can delegate implementation of persistent reservations to an external
(and typically privileged) program. Persistent Reservations allow
restricting access to block devices to specific initiators in a shared
storage setup.
For a more detailed reference please refer the the SCSI Primary
Commands standard, specifically the section on Reservations and the
"PERSISTENT RESERVE IN" and "PERSISTENT RESERVE OUT" commands.
This document describes the socket protocol used between QEMU's
``pr-manager-helper`` object and the external program.
.. contents::
Connection and initialization
-----------------------------
All data transmitted on the socket is big-endian.
After connecting to the helper program's socket, the helper starts a simple
feature negotiation process by writing four bytes corresponding to
the features it exposes (``supported_features``). QEMU reads it,
then writes four bytes corresponding to the desired features of the
helper program (``requested_features``).
If a bit is 1 in ``requested_features`` and 0 in ``supported_features``,
the corresponding feature is not supported by the helper and the connection
is closed. On the other hand, it is acceptable for a bit to be 0 in
``requested_features`` and 1 in ``supported_features``; in this case,
the helper will not enable the feature.
Right now no feature is defined, so the two parties always write four
zero bytes.
Command format
--------------
It is invalid to send multiple commands concurrently on the same
socket. It is however possible to connect multiple sockets to the
helper and send multiple commands to the helper for one or more
file descriptors.
A command consists of a request and a response. A request consists
of a 16-byte SCSI CDB. A file descriptor must be passed to the helper
together with the SCSI CDB using ancillary data.
The CDB has the following limitations:
- the command (stored in the first byte) must be one of 0x5E
(PERSISTENT RESERVE IN) or 0x5F (PERSISTENT RESERVE OUT).
- the allocation length (stored in bytes 7-8 of the CDB for PERSISTENT
RESERVE IN) or parameter list length (stored in bytes 5-8 of the CDB
for PERSISTENT RESERVE OUT) is limited to 8 KiB.
For PERSISTENT RESERVE OUT, the parameter list is sent right after the
CDB. The length of the parameter list is taken from the CDB itself.
The helper's reply has the following structure:
- 4 bytes for the SCSI status
- 4 bytes for the payload size (nonzero only for PERSISTENT RESERVE IN
and only if the SCSI status is 0x00, i.e. GOOD)
- 96 bytes for the SCSI sense data
- if the size is nonzero, the payload follows
The sense data is always sent to keep the protocol simple, even though
it is only valid if the SCSI status is CHECK CONDITION (0x02).
The payload size is always less than or equal to the allocation length
specified in the CDB for the PERSISTENT RESERVE IN command.
If the protocol is violated, the helper closes the socket.

View File

@@ -24,7 +24,7 @@ Where,
For example, the following command-line:
qemu [...] 1G,slots=3,maxmem=4G
qemu [...] -m 1G,slots=3,maxmem=4G
Creates a guest with 1GB of memory and three hotpluggable memory slots.
The hotpluggable memory slots are empty when the guest is booted, so all

111
docs/pr-manager.rst Normal file
View File

@@ -0,0 +1,111 @@
======================================
Persistent reservation managers
======================================
SCSI persistent Reservations allow restricting access to block devices
to specific initiators in a shared storage setup. When implementing
clustering of virtual machines, it is a common requirement for virtual
machines to send persistent reservation SCSI commands. However,
the operating system restricts sending these commands to unprivileged
programs because incorrect usage can disrupt regular operation of the
storage fabric.
For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
and ``scsi-generic`` (both are only available on Linux) can delegate
implementation of persistent reservations to a separate object,
the "persistent reservation manager". Only PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands are passed to the persistent reservation
manager object; other commands are processed by QEMU as usual.
-----------------------------------------
Defining a persistent reservation manager
-----------------------------------------
A persistent reservation manager is an instance of a subclass of the
"pr-manager" QOM class.
Right now only one subclass is defined, ``pr-manager-helper``, which
forwards the commands to an external privileged helper program
over Unix sockets. The helper program only allows sending persistent
reservation commands to devices for which QEMU has a file descriptor,
so that QEMU will not be able to effect persistent reservations
unless it has access to both the socket and the device.
``pr-manager-helper`` has a single string property, ``path``, which
accepts the path to the helper program's Unix socket. For example,
the following command line defines a ``pr-manager-helper`` object and
attaches it to a SCSI passthrough device::
$ qemu-system-x86_64
-device virtio-scsi \
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
-drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0
-device scsi-block,drive=hd
Alternatively, using ``-blockdev``::
$ qemu-system-x86_64
-device virtio-scsi \
-object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock
-blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0
-device scsi-block,drive=hd
----------------------------------
Invoking :program:`qemu-pr-helper`
----------------------------------
QEMU provides an implementation of the persistent reservation helper,
called :program:`qemu-pr-helper`. The helper should be started as a
system service and supports the following option:
-d, --daemon run in the background
-q, --quiet decrease verbosity
-v, --verbose increase verbosity
-f, --pidfile=path PID file when running as a daemon
-k, --socket=path path to the socket
-T, --trace=trace-opts tracing options
By default, the socket and PID file are placed in the runtime state
directory, for example :file:`/var/run/qemu-pr-helper.sock` and
:file:`/var/run/qemu-pr-helper.pid`. The PID file is not created
unless :option:`-d` is passed too.
:program:`qemu-pr-helper` can also use the systemd socket activation
protocol. In this case, the systemd socket unit should specify a
Unix stream socket, like this::
[Socket]
ListenStream=/var/run/qemu-pr-helper.sock
After connecting to the socket, :program:`qemu-pr-helper`` can optionally drop
root privileges, except for those capabilities that are needed for
its operation. To do this, add the following options:
-u, --user=user user to drop privileges to
-g, --group=group group to drop privileges to
---------------------------------------------
Multipath devices and persistent reservations
---------------------------------------------
Proper support of persistent reservation for multipath devices requires
communication with the multipath daemon, so that the reservation is
registered and applied when a path is newly discovered or becomes online
again. :command:`qemu-pr-helper` can do this if the ``libmpathpersist``
library was available on the system at build time.
As of August 2017, a reservation key must be specified in ``multipath.conf``
for ``multipathd`` to check for persistent reservation for newly
discovered paths or reinstated paths. The attribute can be added
to the ``defaults`` section or the ``multipaths`` section; for example::
multipaths {
multipath {
wwid XXXXXXXXXXXXXXXX
alias yellow
reservation_key 0x123abc
}
}
Linking :program:`qemu-pr-helper` to ``libmpathpersist`` does not impede
its usage on regular SCSI devices.

View File

@@ -0,0 +1,804 @@
@c man begin SYNOPSIS
QEMU block driver reference manual
@c man end
@c man begin DESCRIPTION
@node disk_images_formats
@subsection Disk image file formats
QEMU supports many image file formats that can be used with VMs as well as with
any of the tools (like @code{qemu-img}). This includes the preferred formats
raw and qcow2 as well as formats that are supported for compatibility with
older QEMU versions or other hypervisors.
Depending on the image format, different options can be passed to
@code{qemu-img create} and @code{qemu-img convert} using the @code{-o} option.
This section describes each format and the options that are supported for it.
@table @option
@item raw
Raw disk image format. This format has the advantage of
being simple and easily exportable to all other emulators. If your
file system supports @emph{holes} (for example in ext2 or ext3 on
Linux or NTFS on Windows), then only the written sectors will reserve
space. Use @code{qemu-img info} to know the real size used by the
image or @code{ls -ls} on Unix/Linux.
Supported options:
@table @code
@item preallocation
Preallocation mode (allowed values: @code{off}, @code{falloc}, @code{full}).
@code{falloc} mode preallocates space for image by calling posix_fallocate().
@code{full} mode preallocates space for image by writing zeros to underlying
storage.
@end table
@item qcow2
QEMU image format, the most versatile format. Use it to have smaller
images (useful if your filesystem does not supports holes, for example
on Windows), zlib based compression and support of multiple VM
snapshots.
Supported options:
@table @code
@item compat
Determines the qcow2 version to use. @code{compat=0.10} uses the
traditional image format that can be read by any QEMU since 0.10.
@code{compat=1.1} enables image format extensions that only QEMU 1.1 and
newer understand (this is the default). Amongst others, this includes
zero clusters, which allow efficient copy-on-read for sparse images.
@item backing_file
File name of a base image (see @option{create} subcommand)
@item backing_fmt
Image format of the base image
@item encryption
This option is deprecated and equivalent to @code{encrypt.format=aes}
@item encrypt.format
If this is set to @code{luks}, it requests that the qcow2 payload (not
qcow2 header) be encrypted using the LUKS format. The passphrase to
use to unlock the LUKS key slot is given by the @code{encrypt.key-secret}
parameter. LUKS encryption parameters can be tuned with the other
@code{encrypt.*} parameters.
If this is set to @code{aes}, the image is encrypted with 128-bit AES-CBC.
The encryption key is given by the @code{encrypt.key-secret} parameter.
This encryption format is considered to be flawed by modern cryptography
standards, suffering from a number of design problems:
@itemize @minus
@item The AES-CBC cipher is used with predictable initialization vectors based
on the sector number. This makes it vulnerable to chosen plaintext attacks
which can reveal the existence of encrypted data.
@item The user passphrase is directly used as the encryption key. A poorly
chosen or short passphrase will compromise the security of the encryption.
@item In the event of the passphrase being compromised there is no way to
change the passphrase to protect data in any qcow images. The files must
be cloned, using a different encryption passphrase in the new file. The
original file must then be securely erased using a program like shred,
though even this is ineffective with many modern storage technologies.
@end itemize
The use of this is no longer supported in system emulators. Support only
remains in the command line utilities, for the purposes of data liberation
and interoperability with old versions of QEMU. The @code{luks} format
should be used instead.
@item encrypt.key-secret
Provides the ID of a @code{secret} object that contains the passphrase
(@code{encrypt.format=luks}) or encryption key (@code{encrypt.format=aes}).
@item encrypt.cipher-alg
Name of the cipher algorithm and key length. Currently defaults
to @code{aes-256}. Only used when @code{encrypt.format=luks}.
@item encrypt.cipher-mode
Name of the encryption mode to use. Currently defaults to @code{xts}.
Only used when @code{encrypt.format=luks}.
@item encrypt.ivgen-alg
Name of the initialization vector generator algorithm. Currently defaults
to @code{plain64}. Only used when @code{encrypt.format=luks}.
@item encrypt.ivgen-hash-alg
Name of the hash algorithm to use with the initialization vector generator
(if required). Defaults to @code{sha256}. Only used when @code{encrypt.format=luks}.
@item encrypt.hash-alg
Name of the hash algorithm to use for PBKDF algorithm
Defaults to @code{sha256}. Only used when @code{encrypt.format=luks}.
@item encrypt.iter-time
Amount of time, in milliseconds, to use for PBKDF algorithm per key slot.
Defaults to @code{2000}. Only used when @code{encrypt.format=luks}.
@item cluster_size
Changes the qcow2 cluster size (must be between 512 and 2M). Smaller cluster
sizes can improve the image file size whereas larger cluster sizes generally
provide better performance.
@item preallocation
Preallocation mode (allowed values: @code{off}, @code{metadata}, @code{falloc},
@code{full}). An image with preallocated metadata is initially larger but can
improve performance when the image needs to grow. @code{falloc} and @code{full}
preallocations are like the same options of @code{raw} format, but sets up
metadata also.
@item lazy_refcounts
If this option is set to @code{on}, reference count updates are postponed with
the goal of avoiding metadata I/O and improving performance. This is
particularly interesting with @option{cache=writethrough} which doesn't batch
metadata updates. The tradeoff is that after a host crash, the reference count
tables must be rebuilt, i.e. on the next open an (automatic) @code{qemu-img
check -r all} is required, which may take some time.
This option can only be enabled if @code{compat=1.1} is specified.
@item nocow
If this option is set to @code{on}, it will turn off COW of the file. It's only
valid on btrfs, no effect on other file systems.
Btrfs has low performance when hosting a VM image file, even more when the guest
on the VM also using btrfs as file system. Turning off COW is a way to mitigate
this bad performance. Generally there are two ways to turn off COW on btrfs:
a) Disable it by mounting with nodatacow, then all newly created files will be
NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
does.
Note: this option is only valid to new or empty files. If there is an existing
file which is COW and has data blocks already, it couldn't be changed to NOCOW
by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
the NOCOW flag is set or not (Capital 'C' is NOCOW flag).
@end table
@item qed
Old QEMU image format with support for backing files and compact image files
(when your filesystem or transport medium does not support holes).
When converting QED images to qcow2, you might want to consider using the
@code{lazy_refcounts=on} option to get a more QED-like behaviour.
Supported options:
@table @code
@item backing_file
File name of a base image (see @option{create} subcommand).
@item backing_fmt
Image file format of backing file (optional). Useful if the format cannot be
autodetected because it has no header, like some vhd/vpc files.
@item cluster_size
Changes the cluster size (must be power-of-2 between 4K and 64K). Smaller
cluster sizes can improve the image file size whereas larger cluster sizes
generally provide better performance.
@item table_size
Changes the number of clusters per L1/L2 table (must be power-of-2 between 1
and 16). There is normally no need to change this value but this option can be
used for performance benchmarking.
@end table
@item qcow
Old QEMU image format with support for backing files, compact image files,
encryption and compression.
Supported options:
@table @code
@item backing_file
File name of a base image (see @option{create} subcommand)
@item encryption
This option is deprecated and equivalent to @code{encrypt.format=aes}
@item encrypt.format
If this is set to @code{aes}, the image is encrypted with 128-bit AES-CBC.
The encryption key is given by the @code{encrypt.key-secret} parameter.
This encryption format is considered to be flawed by modern cryptography
standards, suffering from a number of design problems enumerated previously
against the @code{qcow2} image format.
The use of this is no longer supported in system emulators. Support only
remains in the command line utilities, for the purposes of data liberation
and interoperability with old versions of QEMU.
Users requiring native encryption should use the @code{qcow2} format
instead with @code{encrypt.format=luks}.
@item encrypt.key-secret
Provides the ID of a @code{secret} object that contains the encryption
key (@code{encrypt.format=aes}).
@end table
@item luks
LUKS v1 encryption format, compatible with Linux dm-crypt/cryptsetup
Supported options:
@table @code
@item key-secret
Provides the ID of a @code{secret} object that contains the passphrase.
@item cipher-alg
Name of the cipher algorithm and key length. Currently defaults
to @code{aes-256}.
@item cipher-mode
Name of the encryption mode to use. Currently defaults to @code{xts}.
@item ivgen-alg
Name of the initialization vector generator algorithm. Currently defaults
to @code{plain64}.
@item ivgen-hash-alg
Name of the hash algorithm to use with the initialization vector generator
(if required). Defaults to @code{sha256}.
@item hash-alg
Name of the hash algorithm to use for PBKDF algorithm
Defaults to @code{sha256}.
@item iter-time
Amount of time, in milliseconds, to use for PBKDF algorithm per key slot.
Defaults to @code{2000}.
@end table
@item vdi
VirtualBox 1.1 compatible image format.
Supported options:
@table @code
@item static
If this option is set to @code{on}, the image is created with metadata
preallocation.
@end table
@item vmdk
VMware 3 and 4 compatible image format.
Supported options:
@table @code
@item backing_file
File name of a base image (see @option{create} subcommand).
@item compat6
Create a VMDK version 6 image (instead of version 4)
@item hwversion
Specify vmdk virtual hardware version. Compat6 flag cannot be enabled
if hwversion is specified.
@item subformat
Specifies which VMDK subformat to use. Valid options are
@code{monolithicSparse} (default),
@code{monolithicFlat},
@code{twoGbMaxExtentSparse},
@code{twoGbMaxExtentFlat} and
@code{streamOptimized}.
@end table
@item vpc
VirtualPC compatible image format (VHD).
Supported options:
@table @code
@item subformat
Specifies which VHD subformat to use. Valid options are
@code{dynamic} (default) and @code{fixed}.
@end table
@item VHDX
Hyper-V compatible image format (VHDX).
Supported options:
@table @code
@item subformat
Specifies which VHDX subformat to use. Valid options are
@code{dynamic} (default) and @code{fixed}.
@item block_state_zero
Force use of payload blocks of type 'ZERO'. Can be set to @code{on} (default)
or @code{off}. When set to @code{off}, new blocks will be created as
@code{PAYLOAD_BLOCK_NOT_PRESENT}, which means parsers are free to return
arbitrary data for those blocks. Do not set to @code{off} when using
@code{qemu-img convert} with @code{subformat=dynamic}.
@item block_size
Block size; min 1 MB, max 256 MB. 0 means auto-calculate based on image size.
@item log_size
Log size; min 1 MB.
@end table
@end table
@subsubsection Read-only formats
More disk image file formats are supported in a read-only mode.
@table @option
@item bochs
Bochs images of @code{growing} type.
@item cloop
Linux Compressed Loop image, useful only to reuse directly compressed
CD-ROM images present for example in the Knoppix CD-ROMs.
@item dmg
Apple disk image.
@item parallels
Parallels disk image format.
@end table
@node host_drives
@subsection Using host drives
In addition to disk image files, QEMU can directly access host
devices. We describe here the usage for QEMU version >= 0.8.3.
@subsubsection Linux
On Linux, you can directly use the host device filename instead of a
disk image filename provided you have enough privileges to access
it. For example, use @file{/dev/cdrom} to access to the CDROM.
@table @code
@item CD
You can specify a CDROM device even if no CDROM is loaded. QEMU has
specific code to detect CDROM insertion or removal. CDROM ejection by
the guest OS is supported. Currently only data CDs are supported.
@item Floppy
You can specify a floppy device even if no floppy is loaded. Floppy
removal is currently not detected accurately (if you change floppy
without doing floppy access while the floppy is not loaded, the guest
OS will think that the same floppy is loaded).
Use of the host's floppy device is deprecated, and support for it will
be removed in a future release.
@item Hard disks
Hard disks can be used. Normally you must specify the whole disk
(@file{/dev/hdb} instead of @file{/dev/hdb1}) so that the guest OS can
see it as a partitioned disk. WARNING: unless you know what you do, it
is better to only make READ-ONLY accesses to the hard disk otherwise
you may corrupt your host data (use the @option{-snapshot} command
line option or modify the device permissions accordingly).
@end table
@subsubsection Windows
@table @code
@item CD
The preferred syntax is the drive letter (e.g. @file{d:}). The
alternate syntax @file{\\.\d:} is supported. @file{/dev/cdrom} is
supported as an alias to the first CDROM drive.
Currently there is no specific code to handle removable media, so it
is better to use the @code{change} or @code{eject} monitor commands to
change or eject media.
@item Hard disks
Hard disks can be used with the syntax: @file{\\.\PhysicalDrive@var{N}}
where @var{N} is the drive number (0 is the first hard disk).
WARNING: unless you know what you do, it is better to only make
READ-ONLY accesses to the hard disk otherwise you may corrupt your
host data (use the @option{-snapshot} command line so that the
modifications are written in a temporary file).
@end table
@subsubsection Mac OS X
@file{/dev/cdrom} is an alias to the first CDROM.
Currently there is no specific code to handle removable media, so it
is better to use the @code{change} or @code{eject} monitor commands to
change or eject media.
@node disk_images_fat_images
@subsection Virtual FAT disk images
QEMU can automatically create a virtual FAT disk image from a
directory tree. In order to use it, just type:
@example
qemu-system-i386 linux.img -hdb fat:/my_directory
@end example
Then you access access to all the files in the @file{/my_directory}
directory without having to copy them in a disk image or to export
them via SAMBA or NFS. The default access is @emph{read-only}.
Floppies can be emulated with the @code{:floppy:} option:
@example
qemu-system-i386 linux.img -fda fat:floppy:/my_directory
@end example
A read/write support is available for testing (beta stage) with the
@code{:rw:} option:
@example
qemu-system-i386 linux.img -fda fat:floppy:rw:/my_directory
@end example
What you should @emph{never} do:
@itemize
@item use non-ASCII filenames ;
@item use "-snapshot" together with ":rw:" ;
@item expect it to work when loadvm'ing ;
@item write to the FAT directory on the host system while accessing it with the guest system.
@end itemize
@node disk_images_nbd
@subsection NBD access
QEMU can access directly to block device exported using the Network Block Device
protocol.
@example
qemu-system-i386 linux.img -hdb nbd://my_nbd_server.mydomain.org:1024/
@end example
If the NBD server is located on the same host, you can use an unix socket instead
of an inet socket:
@example
qemu-system-i386 linux.img -hdb nbd+unix://?socket=/tmp/my_socket
@end example
In this case, the block device must be exported using qemu-nbd:
@example
qemu-nbd --socket=/tmp/my_socket my_disk.qcow2
@end example
The use of qemu-nbd allows sharing of a disk between several guests:
@example
qemu-nbd --socket=/tmp/my_socket --share=2 my_disk.qcow2
@end example
@noindent
and then you can use it with two guests:
@example
qemu-system-i386 linux1.img -hdb nbd+unix://?socket=/tmp/my_socket
qemu-system-i386 linux2.img -hdb nbd+unix://?socket=/tmp/my_socket
@end example
If the nbd-server uses named exports (supported since NBD 2.9.18, or with QEMU's
own embedded NBD server), you must specify an export name in the URI:
@example
qemu-system-i386 -cdrom nbd://localhost/debian-500-ppc-netinst
qemu-system-i386 -cdrom nbd://localhost/openSUSE-11.1-ppc-netinst
@end example
The URI syntax for NBD is supported since QEMU 1.3. An alternative syntax is
also available. Here are some example of the older syntax:
@example
qemu-system-i386 linux.img -hdb nbd:my_nbd_server.mydomain.org:1024
qemu-system-i386 linux2.img -hdb nbd:unix:/tmp/my_socket
qemu-system-i386 -cdrom nbd:localhost:10809:exportname=debian-500-ppc-netinst
@end example
@node disk_images_sheepdog
@subsection Sheepdog disk images
Sheepdog is a distributed storage system for QEMU. It provides highly
available block level storage volumes that can be attached to
QEMU-based virtual machines.
You can create a Sheepdog disk image with the command:
@example
qemu-img create sheepdog:///@var{image} @var{size}
@end example
where @var{image} is the Sheepdog image name and @var{size} is its
size.
To import the existing @var{filename} to Sheepdog, you can use a
convert command.
@example
qemu-img convert @var{filename} sheepdog:///@var{image}
@end example
You can boot from the Sheepdog disk image with the command:
@example
qemu-system-i386 sheepdog:///@var{image}
@end example
You can also create a snapshot of the Sheepdog image like qcow2.
@example
qemu-img snapshot -c @var{tag} sheepdog:///@var{image}
@end example
where @var{tag} is a tag name of the newly created snapshot.
To boot from the Sheepdog snapshot, specify the tag name of the
snapshot.
@example
qemu-system-i386 sheepdog:///@var{image}#@var{tag}
@end example
You can create a cloned image from the existing snapshot.
@example
qemu-img create -b sheepdog:///@var{base}#@var{tag} sheepdog:///@var{image}
@end example
where @var{base} is a image name of the source snapshot and @var{tag}
is its tag name.
You can use an unix socket instead of an inet socket:
@example
qemu-system-i386 sheepdog+unix:///@var{image}?socket=@var{path}
@end example
If the Sheepdog daemon doesn't run on the local host, you need to
specify one of the Sheepdog servers to connect to.
@example
qemu-img create sheepdog://@var{hostname}:@var{port}/@var{image} @var{size}
qemu-system-i386 sheepdog://@var{hostname}:@var{port}/@var{image}
@end example
@node disk_images_iscsi
@subsection iSCSI LUNs
iSCSI is a popular protocol used to access SCSI devices across a computer
network.
There are two different ways iSCSI devices can be used by QEMU.
The first method is to mount the iSCSI LUN on the host, and make it appear as
any other ordinary SCSI device on the host and then to access this device as a
/dev/sd device from QEMU. How to do this differs between host OSes.
The second method involves using the iSCSI initiator that is built into
QEMU. This provides a mechanism that works the same way regardless of which
host OS you are running QEMU on. This section will describe this second method
of using iSCSI together with QEMU.
In QEMU, iSCSI devices are described using special iSCSI URLs
@example
URL syntax:
iscsi://[<username>[%<password>]@@]<host>[:<port>]/<target-iqn-name>/<lun>
@end example
Username and password are optional and only used if your target is set up
using CHAP authentication for access control.
Alternatively the username and password can also be set via environment
variables to have these not show up in the process list
@example
export LIBISCSI_CHAP_USERNAME=<username>
export LIBISCSI_CHAP_PASSWORD=<password>
iscsi://<host>/<target-iqn-name>/<lun>
@end example
Various session related parameters can be set via special options, either
in a configuration file provided via '-readconfig' or directly on the
command line.
If the initiator-name is not specified qemu will use a default name
of 'iqn.2008-11.org.linux-kvm[:<uuid>'] where <uuid> is the UUID of the
virtual machine. If the UUID is not specified qemu will use
'iqn.2008-11.org.linux-kvm[:<name>'] where <name> is the name of the
virtual machine.
@example
Setting a specific initiator name to use when logging in to the target
-iscsi initiator-name=iqn.qemu.test:my-initiator
@end example
@example
Controlling which type of header digest to negotiate with the target
-iscsi header-digest=CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
@end example
These can also be set via a configuration file
@example
[iscsi]
user = "CHAP username"
password = "CHAP password"
initiator-name = "iqn.qemu.test:my-initiator"
# header digest is one of CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
header-digest = "CRC32C"
@end example
Setting the target name allows different options for different targets
@example
[iscsi "iqn.target.name"]
user = "CHAP username"
password = "CHAP password"
initiator-name = "iqn.qemu.test:my-initiator"
# header digest is one of CRC32C|CRC32C-NONE|NONE-CRC32C|NONE
header-digest = "CRC32C"
@end example
Howto use a configuration file to set iSCSI configuration options:
@example
cat >iscsi.conf <<EOF
[iscsi]
user = "me"
password = "my password"
initiator-name = "iqn.qemu.test:my-initiator"
header-digest = "CRC32C"
EOF
qemu-system-i386 -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \
-readconfig iscsi.conf
@end example
Howto set up a simple iSCSI target on loopback and accessing it via QEMU:
@example
This example shows how to set up an iSCSI target with one CDROM and one DISK
using the Linux STGT software target. This target is available on Red Hat based
systems as the package 'scsi-target-utils'.
tgtd --iscsi portal=127.0.0.1:3260
tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.qemu.test
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \
-b /IMAGES/disk.img --device-type=disk
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 2 \
-b /IMAGES/cd.iso --device-type=cd
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
qemu-system-i386 -iscsi initiator-name=iqn.qemu.test:my-initiator \
-boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \
-cdrom iscsi://127.0.0.1/iqn.qemu.test/2
@end example
@node disk_images_gluster
@subsection GlusterFS disk images
GlusterFS is a user space distributed file system.
You can boot from the GlusterFS disk image with the command:
@example
URI:
qemu-system-x86_64 -drive file=gluster[+@var{type}]://[@var{host}[:@var{port}]]/@var{volume}/@var{path}
[?socket=...][,file.debug=9][,file.logfile=...]
JSON:
qemu-system-x86_64 'json:@{"driver":"qcow2",
"file":@{"driver":"gluster",
"volume":"testvol","path":"a.img","debug":9,"logfile":"...",
"server":[@{"type":"tcp","host":"...","port":"..."@},
@{"type":"unix","socket":"..."@}]@}@}'
@end example
@var{gluster} is the protocol.
@var{type} specifies the transport type used to connect to gluster
management daemon (glusterd). Valid transport types are
tcp and unix. In the URI form, if a transport type isn't specified,
then tcp type is assumed.
@var{host} specifies the server where the volume file specification for
the given volume resides. This can be either a hostname or an ipv4 address.
If transport type is unix, then @var{host} field should not be specified.
Instead @var{socket} field needs to be populated with the path to unix domain
socket.
@var{port} is the port number on which glusterd is listening. This is optional
and if not specified, it defaults to port 24007. If the transport type is unix,
then @var{port} should not be specified.
@var{volume} is the name of the gluster volume which contains the disk image.
@var{path} is the path to the actual disk image that resides on gluster volume.
@var{debug} is the logging level of the gluster protocol driver. Debug levels
are 0-9, with 9 being the most verbose, and 0 representing no debugging output.
The default level is 4. The current logging levels defined in the gluster source
are 0 - None, 1 - Emergency, 2 - Alert, 3 - Critical, 4 - Error, 5 - Warning,
6 - Notice, 7 - Info, 8 - Debug, 9 - Trace
@var{logfile} is a commandline option to mention log file path which helps in
logging to the specified file and also help in persisting the gfapi logs. The
default is stderr.
You can create a GlusterFS disk image with the command:
@example
qemu-img create gluster://@var{host}/@var{volume}/@var{path} @var{size}
@end example
Examples
@example
qemu-system-x86_64 -drive file=gluster://1.2.3.4/testvol/a.img
qemu-system-x86_64 -drive file=gluster+tcp://1.2.3.4/testvol/a.img
qemu-system-x86_64 -drive file=gluster+tcp://1.2.3.4:24007/testvol/dir/a.img
qemu-system-x86_64 -drive file=gluster+tcp://[1:2:3:4:5:6:7:8]/testvol/dir/a.img
qemu-system-x86_64 -drive file=gluster+tcp://[1:2:3:4:5:6:7:8]:24007/testvol/dir/a.img
qemu-system-x86_64 -drive file=gluster+tcp://server.domain.com:24007/testvol/dir/a.img
qemu-system-x86_64 -drive file=gluster+unix:///testvol/dir/a.img?socket=/tmp/glusterd.socket
qemu-system-x86_64 -drive file=gluster+rdma://1.2.3.4:24007/testvol/a.img
qemu-system-x86_64 -drive file=gluster://1.2.3.4/testvol/a.img,file.debug=9,file.logfile=/var/log/qemu-gluster.log
qemu-system-x86_64 'json:@{"driver":"qcow2",
"file":@{"driver":"gluster",
"volume":"testvol","path":"a.img",
"debug":9,"logfile":"/var/log/qemu-gluster.log",
"server":[@{"type":"tcp","host":"1.2.3.4","port":24007@},
@{"type":"unix","socket":"/var/run/glusterd.socket"@}]@}@}'
qemu-system-x86_64 -drive driver=qcow2,file.driver=gluster,file.volume=testvol,file.path=/path/a.img,
file.debug=9,file.logfile=/var/log/qemu-gluster.log,
file.server.0.type=tcp,file.server.0.host=1.2.3.4,file.server.0.port=24007,
file.server.1.type=unix,file.server.1.socket=/var/run/glusterd.socket
@end example
@node disk_images_ssh
@subsection Secure Shell (ssh) disk images
You can access disk images located on a remote ssh server
by using the ssh protocol:
@example
qemu-system-x86_64 -drive file=ssh://[@var{user}@@]@var{server}[:@var{port}]/@var{path}[?host_key_check=@var{host_key_check}]
@end example
Alternative syntax using properties:
@example
qemu-system-x86_64 -drive file.driver=ssh[,file.user=@var{user}],file.host=@var{server}[,file.port=@var{port}],file.path=@var{path}[,file.host_key_check=@var{host_key_check}]
@end example
@var{ssh} is the protocol.
@var{user} is the remote user. If not specified, then the local
username is tried.
@var{server} specifies the remote ssh server. Any ssh server can be
used, but it must implement the sftp-server protocol. Most Unix/Linux
systems should work without requiring any extra configuration.
@var{port} is the port number on which sshd is listening. By default
the standard ssh port (22) is used.
@var{path} is the path to the disk image.
The optional @var{host_key_check} parameter controls how the remote
host's key is checked. The default is @code{yes} which means to use
the local @file{.ssh/known_hosts} file. Setting this to @code{no}
turns off known-hosts checking. Or you can check that the host key
matches a specific fingerprint:
@code{host_key_check=md5:78:45:8e:14:57:4f:d5:45:83:0a:0e:f3:49:82:c9:c8}
(@code{sha1:} can also be used as a prefix, but note that OpenSSH
tools only use MD5 to print fingerprints).
Currently authentication must be done using ssh-agent. Other
authentication methods may be supported in future.
Note: Many ssh servers do not support an @code{fsync}-style operation.
The ssh driver cannot guarantee that disk flush requests are
obeyed, and this causes a risk of disk corruption if the remote
server or network goes down during writes. The driver will
print a warning when @code{fsync} is not supported:
warning: ssh server @code{ssh.example.com:22} does not support fsync
With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
supported.
@c man end
@ignore
@setfilename qemu-block-drivers
@settitle QEMU block drivers reference
@c man begin SEEALSO
The HTML documentation of QEMU for more precise information and Linux
user mode emulator invocation.
@c man end
@c man begin AUTHOR
Fabrice Bellard and the QEMU Project developers
@c man end
@end ignore

330
exec.c
View File

@@ -187,21 +187,18 @@ typedef struct PhysPageMap {
} PhysPageMap;
struct AddressSpaceDispatch {
struct rcu_head rcu;
MemoryRegionSection *mru_section;
/* This is a multi-level map on the physical address space.
* The bottom level has pointers to MemoryRegionSections.
*/
PhysPageEntry phys_map;
PhysPageMap map;
AddressSpace *as;
};
#define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
typedef struct subpage_t {
MemoryRegion iomem;
AddressSpace *as;
FlatView *fv;
hwaddr base;
uint16_t sub_section[];
} subpage_t;
@@ -361,7 +358,7 @@ static void phys_page_compact(PhysPageEntry *lp, Node *nodes)
}
}
static void phys_page_compact_all(AddressSpaceDispatch *d, int nodes_nb)
void address_space_dispatch_compact(AddressSpaceDispatch *d)
{
if (d->phys_map.skip) {
phys_page_compact(&d->phys_map, d->map.nodes);
@@ -471,12 +468,13 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x
}
/* Called from RCU critical section */
static MemoryRegionSection address_space_do_translate(AddressSpace *as,
hwaddr addr,
hwaddr *xlat,
hwaddr *plen,
bool is_write,
bool is_mmio)
static MemoryRegionSection flatview_do_translate(FlatView *fv,
hwaddr addr,
hwaddr *xlat,
hwaddr *plen,
bool is_write,
bool is_mmio,
AddressSpace **target_as)
{
IOMMUTLBEntry iotlb;
MemoryRegionSection *section;
@@ -484,8 +482,9 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as,
IOMMUMemoryRegionClass *imrc;
for (;;) {
AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch);
section = address_space_translate_internal(d, addr, &addr, plen, is_mmio);
section = address_space_translate_internal(
flatview_to_dispatch(fv), addr, &addr,
plen, is_mmio);
iommu_mr = memory_region_get_iommu(section->mr);
if (!iommu_mr) {
@@ -502,7 +501,8 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as,
goto translate_fail;
}
as = iotlb.target_as;
fv = address_space_to_flatview(iotlb.target_as);
*target_as = iotlb.target_as;
}
*xlat = addr;
@@ -524,8 +524,8 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
plen = (hwaddr)-1;
/* This can never be MMIO. */
section = address_space_do_translate(as, addr, &xlat, &plen,
is_write, false);
section = flatview_do_translate(address_space_to_flatview(as), addr,
&xlat, &plen, is_write, false, &as);
/* Illegal translation */
if (section.mr == &io_mem_unassigned) {
@@ -548,7 +548,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr,
plen -= 1;
return (IOMMUTLBEntry) {
.target_as = section.address_space,
.target_as = as,
.iova = addr & ~plen,
.translated_addr = xlat & ~plen,
.addr_mask = plen,
@@ -561,15 +561,15 @@ iotlb_fail:
}
/* Called from RCU critical section */
MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr,
hwaddr *xlat, hwaddr *plen,
bool is_write)
MemoryRegion *flatview_translate(FlatView *fv, hwaddr addr, hwaddr *xlat,
hwaddr *plen, bool is_write)
{
MemoryRegion *mr;
MemoryRegionSection section;
AddressSpace *as = NULL;
/* This can be MMIO, so setup MMIO bit. */
section = address_space_do_translate(as, addr, xlat, plen, is_write, true);
section = flatview_do_translate(fv, addr, xlat, plen, is_write, true, &as);
mr = section.mr;
if (xen_enabled() && memory_access_is_direct(mr, is_write)) {
@@ -1219,7 +1219,7 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
} else {
AddressSpaceDispatch *d;
d = atomic_rcu_read(&section->address_space->dispatch);
d = flatview_to_dispatch(section->fv);
iotlb = section - d->map.sections;
iotlb += xlat;
}
@@ -1245,7 +1245,7 @@ hwaddr memory_region_section_get_iotlb(CPUState *cpu,
static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
uint16_t section);
static subpage_t *subpage_init(AddressSpace *as, hwaddr base);
static subpage_t *subpage_init(FlatView *fv, hwaddr base);
static void *(*phys_mem_alloc)(size_t size, uint64_t *align) =
qemu_anon_ram_alloc;
@@ -1302,8 +1302,9 @@ static void phys_sections_free(PhysPageMap *map)
g_free(map->nodes);
}
static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *section)
static void register_subpage(FlatView *fv, MemoryRegionSection *section)
{
AddressSpaceDispatch *d = flatview_to_dispatch(fv);
subpage_t *subpage;
hwaddr base = section->offset_within_address_space
& TARGET_PAGE_MASK;
@@ -1317,8 +1318,8 @@ static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *secti
assert(existing->mr->subpage || existing->mr == &io_mem_unassigned);
if (!(existing->mr->subpage)) {
subpage = subpage_init(d->as, base);
subsection.address_space = d->as;
subpage = subpage_init(fv, base);
subsection.fv = fv;
subsection.mr = &subpage->iomem;
phys_page_set(d, base >> TARGET_PAGE_BITS, 1,
phys_section_add(&d->map, &subsection));
@@ -1332,9 +1333,10 @@ static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *secti
}
static void register_multipage(AddressSpaceDispatch *d,
static void register_multipage(FlatView *fv,
MemoryRegionSection *section)
{
AddressSpaceDispatch *d = flatview_to_dispatch(fv);
hwaddr start_addr = section->offset_within_address_space;
uint16_t section_index = phys_section_add(&d->map, section);
uint64_t num_pages = int128_get64(int128_rshift(section->size,
@@ -1344,10 +1346,8 @@ static void register_multipage(AddressSpaceDispatch *d,
phys_page_set(d, start_addr >> TARGET_PAGE_BITS, num_pages, section_index);
}
static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
void flatview_add_to_dispatch(FlatView *fv, MemoryRegionSection *section)
{
AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
AddressSpaceDispatch *d = as->next_dispatch;
MemoryRegionSection now = *section, remain = *section;
Int128 page_size = int128_make64(TARGET_PAGE_SIZE);
@@ -1356,7 +1356,7 @@ static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
- now.offset_within_address_space;
now.size = int128_min(int128_make64(left), now.size);
register_subpage(d, &now);
register_subpage(fv, &now);
} else {
now.size = int128_zero();
}
@@ -1366,13 +1366,13 @@ static void mem_add(MemoryListener *listener, MemoryRegionSection *section)
remain.offset_within_region += int128_get64(now.size);
now = remain;
if (int128_lt(remain.size, page_size)) {
register_subpage(d, &now);
register_subpage(fv, &now);
} else if (remain.offset_within_address_space & ~TARGET_PAGE_MASK) {
now.size = page_size;
register_subpage(d, &now);
register_subpage(fv, &now);
} else {
now.size = int128_and(now.size, int128_neg(page_size));
register_multipage(d, &now);
register_multipage(fv, &now);
}
}
}
@@ -2500,6 +2500,11 @@ static const MemoryRegionOps watch_mem_ops = {
.endianness = DEVICE_NATIVE_ENDIAN,
};
static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
const uint8_t *buf, int len);
static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
bool is_write);
static MemTxResult subpage_read(void *opaque, hwaddr addr, uint64_t *data,
unsigned len, MemTxAttrs attrs)
{
@@ -2511,8 +2516,7 @@ static MemTxResult subpage_read(void *opaque, hwaddr addr, uint64_t *data,
printf("%s: subpage %p len %u addr " TARGET_FMT_plx "\n", __func__,
subpage, len, addr);
#endif
res = address_space_read(subpage->as, addr + subpage->base,
attrs, buf, len);
res = flatview_read(subpage->fv, addr + subpage->base, attrs, buf, len);
if (res) {
return res;
}
@@ -2561,8 +2565,7 @@ static MemTxResult subpage_write(void *opaque, hwaddr addr,
default:
abort();
}
return address_space_write(subpage->as, addr + subpage->base,
attrs, buf, len);
return flatview_write(subpage->fv, addr + subpage->base, attrs, buf, len);
}
static bool subpage_accepts(void *opaque, hwaddr addr,
@@ -2574,8 +2577,8 @@ static bool subpage_accepts(void *opaque, hwaddr addr,
__func__, subpage, is_write ? 'w' : 'r', len, addr);
#endif
return address_space_access_valid(subpage->as, addr + subpage->base,
len, is_write);
return flatview_access_valid(subpage->fv, addr + subpage->base,
len, is_write);
}
static const MemoryRegionOps subpage_ops = {
@@ -2609,12 +2612,12 @@ static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
return 0;
}
static subpage_t *subpage_init(AddressSpace *as, hwaddr base)
static subpage_t *subpage_init(FlatView *fv, hwaddr base)
{
subpage_t *mmio;
mmio = g_malloc0(sizeof(subpage_t) + TARGET_PAGE_SIZE * sizeof(uint16_t));
mmio->as = as;
mmio->fv = fv;
mmio->base = base;
memory_region_init_io(&mmio->iomem, NULL, &subpage_ops, mmio,
NULL, TARGET_PAGE_SIZE);
@@ -2628,12 +2631,11 @@ static subpage_t *subpage_init(AddressSpace *as, hwaddr base)
return mmio;
}
static uint16_t dummy_section(PhysPageMap *map, AddressSpace *as,
MemoryRegion *mr)
static uint16_t dummy_section(PhysPageMap *map, FlatView *fv, MemoryRegion *mr)
{
assert(as);
assert(fv);
MemoryRegionSection section = {
.address_space = as,
.fv = fv,
.mr = mr,
.offset_within_address_space = 0,
.offset_within_region = 0,
@@ -2670,46 +2672,31 @@ static void io_mem_init(void)
NULL, UINT64_MAX);
}
static void mem_begin(MemoryListener *listener)
AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv)
{
AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
AddressSpaceDispatch *d = g_new0(AddressSpaceDispatch, 1);
uint16_t n;
n = dummy_section(&d->map, as, &io_mem_unassigned);
n = dummy_section(&d->map, fv, &io_mem_unassigned);
assert(n == PHYS_SECTION_UNASSIGNED);
n = dummy_section(&d->map, as, &io_mem_notdirty);
n = dummy_section(&d->map, fv, &io_mem_notdirty);
assert(n == PHYS_SECTION_NOTDIRTY);
n = dummy_section(&d->map, as, &io_mem_rom);
n = dummy_section(&d->map, fv, &io_mem_rom);
assert(n == PHYS_SECTION_ROM);
n = dummy_section(&d->map, as, &io_mem_watch);
n = dummy_section(&d->map, fv, &io_mem_watch);
assert(n == PHYS_SECTION_WATCH);
d->phys_map = (PhysPageEntry) { .ptr = PHYS_MAP_NODE_NIL, .skip = 1 };
d->as = as;
as->next_dispatch = d;
return d;
}
static void address_space_dispatch_free(AddressSpaceDispatch *d)
void address_space_dispatch_free(AddressSpaceDispatch *d)
{
phys_sections_free(&d->map);
g_free(d);
}
static void mem_commit(MemoryListener *listener)
{
AddressSpace *as = container_of(listener, AddressSpace, dispatch_listener);
AddressSpaceDispatch *cur = as->dispatch;
AddressSpaceDispatch *next = as->next_dispatch;
phys_page_compact_all(next, next->map.nodes_nb);
atomic_rcu_set(&as->dispatch, next);
if (cur) {
call_rcu(cur, address_space_dispatch_free, rcu);
}
}
static void tcg_commit(MemoryListener *listener)
{
CPUAddressSpace *cpuas;
@@ -2723,39 +2710,11 @@ static void tcg_commit(MemoryListener *listener)
* We reload the dispatch pointer now because cpu_reloading_memory_map()
* may have split the RCU critical section.
*/
d = atomic_rcu_read(&cpuas->as->dispatch);
d = address_space_to_dispatch(cpuas->as);
atomic_rcu_set(&cpuas->memory_dispatch, d);
tlb_flush(cpuas->cpu);
}
void address_space_init_dispatch(AddressSpace *as)
{
as->dispatch = NULL;
as->dispatch_listener = (MemoryListener) {
.begin = mem_begin,
.commit = mem_commit,
.region_add = mem_add,
.region_nop = mem_add,
.priority = 0,
};
memory_listener_register(&as->dispatch_listener, as);
}
void address_space_unregister(AddressSpace *as)
{
memory_listener_unregister(&as->dispatch_listener);
}
void address_space_destroy_dispatch(AddressSpace *as)
{
AddressSpaceDispatch *d = as->dispatch;
atomic_rcu_set(&as->dispatch, NULL);
if (d) {
call_rcu(d, address_space_dispatch_free, rcu);
}
}
static void memory_map_init(void)
{
system_memory = g_malloc(sizeof(*system_memory));
@@ -2899,11 +2858,11 @@ static bool prepare_mmio_access(MemoryRegion *mr)
}
/* Called within RCU critical section. */
static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs,
const uint8_t *buf,
int len, hwaddr addr1,
hwaddr l, MemoryRegion *mr)
static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs,
const uint8_t *buf,
int len, hwaddr addr1,
hwaddr l, MemoryRegion *mr)
{
uint8_t *ptr;
uint64_t val;
@@ -2965,14 +2924,14 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
}
l = len;
mr = address_space_translate(as, addr, &addr1, &l, true);
mr = flatview_translate(fv, addr, &addr1, &l, true);
}
return result;
}
MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
const uint8_t *buf, int len)
static MemTxResult flatview_write(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
const uint8_t *buf, int len)
{
hwaddr l;
hwaddr addr1;
@@ -2982,20 +2941,27 @@ MemTxResult address_space_write(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
if (len > 0) {
rcu_read_lock();
l = len;
mr = address_space_translate(as, addr, &addr1, &l, true);
result = address_space_write_continue(as, addr, attrs, buf, len,
addr1, l, mr);
mr = flatview_translate(fv, addr, &addr1, &l, true);
result = flatview_write_continue(fv, addr, attrs, buf, len,
addr1, l, mr);
rcu_read_unlock();
}
return result;
}
MemTxResult address_space_write(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs,
const uint8_t *buf, int len)
{
return flatview_write(address_space_to_flatview(as), addr, attrs, buf, len);
}
/* Called within RCU critical section. */
MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len, hwaddr addr1, hwaddr l,
MemoryRegion *mr)
MemTxResult flatview_read_continue(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len, hwaddr addr1, hwaddr l,
MemoryRegion *mr)
{
uint8_t *ptr;
uint64_t val;
@@ -3055,14 +3021,14 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
}
l = len;
mr = address_space_translate(as, addr, &addr1, &l, false);
mr = flatview_translate(fv, addr, &addr1, &l, false);
}
return result;
}
MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len)
MemTxResult flatview_read_full(FlatView *fv, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf, int len)
{
hwaddr l;
hwaddr addr1;
@@ -3072,25 +3038,33 @@ MemTxResult address_space_read_full(AddressSpace *as, hwaddr addr,
if (len > 0) {
rcu_read_lock();
l = len;
mr = address_space_translate(as, addr, &addr1, &l, false);
result = address_space_read_continue(as, addr, attrs, buf, len,
addr1, l, mr);
mr = flatview_translate(fv, addr, &addr1, &l, false);
result = flatview_read_continue(fv, addr, attrs, buf, len,
addr1, l, mr);
rcu_read_unlock();
}
return result;
}
MemTxResult address_space_rw(AddressSpace *as, hwaddr addr, MemTxAttrs attrs,
uint8_t *buf, int len, bool is_write)
static MemTxResult flatview_rw(FlatView *fv, hwaddr addr, MemTxAttrs attrs,
uint8_t *buf, int len, bool is_write)
{
if (is_write) {
return address_space_write(as, addr, attrs, (uint8_t *)buf, len);
return flatview_write(fv, addr, attrs, (uint8_t *)buf, len);
} else {
return address_space_read(as, addr, attrs, (uint8_t *)buf, len);
return flatview_read(fv, addr, attrs, (uint8_t *)buf, len);
}
}
MemTxResult address_space_rw(AddressSpace *as, hwaddr addr,
MemTxAttrs attrs, uint8_t *buf,
int len, bool is_write)
{
return flatview_rw(address_space_to_flatview(as),
addr, attrs, buf, len, is_write);
}
void cpu_physical_memory_rw(hwaddr addr, uint8_t *buf,
int len, int is_write)
{
@@ -3248,7 +3222,8 @@ static void cpu_notify_map_clients(void)
qemu_mutex_unlock(&map_client_list_lock);
}
bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_write)
static bool flatview_access_valid(FlatView *fv, hwaddr addr, int len,
bool is_write)
{
MemoryRegion *mr;
hwaddr l, xlat;
@@ -3256,7 +3231,7 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_
rcu_read_lock();
while (len > 0) {
l = len;
mr = address_space_translate(as, addr, &xlat, &l, is_write);
mr = flatview_translate(fv, addr, &xlat, &l, is_write);
if (!memory_access_is_direct(mr, is_write)) {
l = memory_access_size(mr, l, addr);
if (!memory_region_access_valid(mr, xlat, l, is_write)) {
@@ -3272,8 +3247,16 @@ bool address_space_access_valid(AddressSpace *as, hwaddr addr, int len, bool is_
return true;
}
bool address_space_access_valid(AddressSpace *as, hwaddr addr,
int len, bool is_write)
{
return flatview_access_valid(address_space_to_flatview(as),
addr, len, is_write);
}
static hwaddr
address_space_extend_translation(AddressSpace *as, hwaddr addr, hwaddr target_len,
flatview_extend_translation(FlatView *fv, hwaddr addr,
hwaddr target_len,
MemoryRegion *mr, hwaddr base, hwaddr len,
bool is_write)
{
@@ -3290,7 +3273,8 @@ address_space_extend_translation(AddressSpace *as, hwaddr addr, hwaddr target_le
}
len = target_len;
this_mr = address_space_translate(as, addr, &xlat, &len, is_write);
this_mr = flatview_translate(fv, addr, &xlat,
&len, is_write);
if (this_mr != mr || xlat != base + done) {
return done;
}
@@ -3313,6 +3297,7 @@ void *address_space_map(AddressSpace *as,
hwaddr l, xlat;
MemoryRegion *mr;
void *ptr;
FlatView *fv = address_space_to_flatview(as);
if (len == 0) {
return NULL;
@@ -3320,7 +3305,7 @@ void *address_space_map(AddressSpace *as,
l = len;
rcu_read_lock();
mr = address_space_translate(as, addr, &xlat, &l, is_write);
mr = flatview_translate(fv, addr, &xlat, &l, is_write);
if (!memory_access_is_direct(mr, is_write)) {
if (atomic_xchg(&bounce.in_use, true)) {
@@ -3336,7 +3321,7 @@ void *address_space_map(AddressSpace *as,
memory_region_ref(mr);
bounce.mr = mr;
if (!is_write) {
address_space_read(as, addr, MEMTXATTRS_UNSPECIFIED,
flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED,
bounce.buffer, l);
}
@@ -3347,7 +3332,8 @@ void *address_space_map(AddressSpace *as,
memory_region_ref(mr);
*plen = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
*plen = flatview_extend_translation(fv, addr, len, mr, xlat,
l, is_write);
ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
rcu_read_unlock();
@@ -3630,3 +3616,87 @@ void page_size_init(void)
}
qemu_host_page_mask = -(intptr_t)qemu_host_page_size;
}
#if !defined(CONFIG_USER_ONLY)
static void mtree_print_phys_entries(fprintf_function mon, void *f,
int start, int end, int skip, int ptr)
{
if (start == end - 1) {
mon(f, "\t%3d ", start);
} else {
mon(f, "\t%3d..%-3d ", start, end - 1);
}
mon(f, " skip=%d ", skip);
if (ptr == PHYS_MAP_NODE_NIL) {
mon(f, " ptr=NIL");
} else if (!skip) {
mon(f, " ptr=#%d", ptr);
} else {
mon(f, " ptr=[%d]", ptr);
}
mon(f, "\n");
}
#define MR_SIZE(size) (int128_nz(size) ? (hwaddr)int128_get64( \
int128_sub((size), int128_one())) : 0)
void mtree_print_dispatch(fprintf_function mon, void *f,
AddressSpaceDispatch *d, MemoryRegion *root)
{
int i;
mon(f, " Dispatch\n");
mon(f, " Physical sections\n");
for (i = 0; i < d->map.sections_nb; ++i) {
MemoryRegionSection *s = d->map.sections + i;
const char *names[] = { " [unassigned]", " [not dirty]",
" [ROM]", " [watch]" };
mon(f, " #%d @" TARGET_FMT_plx ".." TARGET_FMT_plx " %s%s%s%s%s",
i,
s->offset_within_address_space,
s->offset_within_address_space + MR_SIZE(s->mr->size),
s->mr->name ? s->mr->name : "(noname)",
i < ARRAY_SIZE(names) ? names[i] : "",
s->mr == root ? " [ROOT]" : "",
s == d->mru_section ? " [MRU]" : "",
s->mr->is_iommu ? " [iommu]" : "");
if (s->mr->alias) {
mon(f, " alias=%s", s->mr->alias->name ?
s->mr->alias->name : "noname");
}
mon(f, "\n");
}
mon(f, " Nodes (%d bits per level, %d levels) ptr=[%d] skip=%d\n",
P_L2_BITS, P_L2_LEVELS, d->phys_map.ptr, d->phys_map.skip);
for (i = 0; i < d->map.nodes_nb; ++i) {
int j, jprev;
PhysPageEntry prev;
Node *n = d->map.nodes + i;
mon(f, " [%d]\n", i);
for (j = 0, jprev = 0, prev = *n[0]; j < ARRAY_SIZE(*n); ++j) {
PhysPageEntry *pe = *n + j;
if (pe->ptr == prev.ptr && pe->skip == prev.skip) {
continue;
}
mtree_print_phys_entries(mon, f, jprev, j, prev.skip, prev.ptr);
jprev = j;
prev = *pe;
}
if (jprev != ARRAY_SIZE(*n)) {
mtree_print_phys_entries(mon, f, jprev, j, prev.skip, prev.ptr);
}
}
}
#endif

View File

@@ -23,7 +23,7 @@ ETEXI
STEXI
@item info version
@findex version
@findex info version
Show the version of QEMU.
ETEXI
@@ -37,7 +37,7 @@ ETEXI
STEXI
@item info network
@findex network
@findex info network
Show the network state.
ETEXI
@@ -51,7 +51,7 @@ ETEXI
STEXI
@item info chardev
@findex chardev
@findex info chardev
Show the character devices.
ETEXI
@@ -66,7 +66,7 @@ ETEXI
STEXI
@item info block
@findex block
@findex info block
Show info of one block device or all block devices.
ETEXI
@@ -80,7 +80,7 @@ ETEXI
STEXI
@item info blockstats
@findex blockstats
@findex info blockstats
Show block device statistics.
ETEXI
@@ -94,7 +94,7 @@ ETEXI
STEXI
@item info block-jobs
@findex block-jobs
@findex info block-jobs
Show progress of ongoing block device operations.
ETEXI
@@ -108,7 +108,7 @@ ETEXI
STEXI
@item info registers
@findex registers
@findex info registers
Show the cpu registers.
ETEXI
@@ -125,7 +125,7 @@ ETEXI
STEXI
@item info lapic
@findex lapic
@findex info lapic
Show local APIC state
ETEXI
@@ -141,7 +141,7 @@ ETEXI
STEXI
@item info ioapic
@findex ioapic
@findex info ioapic
Show io APIC state
ETEXI
@@ -155,7 +155,7 @@ ETEXI
STEXI
@item info cpus
@findex cpus
@findex info cpus
Show infos for each CPU.
ETEXI
@@ -169,7 +169,7 @@ ETEXI
STEXI
@item info history
@findex history
@findex info history
Show the command line history.
ETEXI
@@ -183,7 +183,7 @@ ETEXI
STEXI
@item info irq
@findex irq
@findex info irq
Show the interrupts statistics (if available).
ETEXI
@@ -197,7 +197,7 @@ ETEXI
STEXI
@item info pic
@findex pic
@findex info pic
Show i8259 (PIC) state.
ETEXI
@@ -211,7 +211,7 @@ ETEXI
STEXI
@item info pci
@findex pci
@findex info pci
Show PCI information.
ETEXI
@@ -228,7 +228,7 @@ ETEXI
STEXI
@item info tlb
@findex tlb
@findex info tlb
Show virtual to physical memory mappings.
ETEXI
@@ -244,21 +244,22 @@ ETEXI
STEXI
@item info mem
@findex mem
@findex info mem
Show the active virtual memory mappings.
ETEXI
{
.name = "mtree",
.args_type = "flatview:-f",
.params = "[-f]",
.help = "show memory tree (-f: dump flat view for address spaces)",
.args_type = "flatview:-f,dispatch_tree:-d",
.params = "[-f][-d]",
.help = "show memory tree (-f: dump flat view for address spaces;"
"-d: dump dispatch tree, valid with -f only)",
.cmd = hmp_info_mtree,
},
STEXI
@item info mtree
@findex mtree
@findex info mtree
Show memory tree.
ETEXI
@@ -274,7 +275,7 @@ ETEXI
STEXI
@item info jit
@findex jit
@findex info jit
Show dynamic compiler info.
ETEXI
@@ -290,7 +291,7 @@ ETEXI
STEXI
@item info opcount
@findex opcount
@findex info opcount
Show dynamic compiler opcode counters
ETEXI
@@ -304,7 +305,7 @@ ETEXI
STEXI
@item info kvm
@findex kvm
@findex info kvm
Show KVM information.
ETEXI
@@ -318,7 +319,7 @@ ETEXI
STEXI
@item info numa
@findex numa
@findex info numa
Show NUMA information.
ETEXI
@@ -332,7 +333,7 @@ ETEXI
STEXI
@item info usb
@findex usb
@findex info usb
Show guest USB devices.
ETEXI
@@ -346,7 +347,7 @@ ETEXI
STEXI
@item info usbhost
@findex usbhost
@findex info usbhost
Show host USB devices.
ETEXI
@@ -360,7 +361,7 @@ ETEXI
STEXI
@item info profile
@findex profile
@findex info profile
Show profiling information.
ETEXI
@@ -374,7 +375,7 @@ ETEXI
STEXI
@item info capture
@findex capture
@findex info capture
Show capture information.
ETEXI
@@ -388,7 +389,7 @@ ETEXI
STEXI
@item info snapshots
@findex snapshots
@findex info snapshots
Show the currently saved VM snapshots.
ETEXI
@@ -402,7 +403,7 @@ ETEXI
STEXI
@item info status
@findex status
@findex info status
Show the current VM status (running|paused).
ETEXI
@@ -416,7 +417,7 @@ ETEXI
STEXI
@item info mice
@findex mice
@findex info mice
Show which guest mouse is receiving events.
ETEXI
@@ -430,7 +431,7 @@ ETEXI
STEXI
@item info vnc
@findex vnc
@findex info vnc
Show the vnc server status.
ETEXI
@@ -446,7 +447,7 @@ ETEXI
STEXI
@item info spice
@findex spice
@findex info spice
Show the spice server status.
ETEXI
@@ -460,7 +461,7 @@ ETEXI
STEXI
@item info name
@findex name
@findex info name
Show the current VM name.
ETEXI
@@ -474,7 +475,7 @@ ETEXI
STEXI
@item info uuid
@findex uuid
@findex info uuid
Show the current VM UUID.
ETEXI
@@ -488,7 +489,7 @@ ETEXI
STEXI
@item info cpustats
@findex cpustats
@findex info cpustats
Show CPU statistics.
ETEXI
@@ -504,7 +505,7 @@ ETEXI
STEXI
@item info usernet
@findex usernet
@findex info usernet
Show user network stack connection states.
ETEXI
@@ -518,7 +519,7 @@ ETEXI
STEXI
@item info migrate
@findex migrate
@findex info migrate
Show migration status.
ETEXI
@@ -532,7 +533,7 @@ ETEXI
STEXI
@item info migrate_capabilities
@findex migrate_capabilities
@findex info migrate_capabilities
Show current migration capabilities.
ETEXI
@@ -546,7 +547,7 @@ ETEXI
STEXI
@item info migrate_parameters
@findex migrate_parameters
@findex info migrate_parameters
Show current migration parameters.
ETEXI
@@ -560,7 +561,7 @@ ETEXI
STEXI
@item info migrate_cache_size
@findex migrate_cache_size
@findex info migrate_cache_size
Show current migration xbzrle cache size.
ETEXI
@@ -574,7 +575,7 @@ ETEXI
STEXI
@item info balloon
@findex balloon
@findex info balloon
Show balloon information.
ETEXI
@@ -588,7 +589,7 @@ ETEXI
STEXI
@item info qtree
@findex qtree
@findex info qtree
Show device tree.
ETEXI
@@ -602,7 +603,7 @@ ETEXI
STEXI
@item info qdm
@findex qdm
@findex info qdm
Show qdev device model list.
ETEXI
@@ -616,7 +617,7 @@ ETEXI
STEXI
@item info qom-tree
@findex qom-tree
@findex info qom-tree
Show QOM composition tree.
ETEXI
@@ -630,7 +631,7 @@ ETEXI
STEXI
@item info roms
@findex roms
@findex info roms
Show roms.
ETEXI
@@ -646,7 +647,7 @@ ETEXI
STEXI
@item info trace-events
@findex trace-events
@findex info trace-events
Show available trace-events & their state.
ETEXI
@@ -660,7 +661,7 @@ ETEXI
STEXI
@item info tpm
@findex tpm
@findex info tpm
Show the TPM device.
ETEXI
@@ -674,7 +675,7 @@ ETEXI
STEXI
@item info memdev
@findex memdev
@findex info memdev
Show memory backends
ETEXI
@@ -688,7 +689,7 @@ ETEXI
STEXI
@item info memory-devices
@findex memory-devices
@findex info memory-devices
Show memory devices.
ETEXI
@@ -702,7 +703,7 @@ ETEXI
STEXI
@item info iothreads
@findex iothreads
@findex info iothreads
Show iothread's identifiers.
ETEXI
@@ -716,7 +717,7 @@ ETEXI
STEXI
@item info rocker @var{name}
@findex rocker
@findex info rocker
Show rocker switch.
ETEXI
@@ -729,8 +730,8 @@ ETEXI
},
STEXI
@item info rocker_ports @var{name}-ports
@findex ocker-ports
@item info rocker-ports @var{name}-ports
@findex info rocker-ports
Show rocker ports.
ETEXI
@@ -743,8 +744,8 @@ ETEXI
},
STEXI
@item info rocker_of_dpa_flows @var{name} [@var{tbl_id}]
@findex rocker-of-dpa-flows
@item info rocker-of-dpa-flows @var{name} [@var{tbl_id}]
@findex info rocker-of-dpa-flows
Show rocker OF-DPA flow tables.
ETEXI
@@ -758,7 +759,7 @@ ETEXI
STEXI
@item info rocker-of-dpa-groups @var{name} [@var{type}]
@findex rocker-of-dpa-groups
@findex info rocker-of-dpa-groups
Show rocker OF-DPA groups.
ETEXI
@@ -774,7 +775,7 @@ ETEXI
STEXI
@item info skeys @var{address}
@findex skeys
@findex info skeys
Display the value of a storage key (s390 only)
ETEXI
@@ -790,7 +791,7 @@ ETEXI
STEXI
@item info cmma @var{address}
@findex cmma
@findex info cmma
Display the values of the CMMA storage attributes for a range of pages (s390 only)
ETEXI
@@ -804,7 +805,7 @@ ETEXI
STEXI
@item info dump
@findex dump
@findex info dump
Display the latest dump status.
ETEXI
@@ -818,7 +819,7 @@ ETEXI
STEXI
@item info ramblock
@findex ramblock
@findex info ramblock
Dump all the ramblocks of the system.
ETEXI
@@ -832,14 +833,8 @@ ETEXI
STEXI
@item info hotpluggable-cpus
@findex hotpluggable-cpus
@findex info hotpluggable-cpus
Show information about hotpluggable CPUs
ETEXI
STEXI
@item info vm-generation-id
@findex vm-generation-id
Show Virtual Machine Generation ID
ETEXI
{
@@ -851,10 +846,9 @@ ETEXI
},
STEXI
@item info memory_size_summary
@findex memory_size_summary
Display the amount of initially allocated and present hotpluggable (if
enabled) memory in bytes.
@item info vm-generation-id
@findex info vm-generation-id
Show Virtual Machine Generation ID
ETEXI
{
@@ -866,6 +860,13 @@ ETEXI
.cmd = hmp_info_memory_size_summary,
},
STEXI
@item info memory_size_summary
@findex info memory_size_summary
Display the amount of initially allocated and present hotpluggable (if
enabled) memory in bytes.
ETEXI
STEXI
@end table
ETEXI

16
hmp.c
View File

@@ -336,6 +336,12 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "%s: %s\n",
MigrationParameter_str(MIGRATION_PARAMETER_BLOCK_INCREMENTAL),
params->block_incremental ? "on" : "off");
monitor_printf(mon, "%s: %" PRId64 "\n",
MigrationParameter_str(MIGRATION_PARAMETER_X_MULTIFD_CHANNELS),
params->x_multifd_channels);
monitor_printf(mon, "%s: %" PRId64 "\n",
MigrationParameter_str(MIGRATION_PARAMETER_X_MULTIFD_PAGE_COUNT),
params->x_multifd_page_count);
}
qapi_free_MigrationParameters(params);
@@ -1621,6 +1627,14 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
p->has_block_incremental = true;
visit_type_bool(v, param, &p->block_incremental, &err);
break;
case MIGRATION_PARAMETER_X_MULTIFD_CHANNELS:
p->has_x_multifd_channels = true;
visit_type_int(v, param, &p->x_multifd_channels, &err);
break;
case MIGRATION_PARAMETER_X_MULTIFD_PAGE_COUNT:
p->has_x_multifd_page_count = true;
visit_type_int(v, param, &p->x_multifd_page_count, &err);
break;
default:
assert(0);
}
@@ -2380,6 +2394,7 @@ void hmp_info_memdev(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "\n");
qapi_free_MemdevList(memdev_list);
hmp_handle_error(mon, &err);
}
void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
@@ -2418,6 +2433,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
}
qapi_free_MemoryDeviceInfoList(info_list);
hmp_handle_error(mon, &err);
}
void hmp_info_iothreads(Monitor *mon, const QDict *qdict)

View File

@@ -19,3 +19,4 @@ obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
obj-$(CONFIG_MPS2) += mps2.o
obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o

View File

@@ -41,7 +41,7 @@ static MemTxResult bitband_read(void *opaque, hwaddr offset,
/* Find address in underlying memory and round down to multiple of size */
addr = bitband_addr(s, offset) & (-size);
res = address_space_read(s->source_as, addr, attrs, buf, size);
res = address_space_read(&s->source_as, addr, attrs, buf, size);
if (res) {
return res;
}
@@ -66,7 +66,7 @@ static MemTxResult bitband_write(void *opaque, hwaddr offset, uint64_t value,
/* Find address in underlying memory and round down to multiple of size */
addr = bitband_addr(s, offset) & (-size);
res = address_space_read(s->source_as, addr, attrs, buf, size);
res = address_space_read(&s->source_as, addr, attrs, buf, size);
if (res) {
return res;
}
@@ -79,7 +79,7 @@ static MemTxResult bitband_write(void *opaque, hwaddr offset, uint64_t value,
} else {
buf[bitpos >> 3] &= ~bit;
}
return address_space_write(s->source_as, addr, attrs, buf, size);
return address_space_write(&s->source_as, addr, attrs, buf, size);
}
static const MemoryRegionOps bitband_ops = {
@@ -111,8 +111,7 @@ static void bitband_realize(DeviceState *dev, Error **errp)
return;
}
s->source_as = address_space_init_shareable(s->source_memory,
"bitband-source");
address_space_init(&s->source_as, s->source_memory, "bitband-source");
}
/* Board init. */

238
hw/arm/msf2-soc.c Normal file
View File

@@ -0,0 +1,238 @@
/*
* SmartFusion2 SoC emulation.
*
* Copyright (c) 2017 Subbaraya Sundeep <sundeep.lkml@gmail.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "hw/arm/arm.h"
#include "exec/address-spaces.h"
#include "hw/char/serial.h"
#include "hw/boards.h"
#include "sysemu/block-backend.h"
#include "qemu/cutils.h"
#include "hw/arm/msf2-soc.h"
#include "hw/misc/unimp.h"
#define MSF2_TIMER_BASE 0x40004000
#define MSF2_SYSREG_BASE 0x40038000
#define ENVM_BASE_ADDRESS 0x60000000
#define SRAM_BASE_ADDRESS 0x20000000
#define MSF2_ENVM_MAX_SIZE (512 * K_BYTE)
/*
* eSRAM max size is 80k without SECDED(Single error correction and
* dual error detection) feature and 64k with SECDED.
* We do not support SECDED now.
*/
#define MSF2_ESRAM_MAX_SIZE (80 * K_BYTE)
static const uint32_t spi_addr[MSF2_NUM_SPIS] = { 0x40001000 , 0x40011000 };
static const uint32_t uart_addr[MSF2_NUM_UARTS] = { 0x40000000 , 0x40010000 };
static const int spi_irq[MSF2_NUM_SPIS] = { 2, 3 };
static const int uart_irq[MSF2_NUM_UARTS] = { 10, 11 };
static const int timer_irq[MSF2_NUM_TIMERS] = { 14, 15 };
static void m2sxxx_soc_initfn(Object *obj)
{
MSF2State *s = MSF2_SOC(obj);
int i;
object_initialize(&s->armv7m, sizeof(s->armv7m), TYPE_ARMV7M);
qdev_set_parent_bus(DEVICE(&s->armv7m), sysbus_get_default());
object_initialize(&s->sysreg, sizeof(s->sysreg), TYPE_MSF2_SYSREG);
qdev_set_parent_bus(DEVICE(&s->sysreg), sysbus_get_default());
object_initialize(&s->timer, sizeof(s->timer), TYPE_MSS_TIMER);
qdev_set_parent_bus(DEVICE(&s->timer), sysbus_get_default());
for (i = 0; i < MSF2_NUM_SPIS; i++) {
object_initialize(&s->spi[i], sizeof(s->spi[i]),
TYPE_MSS_SPI);
qdev_set_parent_bus(DEVICE(&s->spi[i]), sysbus_get_default());
}
}
static void m2sxxx_soc_realize(DeviceState *dev_soc, Error **errp)
{
MSF2State *s = MSF2_SOC(dev_soc);
DeviceState *dev, *armv7m;
SysBusDevice *busdev;
Error *err = NULL;
int i;
MemoryRegion *system_memory = get_system_memory();
MemoryRegion *nvm = g_new(MemoryRegion, 1);
MemoryRegion *nvm_alias = g_new(MemoryRegion, 1);
MemoryRegion *sram = g_new(MemoryRegion, 1);
memory_region_init_rom(nvm, NULL, "MSF2.eNVM", s->envm_size,
&error_fatal);
/*
* On power-on, the eNVM region 0x60000000 is automatically
* remapped to the Cortex-M3 processor executable region
* start address (0x0). We do not support remapping other eNVM,
* eSRAM and DDR regions by guest(via Sysreg) currently.
*/
memory_region_init_alias(nvm_alias, NULL, "MSF2.eNVM",
nvm, 0, s->envm_size);
memory_region_add_subregion(system_memory, ENVM_BASE_ADDRESS, nvm);
memory_region_add_subregion(system_memory, 0, nvm_alias);
memory_region_init_ram(sram, NULL, "MSF2.eSRAM", s->esram_size,
&error_fatal);
memory_region_add_subregion(system_memory, SRAM_BASE_ADDRESS, sram);
armv7m = DEVICE(&s->armv7m);
qdev_prop_set_uint32(armv7m, "num-irq", 81);
qdev_prop_set_string(armv7m, "cpu-type", s->cpu_type);
object_property_set_link(OBJECT(&s->armv7m), OBJECT(get_system_memory()),
"memory", &error_abort);
object_property_set_bool(OBJECT(&s->armv7m), true, "realized", &err);
if (err != NULL) {
error_propagate(errp, err);
return;
}
if (!s->m3clk) {
error_setg(errp, "Invalid m3clk value");
error_append_hint(errp, "m3clk can not be zero\n");
return;
}
system_clock_scale = NANOSECONDS_PER_SECOND / s->m3clk;
for (i = 0; i < MSF2_NUM_UARTS; i++) {
if (serial_hds[i]) {
serial_mm_init(get_system_memory(), uart_addr[i], 2,
qdev_get_gpio_in(armv7m, uart_irq[i]),
115200, serial_hds[i], DEVICE_NATIVE_ENDIAN);
}
}
dev = DEVICE(&s->timer);
/* APB0 clock is the timer input clock */
qdev_prop_set_uint32(dev, "clock-frequency", s->m3clk / s->apb0div);
object_property_set_bool(OBJECT(&s->timer), true, "realized", &err);
if (err != NULL) {
error_propagate(errp, err);
return;
}
busdev = SYS_BUS_DEVICE(dev);
sysbus_mmio_map(busdev, 0, MSF2_TIMER_BASE);
sysbus_connect_irq(busdev, 0,
qdev_get_gpio_in(armv7m, timer_irq[0]));
sysbus_connect_irq(busdev, 1,
qdev_get_gpio_in(armv7m, timer_irq[1]));
dev = DEVICE(&s->sysreg);
qdev_prop_set_uint32(dev, "apb0divisor", s->apb0div);
qdev_prop_set_uint32(dev, "apb1divisor", s->apb1div);
object_property_set_bool(OBJECT(&s->sysreg), true, "realized", &err);
if (err != NULL) {
error_propagate(errp, err);
return;
}
busdev = SYS_BUS_DEVICE(dev);
sysbus_mmio_map(busdev, 0, MSF2_SYSREG_BASE);
for (i = 0; i < MSF2_NUM_SPIS; i++) {
gchar *bus_name;
object_property_set_bool(OBJECT(&s->spi[i]), true, "realized", &err);
if (err != NULL) {
error_propagate(errp, err);
return;
}
sysbus_mmio_map(SYS_BUS_DEVICE(&s->spi[i]), 0, spi_addr[i]);
sysbus_connect_irq(SYS_BUS_DEVICE(&s->spi[i]), 0,
qdev_get_gpio_in(armv7m, spi_irq[i]));
/* Alias controller SPI bus to the SoC itself */
bus_name = g_strdup_printf("spi%d", i);
object_property_add_alias(OBJECT(s), bus_name,
OBJECT(&s->spi[i]), "spi",
&error_abort);
g_free(bus_name);
}
/* Below devices are not modelled yet. */
create_unimplemented_device("i2c_0", 0x40002000, 0x1000);
create_unimplemented_device("dma", 0x40003000, 0x1000);
create_unimplemented_device("watchdog", 0x40005000, 0x1000);
create_unimplemented_device("i2c_1", 0x40012000, 0x1000);
create_unimplemented_device("gpio", 0x40013000, 0x1000);
create_unimplemented_device("hs-dma", 0x40014000, 0x1000);
create_unimplemented_device("can", 0x40015000, 0x1000);
create_unimplemented_device("rtc", 0x40017000, 0x1000);
create_unimplemented_device("apb_config", 0x40020000, 0x10000);
create_unimplemented_device("emac", 0x40041000, 0x1000);
create_unimplemented_device("usb", 0x40043000, 0x1000);
}
static Property m2sxxx_soc_properties[] = {
/*
* part name specifies the type of SmartFusion2 device variant(this
* property is for information purpose only.
*/
DEFINE_PROP_STRING("cpu-type", MSF2State, cpu_type),
DEFINE_PROP_STRING("part-name", MSF2State, part_name),
DEFINE_PROP_UINT64("eNVM-size", MSF2State, envm_size, MSF2_ENVM_MAX_SIZE),
DEFINE_PROP_UINT64("eSRAM-size", MSF2State, esram_size,
MSF2_ESRAM_MAX_SIZE),
/* Libero GUI shows 100Mhz as default for clocks */
DEFINE_PROP_UINT32("m3clk", MSF2State, m3clk, 100 * 1000000),
/* default divisors in Libero GUI */
DEFINE_PROP_UINT8("apb0div", MSF2State, apb0div, 2),
DEFINE_PROP_UINT8("apb1div", MSF2State, apb1div, 2),
DEFINE_PROP_END_OF_LIST(),
};
static void m2sxxx_soc_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
dc->realize = m2sxxx_soc_realize;
dc->props = m2sxxx_soc_properties;
}
static const TypeInfo m2sxxx_soc_info = {
.name = TYPE_MSF2_SOC,
.parent = TYPE_SYS_BUS_DEVICE,
.instance_size = sizeof(MSF2State),
.instance_init = m2sxxx_soc_initfn,
.class_init = m2sxxx_soc_class_init,
};
static void m2sxxx_soc_types(void)
{
type_register_static(&m2sxxx_soc_info);
}
type_init(m2sxxx_soc_types)

105
hw/arm/msf2-som.c Normal file
View File

@@ -0,0 +1,105 @@
/*
* SmartFusion2 SOM starter kit(from Emcraft) emulation.
*
* Copyright (c) 2017 Subbaraya Sundeep <sundeep.lkml@gmail.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "hw/boards.h"
#include "hw/arm/arm.h"
#include "exec/address-spaces.h"
#include "qemu/cutils.h"
#include "hw/arm/msf2-soc.h"
#include "cpu.h"
#define DDR_BASE_ADDRESS 0xA0000000
#define DDR_SIZE (64 * M_BYTE)
#define M2S010_ENVM_SIZE (256 * K_BYTE)
#define M2S010_ESRAM_SIZE (64 * K_BYTE)
static void emcraft_sf2_s2s010_init(MachineState *machine)
{
DeviceState *dev;
DeviceState *spi_flash;
MSF2State *soc;
MachineClass *mc = MACHINE_GET_CLASS(machine);
DriveInfo *dinfo = drive_get_next(IF_MTD);
qemu_irq cs_line;
SSIBus *spi_bus;
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *ddr = g_new(MemoryRegion, 1);
if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
error_report("This board can only be used with CPU %s",
mc->default_cpu_type);
}
memory_region_init_ram(ddr, NULL, "ddr-ram", DDR_SIZE,
&error_fatal);
memory_region_add_subregion(sysmem, DDR_BASE_ADDRESS, ddr);
dev = qdev_create(NULL, TYPE_MSF2_SOC);
qdev_prop_set_string(dev, "part-name", "M2S010");
qdev_prop_set_string(dev, "cpu-type", mc->default_cpu_type);
qdev_prop_set_uint64(dev, "eNVM-size", M2S010_ENVM_SIZE);
qdev_prop_set_uint64(dev, "eSRAM-size", M2S010_ESRAM_SIZE);
/*
* CPU clock and peripheral clocks(APB0, APB1)are configurable
* in Libero. CPU clock is divided by APB0 and APB1 divisors for
* peripherals. Emcraft's SoM kit comes with these settings by default.
*/
qdev_prop_set_uint32(dev, "m3clk", 142 * 1000000);
qdev_prop_set_uint32(dev, "apb0div", 2);
qdev_prop_set_uint32(dev, "apb1div", 2);
object_property_set_bool(OBJECT(dev), true, "realized", &error_fatal);
soc = MSF2_SOC(dev);
/* Attach SPI flash to SPI0 controller */
spi_bus = (SSIBus *)qdev_get_child_bus(dev, "spi0");
spi_flash = ssi_create_slave_no_init(spi_bus, "s25sl12801");
qdev_prop_set_uint8(spi_flash, "spansion-cr2nv", 1);
if (dinfo) {
qdev_prop_set_drive(spi_flash, "drive", blk_by_legacy_dinfo(dinfo),
&error_fatal);
}
qdev_init_nofail(spi_flash);
cs_line = qdev_get_gpio_in_named(spi_flash, SSI_GPIO_CS, 0);
sysbus_connect_irq(SYS_BUS_DEVICE(&soc->spi[0]), 1, cs_line);
armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
soc->envm_size);
}
static void emcraft_sf2_machine_init(MachineClass *mc)
{
mc->desc = "SmartFusion2 SOM kit from Emcraft (M2S010)";
mc->init = emcraft_sf2_s2s010_init;
mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m3");
}
DEFINE_MACHINE("emcraft-sf2", emcraft_sf2_machine_init)

View File

@@ -2087,19 +2087,44 @@ static void omap_sysctl_write(void *opaque, hwaddr addr,
}
}
static uint64_t omap_sysctl_readfn(void *opaque, hwaddr addr,
unsigned size)
{
switch (size) {
case 1:
return omap_sysctl_read8(opaque, addr);
case 2:
return omap_badwidth_read32(opaque, addr); /* TODO */
case 4:
return omap_sysctl_read(opaque, addr);
default:
g_assert_not_reached();
}
}
static void omap_sysctl_writefn(void *opaque, hwaddr addr,
uint64_t value, unsigned size)
{
switch (size) {
case 1:
omap_sysctl_write8(opaque, addr, value);
break;
case 2:
omap_badwidth_write32(opaque, addr, value); /* TODO */
break;
case 4:
omap_sysctl_write(opaque, addr, value);
break;
default:
g_assert_not_reached();
}
}
static const MemoryRegionOps omap_sysctl_ops = {
.old_mmio = {
.read = {
omap_sysctl_read8,
omap_badwidth_read32, /* TODO */
omap_sysctl_read,
},
.write = {
omap_sysctl_write8,
omap_badwidth_write32, /* TODO */
omap_sysctl_write,
},
},
.read = omap_sysctl_readfn,
.write = omap_sysctl_writefn,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.endianness = DEVICE_NATIVE_ENDIAN,
};

View File

@@ -31,26 +31,16 @@
#include "exec/address-spaces.h"
#include "cpu.h"
static uint32_t static_readb(void *opaque, hwaddr offset)
static uint64_t static_read(void *opaque, hwaddr offset, unsigned size)
{
uint32_t *val = (uint32_t *) opaque;
return *val >> ((offset & 3) << 3);
uint32_t *val = (uint32_t *)opaque;
uint32_t sizemask = 7 >> size;
return *val >> ((offset & sizemask) << 3);
}
static uint32_t static_readh(void *opaque, hwaddr offset)
{
uint32_t *val = (uint32_t *) opaque;
return *val >> ((offset & 1) << 3);
}
static uint32_t static_readw(void *opaque, hwaddr offset)
{
uint32_t *val = (uint32_t *) opaque;
return *val >> ((offset & 0) << 3);
}
static void static_write(void *opaque, hwaddr offset,
uint32_t value)
static void static_write(void *opaque, hwaddr offset, uint64_t value,
unsigned size)
{
#ifdef SPY
printf("%s: value %08lx written at " PA_FMT "\n",
@@ -59,10 +49,10 @@ static void static_write(void *opaque, hwaddr offset,
}
static const MemoryRegionOps static_ops = {
.old_mmio = {
.read = { static_readb, static_readh, static_readw, },
.write = { static_write, static_write, static_write, },
},
.read = static_read,
.write = static_write,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.endianness = DEVICE_NATIVE_ENDIAN,
};

View File

@@ -1143,13 +1143,15 @@ static void pxa2xx_rtc_init(Object *obj)
sysbus_init_mmio(dev, &s->iomem);
}
static void pxa2xx_rtc_pre_save(void *opaque)
static int pxa2xx_rtc_pre_save(void *opaque)
{
PXA2xxRTCState *s = (PXA2xxRTCState *) opaque;
pxa2xx_rtc_hzupdate(s);
pxa2xx_rtc_piupdate(s);
pxa2xx_rtc_swupdate(s);
return 0;
}
static int pxa2xx_rtc_post_load(void *opaque, int version_id)

View File

@@ -406,11 +406,13 @@ static void strongarm_rtc_init(Object *obj)
sysbus_init_mmio(dev, &s->iomem);
}
static void strongarm_rtc_pre_save(void *opaque)
static int strongarm_rtc_pre_save(void *opaque)
{
StrongARMRTCState *s = opaque;
strongarm_rtc_hzupdate(s);
return 0;
}
static int strongarm_rtc_post_load(void *opaque, int version_id)

View File

@@ -440,6 +440,8 @@ static void xlnx_zynqmp_class_init(ObjectClass *oc, void *data)
dc->props = xlnx_zynqmp_props;
dc->realize = xlnx_zynqmp_realize;
/* Reason: Uses serial_hds in realize function, thus can't be used twice */
dc->user_creatable = false;
}
static const TypeInfo xlnx_zynqmp_type_info = {

View File

@@ -567,11 +567,13 @@ static int wm8750_rx(I2CSlave *i2c)
return 0x00;
}
static void wm8750_pre_save(void *opaque)
static int wm8750_pre_save(void *opaque)
{
WM8750State *s = opaque;
s->rate_vmstate = s->rate - wm_rate_table;
return 0;
}
static int wm8750_post_load(void *opaque, int version_id)

View File

@@ -1101,11 +1101,13 @@ static int reconstruct_phase(FDCtrl *fdctrl)
}
}
static void fdc_pre_save(void *opaque)
static int fdc_pre_save(void *opaque)
{
FDCtrl *s = opaque;
s->dor_vmstate = s->dor | GET_CUR_DRV(s);
return 0;
}
static int fdc_pre_load(void *opaque)

View File

@@ -1251,9 +1251,11 @@ static void m25p80_reset(DeviceState *d)
reset_memory(s);
}
static void m25p80_pre_save(void *opaque)
static int m25p80_pre_save(void *opaque)
{
flash_sync_dirty((Flash *)opaque, -1);
return 0;
}
static Property m25p80_properties[] = {

View File

@@ -325,11 +325,13 @@ static void nand_command(NANDFlashState *s)
}
}
static void nand_pre_save(void *opaque)
static int nand_pre_save(void *opaque)
{
NANDFlashState *s = NAND(opaque);
s->ioaddr_vmstate = s->ioaddr - s->io;
return 0;
}
static int nand_post_load(void *opaque, int version_id)

View File

@@ -137,7 +137,7 @@ static void onenand_intr_update(OneNANDState *s)
qemu_set_irq(s->intr, ((s->intstatus >> 15) ^ (~s->config[0] >> 6)) & 1);
}
static void onenand_pre_save(void *opaque)
static int onenand_pre_save(void *opaque)
{
OneNANDState *s = opaque;
if (s->current == s->otp) {
@@ -147,6 +147,8 @@ static void onenand_pre_save(void *opaque)
} else {
s->current_direction = 0;
}
return 0;
}
static int onenand_post_load(void *opaque, int version_id)
@@ -518,10 +520,6 @@ static void onenand_command(OneNANDState *s)
s->intstatus |= ONEN_INT;
for (b = 0; b < s->blocks; b ++) {
if (b >= s->blocks) {
s->status |= ONEN_ERR_CMD;
break;
}
if (s->blockwp[b] == ONEN_LOCK_LOCKTIGHTEN)
break;

View File

@@ -630,10 +630,12 @@ static void serial_event(void *opaque, int event)
serial_receive_break(s);
}
static void serial_pre_save(void *opaque)
static int serial_pre_save(void *opaque)
{
SerialState *s = opaque;
s->fcr_vmstate = s->fcr;
return 0;
}
static int serial_pre_load(void *opaque)

View File

@@ -30,7 +30,6 @@ typedef struct Terminal3270 {
uint8_t inv[INPUT_BUFFER_SIZE];
uint8_t outv[OUTPUT_BUFFER_SIZE];
int in_len;
int out_len;
bool handshake_done;
guint timer_tag;
} Terminal3270;
@@ -145,7 +144,6 @@ static void chr_event(void *opaque, int event)
/* Ensure the initial status correct, always reset them. */
t->in_len = 0;
t->out_len = 0;
t->handshake_done = false;
if (t->timer_tag) {
g_source_remove(t->timer_tag);
@@ -182,14 +180,18 @@ static void terminal_init(EmulatedCcw3270Device *dev, Error **errp)
terminal_read, chr_event, NULL, t, NULL, true);
}
static int read_payload_3270(EmulatedCcw3270Device *dev, uint32_t cda,
uint16_t count)
static inline CcwDataStream *get_cds(Terminal3270 *t)
{
return &(CCW_DEVICE(&t->cdev)->sch->cds);
}
static int read_payload_3270(EmulatedCcw3270Device *dev)
{
Terminal3270 *t = TERMINAL_3270(dev);
int len;
len = MIN(count, t->in_len);
cpu_physical_memory_write(cda, t->inv, len);
len = MIN(ccw_dstream_avail(get_cds(t)), t->in_len);
ccw_dstream_write_buf(get_cds(t), t->inv, len);
t->in_len -= len;
return len;
@@ -222,13 +224,14 @@ static int insert_IAC_escape_char(uint8_t *outv, int out_len)
* Write 3270 outbound to socket.
* Return the count of 3270 data field if succeeded, zero if failed.
*/
static int write_payload_3270(EmulatedCcw3270Device *dev, uint8_t cmd,
uint32_t cda, uint16_t count)
static int write_payload_3270(EmulatedCcw3270Device *dev, uint8_t cmd)
{
Terminal3270 *t = TERMINAL_3270(dev);
int retval = 0;
assert(count <= (OUTPUT_BUFFER_SIZE - 3) / 2);
int count = ccw_dstream_avail(get_cds(t));
int bound = (OUTPUT_BUFFER_SIZE - 3) / 2;
int len = MIN(count, bound);
int out_len = 0;
if (!t->handshake_done) {
if (!(t->outv[0] == IAC && t->outv[1] != IAC)) {
@@ -243,16 +246,23 @@ static int write_payload_3270(EmulatedCcw3270Device *dev, uint8_t cmd,
/* We just say we consumed all data if there's no backend. */
return count;
}
t->outv[0] = cmd;
cpu_physical_memory_read(cda, &t->outv[1], count);
t->out_len = count + 1;
t->out_len = insert_IAC_escape_char(t->outv, t->out_len);
t->outv[t->out_len++] = IAC;
t->outv[t->out_len++] = IAC_EOR;
t->outv[out_len++] = cmd;
do {
ccw_dstream_read_buf(get_cds(t), &t->outv[out_len], len);
count = ccw_dstream_avail(get_cds(t));
out_len += len;
retval = qemu_chr_fe_write_all(&t->chr, t->outv, t->out_len);
return (retval <= 0) ? 0 : (retval - 3);
out_len = insert_IAC_escape_char(t->outv, out_len);
if (!count) {
t->outv[out_len++] = IAC;
t->outv[out_len++] = IAC_EOR;
}
retval = qemu_chr_fe_write_all(&t->chr, t->outv, out_len);
len = MIN(count, bound);
out_len = 0;
} while (len && retval >= 0);
return (retval <= 0) ? 0 : get_cds(t)->count;
}
static Property terminal_properties[] = {

View File

@@ -187,6 +187,26 @@ static int chr_be_change(void *opaque)
return 0;
}
static void virtconsole_enable_backend(VirtIOSerialPort *port, bool enable)
{
VirtConsole *vcon = VIRTIO_CONSOLE(port);
if (!qemu_chr_fe_backend_connected(&vcon->chr)) {
return;
}
if (enable) {
VirtIOSerialPortClass *k = VIRTIO_SERIAL_PORT_GET_CLASS(port);
qemu_chr_fe_set_handlers(&vcon->chr, chr_can_read, chr_read,
k->is_console ? NULL : chr_event,
chr_be_change, vcon, NULL, false);
} else {
qemu_chr_fe_set_handlers(&vcon->chr, NULL, NULL, NULL,
NULL, NULL, NULL, false);
}
}
static void virtconsole_realize(DeviceState *dev, Error **errp)
{
VirtIOSerialPort *port = VIRTIO_SERIAL_PORT(dev);
@@ -258,6 +278,7 @@ static void virtserialport_class_init(ObjectClass *klass, void *data)
k->unrealize = virtconsole_unrealize;
k->have_data = flush_buf;
k->set_guest_connected = set_guest_connected;
k->enable_backend = virtconsole_enable_backend;
k->guest_writable = guest_writable;
dc->props = virtserialport_properties;
}

View File

@@ -637,6 +637,13 @@ static void set_status(VirtIODevice *vdev, uint8_t status)
if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
guest_reset(vser);
}
QTAILQ_FOREACH(port, &vser->ports, next) {
VirtIOSerialPortClass *vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
if (vsc->enable_backend) {
vsc->enable_backend(port, vdev->vm_running);
}
}
}
static void vser_reset(VirtIODevice *vdev)

View File

@@ -758,6 +758,38 @@ void machine_run_board_init(MachineState *machine)
if (nb_numa_nodes) {
machine_numa_finish_init(machine);
}
/* If the machine supports the valid_cpu_types check and the user
* specified a CPU with -cpu check here that the user CPU is supported.
*/
if (machine_class->valid_cpu_types && machine->cpu_type) {
ObjectClass *class = object_class_by_name(machine->cpu_type);
int i;
for (i = 0; machine_class->valid_cpu_types[i]; i++) {
if (object_class_dynamic_cast(class,
machine_class->valid_cpu_types[i])) {
/* The user specificed CPU is in the valid field, we are
* good to go.
*/
break;
}
}
if (!machine_class->valid_cpu_types[i]) {
/* The user specified CPU is not valid */
error_report("Invalid CPU type: %s", machine->cpu_type);
error_printf("The valid types are: %s",
machine_class->valid_cpu_types[0]);
for (i = 1; machine_class->valid_cpu_types[i]; i++) {
error_printf(", %s", machine_class->valid_cpu_types[i]);
}
error_printf("\n");
exit(1);
}
}
machine_class->init(machine);
}

View File

@@ -2204,7 +2204,7 @@ static void qxl_realize_secondary(PCIDevice *dev, Error **errp)
qxl_realize_common(qxl, errp);
}
static void qxl_pre_save(void *opaque)
static int qxl_pre_save(void *opaque)
{
PCIQXLDevice* d = opaque;
uint8_t *ram_start = d->vga.vram_ptr;
@@ -2216,6 +2216,8 @@ static void qxl_pre_save(void *opaque)
d->last_release_offset = (uint8_t *)d->last_release - ram_start;
}
assert(d->last_release_offset < d->vga.vram_size);
return 0;
}
static int qxl_pre_load(void *opaque)

View File

@@ -6,6 +6,7 @@ jazz_led_write(uint64_t addr, uint8_t new) "write addr=0x%"PRIx64": 0x%x"
# hw/display/xenfb.c
xenfb_mouse_event(void *opaque, int dx, int dy, int dz, int button_state, int abs_pointer_wanted) "%p x %d y %d z %d bs 0x%x abs %d"
xenfb_key_event(void *opaque, int scancode, int button_state) "%p scancode %d bs 0x%x"
xenfb_input_connected(void *xendev, int abs_pointer_wanted) "%p abs %d"
# hw/display/g364fb.c

View File

@@ -1050,9 +1050,7 @@ static int virtio_gpu_save(QEMUFile *f, void *opaque, size_t size,
}
qemu_put_be32(f, 0); /* end of list */
vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
return 0;
return vmstate_save_state(f, &vmstate_virtio_gpu_scanouts, g, NULL);
}
static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
@@ -1321,6 +1319,7 @@ static void virtio_gpu_class_init(ObjectClass *klass, void *data)
vdc->reset = virtio_gpu_reset;
set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
dc->props = virtio_gpu_properties;
dc->vmsd = &vmstate_virtio_gpu;
dc->hotpluggable = false;

View File

@@ -290,6 +290,7 @@ static void xenfb_key_event(void *opaque, int scancode)
scancode |= 0x80;
xenfb->extended = 0;
}
trace_xenfb_key_event(opaque, scancode2linux[scancode], down);
xenfb_send_key(xenfb, down, scancode2linux[scancode]);
}

View File

@@ -525,17 +525,23 @@ static void omap2_gpio_module_write(void *opaque, hwaddr addr,
}
}
static uint32_t omap2_gpio_module_readp(void *opaque, hwaddr addr)
static uint64_t omap2_gpio_module_readp(void *opaque, hwaddr addr,
unsigned size)
{
return omap2_gpio_module_read(opaque, addr & ~3) >> ((addr & 3) << 3);
}
static void omap2_gpio_module_writep(void *opaque, hwaddr addr,
uint32_t value)
uint64_t value, unsigned size)
{
uint32_t cur = 0;
uint32_t mask = 0xffff;
if (size == 4) {
omap2_gpio_module_write(opaque, addr, value);
return;
}
switch (addr & ~3) {
case 0x00: /* GPIO_REVISION */
case 0x14: /* GPIO_SYSSTATUS */
@@ -581,18 +587,10 @@ static void omap2_gpio_module_writep(void *opaque, hwaddr addr,
}
static const MemoryRegionOps omap2_gpio_module_ops = {
.old_mmio = {
.read = {
omap2_gpio_module_readp,
omap2_gpio_module_readp,
omap2_gpio_module_read,
},
.write = {
omap2_gpio_module_writep,
omap2_gpio_module_writep,
omap2_gpio_module_write,
},
},
.read = omap2_gpio_module_readp,
.write = omap2_gpio_module_writep,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.endianness = DEVICE_NATIVE_ENDIAN,
};

View File

@@ -41,7 +41,7 @@ static const TypeInfo i2c_bus_info = {
.instance_size = sizeof(I2CBus),
};
static void i2c_bus_pre_save(void *opaque)
static int i2c_bus_pre_save(void *opaque)
{
I2CBus *bus = opaque;
@@ -53,6 +53,8 @@ static void i2c_bus_pre_save(void *opaque)
bus->saved_address = I2C_BROADCAST;
}
}
return 0;
}
static const VMStateDescription vmstate_i2c_bus = {

View File

@@ -430,19 +430,39 @@ static void omap_i2c_writeb(void *opaque, hwaddr addr,
}
}
static uint64_t omap_i2c_readfn(void *opaque, hwaddr addr,
unsigned size)
{
switch (size) {
case 2:
return omap_i2c_read(opaque, addr);
default:
return omap_badwidth_read16(opaque, addr);
}
}
static void omap_i2c_writefn(void *opaque, hwaddr addr,
uint64_t value, unsigned size)
{
switch (size) {
case 1:
/* Only the last fifo write can be 8 bit. */
omap_i2c_writeb(opaque, addr, value);
break;
case 2:
omap_i2c_write(opaque, addr, value);
break;
default:
omap_badwidth_write16(opaque, addr, value);
break;
}
}
static const MemoryRegionOps omap_i2c_ops = {
.old_mmio = {
.read = {
omap_badwidth_read16,
omap_i2c_read,
omap_badwidth_read16,
},
.write = {
omap_i2c_writeb, /* Only the last fifo write can be 8 bit. */
omap_i2c_write,
omap_badwidth_write16,
},
},
.read = omap_i2c_readfn,
.write = omap_i2c_writefn,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.endianness = DEVICE_NATIVE_ENDIAN,
};

View File

@@ -62,7 +62,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
{
CPUState *cpu = first_cpu;
CPUX86State *env = cpu->env_ptr;
hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
hwaddr kvmclock_struct_pa;
uint64_t migration_tsc = env->tsc;
struct pvclock_vcpu_time_info time;
uint64_t delta;
@@ -77,6 +77,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
return 0;
}
kvmclock_struct_pa = env->system_time_msr & ~1ULL;
cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
assert(time.tsc_timestamp <= migration_tsc);
@@ -254,11 +255,13 @@ static const VMStateDescription kvmclock_reliable_get_clock = {
* final pages of memory (which happens between vm_stop()
* and pre_save()) takes max_downtime.
*/
static void kvmclock_pre_save(void *opaque)
static int kvmclock_pre_save(void *opaque)
{
KVMClockState *s = opaque;
kvm_update_clock(s);
return 0;
}
static const VMStateDescription kvmclock_vmsd = {

View File

@@ -184,7 +184,7 @@ static void ahci_check_irq(AHCIState *s)
static void ahci_trigger_irq(AHCIState *s, AHCIDevice *d,
enum AHCIPortIRQ irqbit)
{
g_assert(irqbit >= 0 && irqbit < 32);
g_assert((unsigned)irqbit < 32);
uint32_t irq = 1U << irqbit;
uint32_t irqstat = d->port_regs.irq_stat | irq;

View File

@@ -68,7 +68,7 @@ const char *IDE_DMA_CMD_lookup[IDE_DMA__COUNT] = {
static const char *IDE_DMA_CMD_str(enum ide_dma_cmd enval)
{
if (enval >= IDE_DMA__BEGIN && enval < IDE_DMA__COUNT) {
if ((unsigned)enval < IDE_DMA__COUNT) {
return IDE_DMA_CMD_lookup[enval];
}
return "DMA UNKNOWN CMD";
@@ -2752,7 +2752,7 @@ static int ide_drive_pio_post_load(void *opaque, int version_id)
return 0;
}
static void ide_drive_pio_pre_save(void *opaque)
static int ide_drive_pio_pre_save(void *opaque)
{
IDEState *s = opaque;
int idx;
@@ -2768,6 +2768,8 @@ static void ide_drive_pio_pre_save(void *opaque)
} else {
s->end_transfer_fn_idx = idx;
}
return 0;
}
static bool ide_drive_pio_state_needed(void *opaque)

View File

@@ -255,114 +255,100 @@ static void pmac_ide_flush(DBDMA_io *io)
}
/* PowerMac IDE memory IO */
static void pmac_ide_writeb (void *opaque,
hwaddr addr, uint32_t val)
static uint64_t pmac_ide_read(void *opaque, hwaddr addr, unsigned size)
{
MACIOIDEState *d = opaque;
uint64_t retval = 0xffffffff;
int reg = addr >> 4;
addr = (addr & 0xFFF) >> 4;
switch (addr) {
case 1 ... 7:
ide_ioport_write(&d->bus, addr, val);
switch (reg) {
case 0x0:
if (size == 2) {
retval = ide_data_readw(&d->bus, 0);
} else if (size == 4) {
retval = ide_data_readl(&d->bus, 0);
}
break;
case 8:
case 22:
ide_cmd_write(&d->bus, 0, val);
case 0x1 ... 0x7:
if (size == 1) {
retval = ide_ioport_read(&d->bus, reg);
}
break;
default:
case 0x8:
case 0x16:
if (size == 1) {
retval = ide_status_read(&d->bus, 0);
}
break;
case 0x20:
if (size == 4) {
retval = d->timing_reg;
}
break;
case 0x30:
/* This is an interrupt state register that only exists
* in the KeyLargo and later variants. Bit 0x8000_0000
* latches the DMA interrupt and has to be written to
* clear. Bit 0x4000_0000 is an image of the disk
* interrupt. MacOS X relies on this and will hang if
* we don't provide at least the disk interrupt
*/
if (size == 4) {
retval = d->irq_reg;
}
break;
}
}
static uint32_t pmac_ide_readb (void *opaque,hwaddr addr)
{
uint8_t retval;
MACIOIDEState *d = opaque;
addr = (addr & 0xFFF) >> 4;
switch (addr) {
case 1 ... 7:
retval = ide_ioport_read(&d->bus, addr);
break;
case 8:
case 22:
retval = ide_status_read(&d->bus, 0);
break;
default:
retval = 0xFF;
break;
}
return retval;
}
static void pmac_ide_writew (void *opaque,
hwaddr addr, uint32_t val)
static void pmac_ide_write(void *opaque, hwaddr addr, uint64_t val,
unsigned size)
{
MACIOIDEState *d = opaque;
int reg = addr >> 4;
addr = (addr & 0xFFF) >> 4;
val = bswap16(val);
if (addr == 0) {
ide_data_writew(&d->bus, 0, val);
switch (reg) {
case 0x0:
if (size == 2) {
ide_data_writew(&d->bus, 0, val);
} else if (size == 4) {
ide_data_writel(&d->bus, 0, val);
}
break;
case 0x1 ... 0x7:
if (size == 1) {
ide_ioport_write(&d->bus, reg, val);
}
break;
case 0x8:
case 0x16:
if (size == 1) {
ide_cmd_write(&d->bus, 0, val);
}
break;
case 0x20:
if (size == 4) {
d->timing_reg = val;
}
break;
case 0x30:
if (size == 4) {
if (val & 0x80000000u) {
d->irq_reg &= 0x7fffffff;
}
}
break;
}
}
static uint32_t pmac_ide_readw (void *opaque,hwaddr addr)
{
uint16_t retval;
MACIOIDEState *d = opaque;
addr = (addr & 0xFFF) >> 4;
if (addr == 0) {
retval = ide_data_readw(&d->bus, 0);
} else {
retval = 0xFFFF;
}
retval = bswap16(retval);
return retval;
}
static void pmac_ide_writel (void *opaque,
hwaddr addr, uint32_t val)
{
MACIOIDEState *d = opaque;
addr = (addr & 0xFFF) >> 4;
val = bswap32(val);
if (addr == 0) {
ide_data_writel(&d->bus, 0, val);
}
}
static uint32_t pmac_ide_readl (void *opaque,hwaddr addr)
{
uint32_t retval;
MACIOIDEState *d = opaque;
addr = (addr & 0xFFF) >> 4;
if (addr == 0) {
retval = ide_data_readl(&d->bus, 0);
} else {
retval = 0xFFFFFFFF;
}
retval = bswap32(retval);
return retval;
}
static const MemoryRegionOps pmac_ide_ops = {
.old_mmio = {
.write = {
pmac_ide_writeb,
pmac_ide_writew,
pmac_ide_writel,
},
.read = {
pmac_ide_readb,
pmac_ide_readw,
pmac_ide_readl,
},
},
.endianness = DEVICE_NATIVE_ENDIAN,
.read = pmac_ide_read,
.write = pmac_ide_write,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.endianness = DEVICE_LITTLE_ENDIAN,
};
static const VMStateDescription vmstate_pmac = {
@@ -426,13 +412,32 @@ static void macio_ide_realizefn(DeviceState *dev, Error **errp)
{
MACIOIDEState *s = MACIO_IDE(dev);
ide_init2(&s->bus, s->irq);
ide_init2(&s->bus, s->ide_irq);
/* Register DMA callbacks */
s->dma.ops = &dbdma_ops;
s->bus.dma = &s->dma;
}
static void pmac_ide_irq(void *opaque, int n, int level)
{
MACIOIDEState *s = opaque;
uint32_t mask = 0x80000000u >> n;
/* We need to reflect the IRQ state in the irq register */
if (level) {
s->irq_reg |= mask;
} else {
s->irq_reg &= ~mask;
}
if (n) {
qemu_set_irq(s->real_ide_irq, level);
} else {
qemu_set_irq(s->real_dma_irq, level);
}
}
static void macio_ide_initfn(Object *obj)
{
SysBusDevice *d = SYS_BUS_DEVICE(obj);
@@ -441,16 +446,28 @@ static void macio_ide_initfn(Object *obj)
ide_bus_new(&s->bus, sizeof(s->bus), DEVICE(obj), 0, 2);
memory_region_init_io(&s->mem, obj, &pmac_ide_ops, s, "pmac-ide", 0x1000);
sysbus_init_mmio(d, &s->mem);
sysbus_init_irq(d, &s->irq);
sysbus_init_irq(d, &s->dma_irq);
sysbus_init_irq(d, &s->real_ide_irq);
sysbus_init_irq(d, &s->real_dma_irq);
s->dma_irq = qemu_allocate_irq(pmac_ide_irq, s, 0);
s->ide_irq = qemu_allocate_irq(pmac_ide_irq, s, 1);
object_property_add_link(obj, "dbdma", TYPE_MAC_DBDMA,
(Object **) &s->dbdma,
qdev_prop_allow_set_link_before_realize, 0, NULL);
}
static Property macio_ide_properties[] = {
DEFINE_PROP_UINT32("channel", MACIOIDEState, channel, 0),
DEFINE_PROP_END_OF_LIST(),
};
static void macio_ide_class_init(ObjectClass *oc, void *data)
{
DeviceClass *dc = DEVICE_CLASS(oc);
dc->realize = macio_ide_realizefn;
dc->reset = macio_ide_reset;
dc->props = macio_ide_properties;
dc->vmsd = &vmstate_pmac;
set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
}
@@ -480,10 +497,9 @@ void macio_ide_init_drives(MACIOIDEState *s, DriveInfo **hd_table)
}
}
void macio_ide_register_dma(MACIOIDEState *s, void *dbdma, int channel)
void macio_ide_register_dma(MACIOIDEState *s)
{
s->dbdma = dbdma;
DBDMA_register_channel(dbdma, channel, s->dma_irq,
DBDMA_register_channel(s->dbdma, s->channel, s->dma_irq,
pmac_ide_transfer, pmac_ide_flush, s);
}

View File

@@ -296,7 +296,7 @@ static bool ide_bmdma_status_needed(void *opaque)
return ((bm->status & abused_bits) != 0);
}
static void ide_bmdma_pre_save(void *opaque)
static int ide_bmdma_pre_save(void *opaque)
{
BMDMAState *bm = opaque;
uint8_t abused_bits = BM_MIGRATION_COMPAT_STATUS_BITS;
@@ -310,6 +310,8 @@ static void ide_bmdma_pre_save(void *opaque)
bm->migration_retry_nsector = bm->bus->retry_nsector;
bm->migration_compat_status =
(bm->status & ~abused_bits) | (bm->bus->error_status & abused_bits);
return 0;
}
/* This function accesses bm->bus->error_status which is loaded only after

View File

@@ -1216,12 +1216,14 @@ static int ps2_kbd_post_load(void* opaque, int version_id)
return 0;
}
static void ps2_kbd_pre_save(void *opaque)
static int ps2_kbd_pre_save(void *opaque)
{
PS2KbdState *s = (PS2KbdState *)opaque;
PS2State *ps2 = &s->common;
ps2_common_post_load(ps2);
return 0;
}
static const VMStateDescription vmstate_ps2_keyboard = {
@@ -1254,12 +1256,14 @@ static int ps2_mouse_post_load(void *opaque, int version_id)
return 0;
}
static void ps2_mouse_pre_save(void *opaque)
static int ps2_mouse_pre_save(void *opaque)
{
PS2MouseState *s = (PS2MouseState *)opaque;
PS2State *ps2 = &s->common;
ps2_common_post_load(ps2);
return 0;
}
static const VMStateDescription vmstate_ps2_mouse = {

View File

@@ -976,10 +976,12 @@ static void tsc210x_i2s_set_rate(TSC210xState *s, int in, int out)
s->i2s_rx_rate = in;
}
static void tsc210x_pre_save(void *opaque)
static int tsc210x_pre_save(void *opaque)
{
TSC210xState *s = (TSC210xState *) opaque;
s->now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
return 0;
}
static int tsc210x_post_load(void *opaque, int version_id)

View File

@@ -190,6 +190,7 @@ static void virtio_input_key_config(VirtIOInput *vinput,
static void virtio_input_handle_event(DeviceState *dev, QemuConsole *src,
InputEvent *evt)
{
VirtIOInputHID *vhid = VIRTIO_INPUT_HID(dev);
VirtIOInput *vinput = VIRTIO_INPUT(dev);
virtio_input_event event;
int qcode;
@@ -215,7 +216,14 @@ static void virtio_input_handle_event(DeviceState *dev, QemuConsole *src,
break;
case INPUT_EVENT_KIND_BTN:
btn = evt->u.btn.data;
if (keymap_button[btn->button]) {
if (vhid->wheel_axis && (btn->button == INPUT_BUTTON_WHEEL_UP ||
btn->button == INPUT_BUTTON_WHEEL_DOWN)) {
event.type = cpu_to_le16(EV_REL);
event.code = cpu_to_le16(REL_WHEEL);
event.value = cpu_to_le32(btn->button == INPUT_BUTTON_WHEEL_UP
? 1 : -1);
virtio_input_send(vinput, &event);
} else if (keymap_button[btn->button]) {
event.type = cpu_to_le16(EV_KEY);
event.code = cpu_to_le16(keymap_button[btn->button]);
event.value = cpu_to_le32(btn->down ? 1 : 0);
@@ -407,7 +415,7 @@ static QemuInputHandler virtio_mouse_handler = {
.sync = virtio_input_handle_sync,
};
static struct virtio_input_config virtio_mouse_config[] = {
static struct virtio_input_config virtio_mouse_config_v1[] = {
{
.select = VIRTIO_INPUT_CFG_ID_NAME,
.size = sizeof(VIRTIO_ID_NAME_MOUSE),
@@ -432,13 +440,53 @@ static struct virtio_input_config virtio_mouse_config[] = {
{ /* end of list */ },
};
static struct virtio_input_config virtio_mouse_config_v2[] = {
{
.select = VIRTIO_INPUT_CFG_ID_NAME,
.size = sizeof(VIRTIO_ID_NAME_MOUSE),
.u.string = VIRTIO_ID_NAME_MOUSE,
},{
.select = VIRTIO_INPUT_CFG_ID_DEVIDS,
.size = sizeof(struct virtio_input_devids),
.u.ids = {
.bustype = const_le16(BUS_VIRTUAL),
.vendor = const_le16(0x0627), /* same we use for usb hid devices */
.product = const_le16(0x0002),
.version = const_le16(0x0002),
},
},{
.select = VIRTIO_INPUT_CFG_EV_BITS,
.subsel = EV_REL,
.size = 2,
.u.bitmap = {
(1 << REL_X) | (1 << REL_Y),
(1 << (REL_WHEEL - 8))
},
},
{ /* end of list */ },
};
static Property virtio_mouse_properties[] = {
DEFINE_PROP_BOOL("wheel-axis", VirtIOInputHID, wheel_axis, true),
DEFINE_PROP_END_OF_LIST(),
};
static void virtio_mouse_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
dc->props = virtio_mouse_properties;
}
static void virtio_mouse_init(Object *obj)
{
VirtIOInputHID *vhid = VIRTIO_INPUT_HID(obj);
VirtIOInput *vinput = VIRTIO_INPUT(obj);
vhid->handler = &virtio_mouse_handler;
virtio_input_init_config(vinput, virtio_mouse_config);
virtio_input_init_config(vinput, vhid->wheel_axis
? virtio_mouse_config_v2
: virtio_mouse_config_v1);
virtio_input_key_config(vinput, keymap_button,
ARRAY_SIZE(keymap_button));
}
@@ -448,6 +496,7 @@ static const TypeInfo virtio_mouse_info = {
.parent = TYPE_VIRTIO_INPUT_HID,
.instance_size = sizeof(VirtIOInputHID),
.instance_init = virtio_mouse_init,
.class_init = virtio_mouse_class_init,
};
/* ----------------------------------------------------------------- */
@@ -459,7 +508,7 @@ static QemuInputHandler virtio_tablet_handler = {
.sync = virtio_input_handle_sync,
};
static struct virtio_input_config virtio_tablet_config[] = {
static struct virtio_input_config virtio_tablet_config_v1[] = {
{
.select = VIRTIO_INPUT_CFG_ID_NAME,
.size = sizeof(VIRTIO_ID_NAME_TABLET),
@@ -496,13 +545,72 @@ static struct virtio_input_config virtio_tablet_config[] = {
{ /* end of list */ },
};
static struct virtio_input_config virtio_tablet_config_v2[] = {
{
.select = VIRTIO_INPUT_CFG_ID_NAME,
.size = sizeof(VIRTIO_ID_NAME_TABLET),
.u.string = VIRTIO_ID_NAME_TABLET,
},{
.select = VIRTIO_INPUT_CFG_ID_DEVIDS,
.size = sizeof(struct virtio_input_devids),
.u.ids = {
.bustype = const_le16(BUS_VIRTUAL),
.vendor = const_le16(0x0627), /* same we use for usb hid devices */
.product = const_le16(0x0003),
.version = const_le16(0x0002),
},
},{
.select = VIRTIO_INPUT_CFG_EV_BITS,
.subsel = EV_ABS,
.size = 1,
.u.bitmap = {
(1 << ABS_X) | (1 << ABS_Y),
},
},{
.select = VIRTIO_INPUT_CFG_EV_BITS,
.subsel = EV_REL,
.size = 2,
.u.bitmap = {
0,
(1 << (REL_WHEEL - 8))
},
},{
.select = VIRTIO_INPUT_CFG_ABS_INFO,
.subsel = ABS_X,
.size = sizeof(virtio_input_absinfo),
.u.abs.min = const_le32(INPUT_EVENT_ABS_MIN),
.u.abs.max = const_le32(INPUT_EVENT_ABS_MAX),
},{
.select = VIRTIO_INPUT_CFG_ABS_INFO,
.subsel = ABS_Y,
.size = sizeof(virtio_input_absinfo),
.u.abs.min = const_le32(INPUT_EVENT_ABS_MIN),
.u.abs.max = const_le32(INPUT_EVENT_ABS_MAX),
},
{ /* end of list */ },
};
static Property virtio_tablet_properties[] = {
DEFINE_PROP_BOOL("wheel-axis", VirtIOInputHID, wheel_axis, true),
DEFINE_PROP_END_OF_LIST(),
};
static void virtio_tablet_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
dc->props = virtio_tablet_properties;
}
static void virtio_tablet_init(Object *obj)
{
VirtIOInputHID *vhid = VIRTIO_INPUT_HID(obj);
VirtIOInput *vinput = VIRTIO_INPUT(obj);
vhid->handler = &virtio_tablet_handler;
virtio_input_init_config(vinput, virtio_tablet_config);
virtio_input_init_config(vinput, vhid->wheel_axis
? virtio_tablet_config_v2
: virtio_tablet_config_v1);
virtio_input_key_config(vinput, keymap_button,
ARRAY_SIZE(keymap_button));
}
@@ -512,6 +620,7 @@ static const TypeInfo virtio_tablet_info = {
.parent = TYPE_VIRTIO_INPUT_HID,
.instance_size = sizeof(VirtIOInputHID),
.instance_init = virtio_tablet_init,
.class_init = virtio_tablet_class_init,
};
/* ----------------------------------------------------------------- */

View File

@@ -360,7 +360,7 @@ static int apic_pre_load(void *opaque)
return 0;
}
static void apic_dispatch_pre_save(void *opaque)
static int apic_dispatch_pre_save(void *opaque)
{
APICCommonState *s = APIC_COMMON(opaque);
APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
@@ -368,6 +368,8 @@ static void apic_dispatch_pre_save(void *opaque)
if (info->pre_save) {
info->pre_save(s);
}
return 0;
}
static int apic_dispatch_post_load(void *opaque, int version_id)

View File

@@ -23,7 +23,7 @@
#include "gic_internal.h"
#include "hw/arm/linux-boot-if.h"
static void gic_pre_save(void *opaque)
static int gic_pre_save(void *opaque)
{
GICState *s = (GICState *)opaque;
ARMGICCommonClass *c = ARM_GIC_COMMON_GET_CLASS(s);
@@ -31,6 +31,8 @@ static void gic_pre_save(void *opaque)
if (c->pre_save) {
c->pre_save(s);
}
return 0;
}
static int gic_post_load(void *opaque, int version_id)

View File

@@ -28,7 +28,7 @@
#include "gicv3_internal.h"
#include "hw/arm/linux-boot-if.h"
static void gicv3_pre_save(void *opaque)
static int gicv3_pre_save(void *opaque)
{
GICv3State *s = (GICv3State *)opaque;
ARMGICv3CommonClass *c = ARM_GICV3_COMMON_GET_CLASS(s);
@@ -36,6 +36,8 @@ static void gicv3_pre_save(void *opaque)
if (c->pre_save) {
c->pre_save(s);
}
return 0;
}
static int gicv3_post_load(void *opaque, int version_id)

View File

@@ -23,7 +23,7 @@
#include "hw/intc/arm_gicv3_its_common.h"
#include "qemu/log.h"
static void gicv3_its_pre_save(void *opaque)
static int gicv3_its_pre_save(void *opaque)
{
GICv3ITSState *s = (GICv3ITSState *)opaque;
GICv3ITSCommonClass *c = ARM_GICV3_ITS_COMMON_GET_CLASS(s);
@@ -31,6 +31,8 @@ static void gicv3_its_pre_save(void *opaque)
if (c->pre_save) {
c->pre_save(s);
}
return 0;
}
static int gicv3_its_post_load(void *opaque, int version_id)

File diff suppressed because it is too large Load Diff

View File

@@ -46,7 +46,7 @@ void pic_reset_common(PICCommonState *s)
/* Note: ELCR is not reset */
}
static void pic_dispatch_pre_save(void *opaque)
static int pic_dispatch_pre_save(void *opaque)
{
PICCommonState *s = opaque;
PICCommonClass *info = PIC_COMMON_GET_CLASS(s);
@@ -54,6 +54,8 @@ static void pic_dispatch_pre_save(void *opaque)
if (info->pre_save) {
info->pre_save(s);
}
return 0;
}
static int pic_dispatch_post_load(void *opaque, int version_id)

View File

@@ -102,7 +102,7 @@ void ioapic_reset_common(DeviceState *dev)
}
}
static void ioapic_dispatch_pre_save(void *opaque)
static int ioapic_dispatch_pre_save(void *opaque)
{
IOAPICCommonState *s = IOAPIC_COMMON(opaque);
IOAPICCommonClass *info = IOAPIC_COMMON_GET_CLASS(s);
@@ -110,6 +110,8 @@ static void ioapic_dispatch_pre_save(void *opaque)
if (info->pre_save) {
info->pre_save(s);
}
return 0;
}
static int ioapic_dispatch_post_load(void *opaque, int version_id)

View File

@@ -92,6 +92,16 @@ static int get_current_cpu(void);
#define RAVEN_MAX_TMR OPENPIC_MAX_TMR
#define RAVEN_MAX_IPI OPENPIC_MAX_IPI
/* KeyLargo */
#define KEYLARGO_MAX_CPU 4
#define KEYLARGO_MAX_EXT 64
#define KEYLARGO_MAX_IPI 4
#define KEYLARGO_MAX_IRQ (64 + KEYLARGO_MAX_IPI)
#define KEYLARGO_MAX_TMR 0
#define KEYLARGO_IPI_IRQ (KEYLARGO_MAX_EXT) /* First IPI IRQ */
/* Timers don't exist but this makes the code happy... */
#define KEYLARGO_TMR_IRQ (KEYLARGO_IPI_IRQ + KEYLARGO_MAX_IPI)
/* Interrupt definitions */
#define RAVEN_FE_IRQ (RAVEN_MAX_EXT) /* Internal functional IRQ */
#define RAVEN_ERR_IRQ (RAVEN_MAX_EXT + 1) /* Error IRQ */
@@ -120,6 +130,7 @@ static FslMpicInfo fsl_mpic_42 = {
#define VID_REVISION_1_3 3
#define VIR_GENERIC 0x00000000 /* Generic Vendor ID */
#define VIR_MPIC2A 0x00004614 /* IBM MPIC-2A */
#define GCR_RESET 0x80000000
#define GCR_MODE_PASS 0x00000000
@@ -329,6 +340,8 @@ typedef struct OpenPICState {
uint32_t nb_cpus;
/* Timer registers */
OpenPICTimer timers[OPENPIC_MAX_TMR];
uint32_t max_tmr;
/* Shared MSI registers */
OpenPICMSI msi[MAX_MSI];
uint32_t max_irq;
@@ -1715,6 +1728,28 @@ static void openpic_realize(DeviceState *dev, Error **errp)
return;
}
map_list(opp, list_le, &list_count);
break;
case OPENPIC_MODEL_KEYLARGO:
opp->nb_irqs = KEYLARGO_MAX_EXT;
opp->vid = VID_REVISION_1_2;
opp->vir = VIR_GENERIC;
opp->vector_mask = 0xFF;
opp->tfrr_reset = 4160000;
opp->ivpr_reset = IVPR_MASK_MASK | IVPR_MODE_MASK;
opp->idr_reset = 0;
opp->max_irq = KEYLARGO_MAX_IRQ;
opp->irq_ipi0 = KEYLARGO_IPI_IRQ;
opp->irq_tim0 = KEYLARGO_TMR_IRQ;
opp->brr1 = -1;
opp->mpic_mode_mask = GCR_MODE_MIXED;
if (opp->nb_cpus != 1) {
error_setg(errp, "Only UP supported today");
return;
}
map_list(opp, list_le, &list_count);
break;
}

View File

@@ -124,7 +124,7 @@ static void kvm_openpic_region_add(MemoryListener *listener,
uint64_t reg_base;
int ret;
if (section->address_space != &address_space_memory) {
if (section->fv != address_space_to_flatview(&address_space_memory)) {
abort();
}

View File

@@ -420,7 +420,7 @@ typedef struct KVMS390FLICStateMigTmp {
uint8_t nimm;
} KVMS390FLICStateMigTmp;
static void kvm_flic_ais_pre_save(void *opaque)
static int kvm_flic_ais_pre_save(void *opaque)
{
KVMS390FLICStateMigTmp *tmp = opaque;
KVMS390FLICState *flic = tmp->parent;
@@ -433,11 +433,13 @@ static void kvm_flic_ais_pre_save(void *opaque)
if (ioctl(flic->fd, KVM_GET_DEVICE_ATTR, &attr)) {
error_report("Failed to retrieve kvm flic ais states");
return;
return -EINVAL;
}
tmp->simm = ais.simm;
tmp->nimm = ais.nimm;
return 0;
}
static int kvm_flic_ais_post_load(void *opaque, int version_id)

View File

@@ -167,16 +167,17 @@ gicv3_redist_set_irq(uint32_t cpu, int irq, int level) "GICv3 redistributor 0x%x
gicv3_redist_send_sgi(uint32_t cpu, int irq) "GICv3 redistributor 0x%x pending SGI %d"
# hw/intc/armv7m_nvic.c
nvic_recompute_state(int vectpending, int exception_prio) "NVIC state recomputed: vectpending %d exception_prio %d"
nvic_set_prio(int irq, uint8_t prio) "NVIC set irq %d priority %d"
nvic_recompute_state(int vectpending, int vectpending_prio, int exception_prio) "NVIC state recomputed: vectpending %d vectpending_prio %d exception_prio %d"
nvic_recompute_state_secure(int vectpending, bool vectpending_is_s_banked, int vectpending_prio, int exception_prio) "NVIC state recomputed: vectpending %d is_s_banked %d vectpending_prio %d exception_prio %d"
nvic_set_prio(int irq, bool secure, uint8_t prio) "NVIC set irq %d secure-bank %d priority %d"
nvic_irq_update(int vectpending, int pendprio, int exception_prio, int level) "NVIC vectpending %d pending prio %d exception_prio %d: setting irq line to %d"
nvic_escalate_prio(int irq, int irqprio, int runprio) "NVIC escalating irq %d to HardFault: insufficient priority %d >= %d"
nvic_escalate_disabled(int irq) "NVIC escalating irq %d to HardFault: disabled"
nvic_set_pending(int irq, int en, int prio) "NVIC set pending irq %d (enabled: %d priority %d)"
nvic_clear_pending(int irq, int en, int prio) "NVIC clear pending irq %d (enabled: %d priority %d)"
nvic_set_pending(int irq, bool secure, int en, int prio) "NVIC set pending irq %d secure-bank %d (enabled: %d priority %d)"
nvic_clear_pending(int irq, bool secure, int en, int prio) "NVIC clear pending irq %d secure-bank %d (enabled: %d priority %d)"
nvic_set_pending_level(int irq) "NVIC set pending: irq %d higher prio than vectpending: setting irq line to 1"
nvic_acknowledge_irq(int irq, int prio) "NVIC acknowledge IRQ: %d now active (prio %d)"
nvic_complete_irq(int irq) "NVIC complete IRQ %d"
nvic_acknowledge_irq(int irq, int prio, bool targets_secure) "NVIC acknowledge IRQ: %d now active (prio %d targets_secure %d)"
nvic_complete_irq(int irq, bool secure) "NVIC complete IRQ %d (secure %d)"
nvic_set_irq_level(int irq, int level) "NVIC external irq %d level set to %d"
nvic_sysreg_read(uint64_t addr, uint32_t value, unsigned size) "NVIC sysreg read addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"
nvic_sysreg_write(uint64_t addr, uint32_t value, unsigned size) "NVIC sysreg write addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"

View File

@@ -241,7 +241,7 @@ static void icp_irq(ICSState *ics, int server, int nr, uint8_t priority)
}
}
static void icp_dispatch_pre_save(void *opaque)
static int icp_dispatch_pre_save(void *opaque)
{
ICPState *icp = opaque;
ICPStateClass *info = ICP_GET_CLASS(icp);
@@ -249,6 +249,8 @@ static void icp_dispatch_pre_save(void *opaque)
if (info->pre_save) {
info->pre_save(icp);
}
return 0;
}
static int icp_dispatch_post_load(void *opaque, int version_id)
@@ -533,7 +535,7 @@ static void ics_simple_reset(void *dev)
}
}
static void ics_simple_dispatch_pre_save(void *opaque)
static int ics_simple_dispatch_pre_save(void *opaque)
{
ICSState *ics = opaque;
ICSStateClass *info = ICS_BASE_GET_CLASS(ics);
@@ -541,6 +543,8 @@ static void ics_simple_dispatch_pre_save(void *opaque)
if (info->pre_save) {
info->pre_save(ics);
}
return 0;
}
static int ics_simple_dispatch_post_load(void *opaque, int version_id)

View File

@@ -386,6 +386,8 @@ static void pc87312_class_init(ObjectClass *klass, void *data)
dc->reset = pc87312_reset;
dc->vmsd = &vmstate_pc87312;
dc->props = pc87312_properties;
/* Reason: Uses parallel_hds[0] in realize(), so it can't be used twice */
dc->user_creatable = false;
}
static const TypeInfo pc87312_type_info = {

View File

@@ -59,3 +59,4 @@ obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
obj-$(CONFIG_AUX) += auxbus.o
obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
obj-y += mmio_interface.o
obj-$(CONFIG_MSF2) += msf2-sysreg.o

Some files were not shown because too many files have changed in this diff Show More