Compare commits

..

1629 Commits

Author SHA1 Message Date
xiaoqiang zhao
07b9098dfc hw/audio: QOM'ify milkymist-ac97.c
* Drop the old SysBus init function and use instance_init
* Move AUD_open_in / AUD_open_out function into realize stage

Acked-by: Michael Walle <michael@walle.cc>
Tested-by: Michael Walle <michael@walle.cc>
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-id: 1463111220-30335-5-git-send-email-zxq_yx_007@163.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-06-03 11:13:38 +02:00
xiaoqiang zhao
bda8d9b8b1 hw/audio: QOM'ify intel-hda
* use DeviceClass::realize instead of DeviceClass::init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-id: 1463111220-30335-4-git-send-email-zxq_yx_007@163.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-06-03 11:13:38 +02:00
xiaoqiang zhao
e19202af79 hw/audio: QOM cleanup for intel-hda
drop the DO_UPCAST macro

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-id: 1463111220-30335-3-git-send-email-zxq_yx_007@163.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-06-03 11:13:38 +02:00
xiaoqiang zhao
ff2df541bb hw/audio: QOM'ify cs4231.c
Drop the old SysBus init function and use instance_init

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-id: 1463111220-30335-2-git-send-email-zxq_yx_007@163.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-06-03 11:13:38 +02:00
Peter Krempa
e58ff62d58 audio: pa: Set volume of recording stream instead of recording device
Since pulseaudio 1.0 it's possible to set the individual stream volume
rather than setting the device volume. With this, setting hardware mixer
of a emulated sound card doesn't mess up the volume configuration of the
host.

A side effect is that this limits compatible pulseaudio version to 1.0
which was released on 2011-09-27.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 78853815be2069971b89b3a2e3181837064dd8f3.1462962512.git.pkrempa@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-06-03 11:13:38 +02:00
Peter Maydell
2c107d7684 Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 02 Jun 2016 07:23:18 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request: (31 commits)
  Add ENET device to i.MX6 SOC.
  Add ENET/Gbps Ethernet support to FEC device
  i.MX: move FEC device to a register array structure.
  i.MX: Rename i.MX FEC defines to ENET_XXX
  i.MX: reset TX/RX descriptors when FEC is disabled.
  i.MX: Fix FEC code for ECR register reset value.
  i.MX: Fix FEC code for MDIO address selection
  i.MX: Fix FEC code for MDIO operation selection
  net: handle optional VLAN header in checksum computation.
  net: improve UDP/TCP checksum computation.
  e1000e: Introduce qtest for e1000e device
  net: Introduce e1000e device emulation
  e1000: Move out code that will be reused in e1000e
  e1000_regs: Add definitions for Intel 82574-specific bits
  vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
  net_pkt: Extend packet abstraction as required by e1000e functionality
  rtl8139: Move more TCP definitions to common header
  net_pkt: Name vmxnet3 packet abstractions more generic
  vmxnet3: Use common MAC address tracing macros
  net: Add macros for MAC address tracing
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-02 14:26:57 +01:00
Peter Maydell
cbd614870f Merge remote-tracking branch 'remotes/famz/tags/pull-docker-20160601' into staging
v2: Fix warning due to include.
    Various temp dir/file changes.
    Don't use "find -executable" to be compatible with Mac.

# gpg: Signature made Wed 01 Jun 2016 10:30:33 BST using RSA key ID 6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/pull-docker-20160601:
  .gitignore: Ignore docker source copy
  MAINTAINERS: Add tests/docker
  docker: Add EXTRA_CONFIGURE_OPTS
  docs: Add text for tests/docker in build-system.txt
  docker: Add travis tool
  docker: Add mingw test
  docker: Add clang test
  docker: Add full test
  docker: Add quick test
  docker: Add common.rc
  docker: Add test runner
  docker: Add images
  Makefile: Rules for docker testing
  Makefile: Always include rules.mak
  rules.mak: Add "COMMA" constant
  tests: Add utilities for docker testing

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-06-02 13:42:52 +01:00
Jean-Christophe Dubois
517b5e9a17 Add ENET device to i.MX6 SOC.
This adds the ENET device to the i.MX6 SOC.

This was tested by booting Linux on an Qemu i.MX6 instance and accessing
the internet from the linux guest.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
a699b410d7 Add ENET/Gbps Ethernet support to FEC device
The ENET device (present in i.MX6) is "derived" from FEC and backward
compatible with it.

This patch adds the necessary support of the added feature in the ENET
device to allow Linux to use it (on supported processors).

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
db0de35268 i.MX: move FEC device to a register array structure.
This is to prepare for the ENET Gb device of the i.MX6.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
1bb3c37182 i.MX: Rename i.MX FEC defines to ENET_XXX
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ff4b325f5e i.MX: reset TX/RX descriptors when FEC is disabled.
According to the FEC chapter of i.MX25 reference manual

RX adn TX descriptors are reseted when the FEC device is disabled through ECR.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ccdb81d327 i.MX: Fix FEC code for ECR register reset value.
According to the FEC chapter of i.MX25 reference manual ECR register is
initialized at 0xf0000000 at reset time.

We fix the value.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
b413643a5c i.MX: Fix FEC code for MDIO address selection
According to the FEC chapter of i.MX25 reference manual

When writing to MMFR register, the MDIO device and adress are selected by
bit 27 to 23 and bit 22 to 18 respectively. This is a total of 10 bits
that need to be used by the Phy chip/address decoding function.

This patch fixes the number of bits used from 9 to 10.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
4816dc168b i.MX: Fix FEC code for MDIO operation selection
According to the FEC chapter of i.MX25 reference manual

When writing the MMFR register, bit 29 and 28 select the requested operation.
 * 10 means read operation with valid MII mgmt frame
 * 11 means read operation with non compliant MII mgmt frame
 * 01 means write operation with valid MII mgmt frame
 * 00 means write operation with non compliant MII mgmt frame

So while bit 28 does change beween read/write for valid MII mgmt frame, the
mening is inverted for non compliant MII mgmt frame.

Bit 29 on the other hand means read/write whatever the type of mgmt frame
involved.

So this patch change the operation selection from bit 28 to bit 29 as it is
more generic.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
ade6bad111 net: handle optional VLAN header in checksum computation.
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:46 +08:00
Jean-Christophe Dubois
50dbce6538 net: improve UDP/TCP checksum computation.
* based on Eth, UDP, TCP struct present in eth.h instead of hardcoded
   indexes and sizes.
 * based on various macros present in eth.h.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:30 +08:00
Dmitry Fleytman
7c375e2294 e1000e: Introduce qtest for e1000e device
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:30 +08:00
Dmitry Fleytman
6f3fbe4ed0 net: Introduce e1000e device emulation
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e.

This implementation is derived from the e1000 emulation code, and
utilizes the TX/RX packet abstractions that were initially developed for
the vmxnet3 device. Although some parts of the introduced code may be
shared with e1000, the differences are substantial enough so that the
only shared resources for the two devices are the definitions in
hw/net/e1000_regs.h.

Similarly to vmxnet3, the new device uses virtio headers for task
offloads (for backends that support virtio extensions). Usage of
virtio headers may be forcibly disabled via a boolean device property
"vnet" (which is enabled by default). In such case task offloads
will be performed in software, in the same way it is done on
backends that do not support virtio headers.

The device code is split into two parts:

  1. hw/net/e1000e.c: QEMU-specific code for a network device;
  2. hw/net/e1000e_core.[hc]: Device emulation according to the spec.

The new device name is e1000e.

Intel specifications for the 82574 controller are available at:
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf

Throughput measurement results (iperf2):

                Fedora 22 guest, TCP, RX
    4 ++------------------------------------------+
      |                                           |
      |                           X   X   X   X   X
  3.5 ++          X   X   X   X                   |
      |       X                                   |
      |                                           |
    3 ++                                          |
G     |   X                                       |
b     |                                           |
/ 2.5 ++                                          |
s     |                                           |
      |                                           |
    2 ++                                          |
      |                                           |
      |                                           |
  1.5 X+                                          |
      |                                           |
      +   +   +   +   +   +   +   +   +   +   +   +
    1 ++--+---+---+---+---+---+---+---+---+---+---+
     32  64  128 256 512  1   2   4   8  16  32  64
      B   B   B   B   B   KB  KB  KB  KB KB  KB  KB
                       Buffer size

               Fedora 22 guest, TCP, TX
  18 ++-------------------------------------------+
     |                        X                   |
  16 ++                           X   X   X   X   X
     |                   X                        |
  14 ++                                           |
     |                                            |
  12 ++                                           |
G    |               X                            |
b 10 ++                                           |
/    |                                            |
s  8 ++                                           |
     |                                            |
   6 ++          X                                |
     |                                            |
   4 ++                                           |
     |       X                                    |
   2 ++  X                                        |
     X   +   +   +   +   +    +   +   +   +   +   +
   0 ++--+---+---+---+---+----+---+---+---+---+---+
    32  64  128 256 512  1    2   4   8  16  32  64
     B   B   B   B   B   KB   KB  KB  KB KB  KB  KB
                       Buffer size

                Fedora 22 guest, UDP, RX
    3 ++------------------------------------------+
      |                                           X
      |                                           |
  2.5 ++                                          |
      |                                           |
      |                                           |
    2 ++                                 X        |
G     |                                           |
b     |                                           |
/ 1.5 ++                                          |
s     |                         X                 |
      |                                           |
    1 ++                                          |
      |                                           |
      |                 X                         |
  0.5 ++                                          |
      |        X                                  |
      X        +        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

                Fedora 22 guest, UDP, TX
    1 ++------------------------------------------+
      |                                           X
  0.9 ++                                          |
      |                                           |
  0.8 ++                                          |
  0.7 ++                                          |
      |                                           |
G 0.6 ++                                          |
b     |                                           |
/ 0.5 ++                                          |
s     |                                  X        |
  0.4 ++                                          |
      |                                           |
  0.3 ++                                          |
  0.2 ++                        X                 |
      |                                           |
  0.1 ++                X                         |
      X        X        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

              Windows 2012R2 guest, TCP, RX
  3.2 ++------------------------------------------+
      |                                   X       |
    3 ++                                          |
      |                                           |
  2.8 ++                                          |
      |                                           |
  2.6 ++                              X           |
G     |   X                   X   X           X   X
b 2.4 ++      X       X                           |
/     |                                           |
s 2.2 ++                                          |
      |                                           |
    2 ++                                          |
      |           X       X                       |
  1.8 ++                                          |
      |                                           |
  1.6 X+                                          |
      +   +   +   +   +   +   +   +   +   +   +   +
  1.4 ++--+---+---+---+---+---+---+---+---+---+---+
     32  64  128 256 512  1   2   4   8  16  32  64
      B   B   B   B   B   KB  KB  KB  KB KB  KB  KB
                       Buffer size

             Windows 2012R2 guest, TCP, TX
  14 ++-------------------------------------------+
     |                                            |
     |                                        X   X
  12 ++                                           |
     |                                            |
  10 ++                                           |
     |                                            |
G    |                                            |
b  8 ++                                           |
/    |                                    X       |
s  6 ++                                           |
     |                                            |
     |                                            |
   4 ++                               X           |
     |                                            |
   2 ++                                           |
     |           X   X            X               |
     +   X   X   +   +   X    X   +   +   +   +   +
   0 X+--+---+---+---+---+----+---+---+---+---+---+
    32  64  128 256 512  1    2   4   8  16  32  64
     B   B   B   B   B   KB   KB  KB  KB KB  KB  KB
                       Buffer size

              Windows 2012R2 guest, UDP, RX
  1.6 ++------------------------------------------X
      |                                           |
  1.4 ++                                          |
      |                                           |
  1.2 ++                                          |
      |                                  X        |
      |                                           |
G   1 ++                                          |
b     |                                           |
/ 0.8 ++                                          |
s     |                                           |
  0.6 ++                        X                 |
      |                                           |
  0.4 ++                                          |
      |                 X                         |
      |                                           |
  0.2 ++       X                                  |
      X        +        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

              Windows 2012R2 guest, UDP, TX
  0.6 ++------------------------------------------+
      |                                           X
      |                                           |
  0.5 ++                                          |
      |                                           |
      |                                           |
  0.4 ++                                          |
G     |                                           |
b     |                                           |
/ 0.3 ++                                 X        |
s     |                                           |
      |                                           |
  0.2 ++                                          |
      |                                           |
      |                         X                 |
  0.1 ++                                          |
      |                 X                         |
      X        X        +       +        +        +
    0 ++-------+--------+-------+--------+--------+
     32       64       128     256      512       1
      B        B         B       B        B      KB
                       Datagram size

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
093454e21d e1000: Move out code that will be reused in e1000e
Code that will be shared moved to a separate files.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
06e7fa0ad7 e1000_regs: Add definitions for Intel 82574-specific bits
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:29 +08:00
Dmitry Fleytman
111710107d vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
To make this device and network packets
abstractions ready for IOMMU.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
eb700029c7 net_pkt: Extend packet abstraction as required by e1000e functionality
This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.

Changes are:

  1. Support iovec lists for RX buffers
  2. Deeper RX packets parsing
  3. Loopback option for TX packets
  4. Extended VLAN headers handling
  5. RSS processing for RX packets

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
66409b7c8b rtl8139: Move more TCP definitions to common header
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:28 +08:00
Dmitry Fleytman
605d52e62f net_pkt: Name vmxnet3 packet abstractions more generic
This patch drops "vmx" prefix from packet abstractions names
to emphasize the fact they are generic and not tied to any
specific network device.

These abstractions will be reused by e1000e emulation implementation
introduced by following patches so their names need generalization.

This patch (except renamed files, adjusted comments and changes in MAINTAINTERS)
was produced by:

git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
ab64787201 vmxnet3: Use common MAC address tracing macros
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
6d1d4939a6 net: Add macros for MAC address tracing
These macros will be used by future commits introducing
e1000e device emulation and by vmxnet3 tracing code.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
0478d1ddae net: Introduce Toeplitz hash calculator
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:27 +08:00
Dmitry Fleytman
a4b387e623 vmxnet3: Use generic function for DSN capability definition
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Dmitry Fleytman
b56b9285e4 pcie: Introduce function for DSN capability creation
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Dmitry Fleytman
6383292ac8 pcie: Add support for PCIe CAP v1
Added support for PCIe CAP v1, while reusing some of the existing v2
infrastructure.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:26 +08:00
Dmitry Fleytman
83f17ed278 pci: Introduce define for PM capability version 1.1
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:09 +08:00
Dmitry Fleytman
3bdfaabbcf msix: make msix_clr_pending() visible for clients
This function will be used by e1000e device code.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:42:09 +08:00
Dmitry Fleytman
059a65f3ad pci: fix unaligned access in pci_xxx_quad()
Replace legacy cpu_to_le64w()/le64_to_cpup()
calls with stq_le_p()/ldq_le_p().

Motivation for this modification is that
follow up patches add utility function
pcie_dev_ser_num_init() for PCIe DSN
capability creation which uses
pci_set_quad() with a misaligned offset.

Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02 10:16:53 +08:00
Fam Zheng
0bc7a6f307 .gitignore: Ignore docker source copy
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-06-01 17:27:35 +08:00
Fam Zheng
8a49e97f45 MAINTAINERS: Add tests/docker
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-16-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
35e0f959b5 docker: Add EXTRA_CONFIGURE_OPTS
Whatever passed in this variable will be appended to all
configure commands.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1464755128-32490-15-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
dc2e7eebd8 docs: Add text for tests/docker in build-system.txt
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-14-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
d5bd789198 docker: Add travis tool
The script is not prefixed with test- so it won't run with "make docker-test",
because it can take too long.

Run it with "make docker-travis@ubuntu".

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-13-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
c4f0eed1f3 docker: Add mingw test
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-12-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
c8908570dc docker: Add clang test
The (currently partially commented out) configure options are suggested
by John Snow <jsnow@redhat.com>.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1464755128-32490-11-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
d710ac871c docker: Add full test
This builds all available targets.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1464755128-32490-10-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
b7899d63c8 docker: Add quick test
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-9-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
3568f98ca5 docker: Add common.rc
"requires" checks the "FEATURE" environment for specified prerequisits,
and skip the execution of test if not found.

"build_qemu" is the central routine to compile QEMU for tests to call.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-8-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
b344aa9132 docker: Add test runner
It's better to have a launcher for all tests, to make it easier to
initialize and manage the environment.

If "DEBUG=1"  a shell prompt will show up before the test runs.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-7-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
ca853f0c76 docker: Add images
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-6-git-send-email-famz@redhat.com
2016-06-01 17:27:35 +08:00
Fam Zheng
324027c24c Makefile: Rules for docker testing
This adds a group of make targets to run docker tests, all are available
in source tree without running ./configure.

The usage is shown with "make docker".

Besides the fixed ones, dynamic targets for building each image and
running each test in each image are generated automatically by make,
scanning $(SRC_PATH)/tests/docker/ files with specific patterns.

Alternative to manually list particular targets (docker-TEST@IMAGE)
set, you can control which tests/images to run by filtering variables,
TESTS= and IMAGES=, which are expressed in Makefile pattern syntax,
"foo% %bar ...". For example:

    $ make docker-test IMAGES="ubuntu fedora"

Unfortunately, it's impossible to propagate "-j $JOBS" into make in
containers, however since each combination is made a first class target
in the top Makefile, "make -j$N docker-test" still parallels the tests
coarsely.

Still, $J is made a magic variable to let all make invocations in
containers to use -j$J.

Instead of providing a live version of the source tree to the docker
container we snapshot it with git-archive. This ensures the tree is in a
pristine state for whatever operations the container is going to run on
them.

Uncommitted changes known to files known by the git index will be
included in the snapshot if there are any.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1464755128-32490-5-git-send-email-famz@redhat.com
2016-06-01 17:27:34 +08:00
Fam Zheng
fb57c88102 Makefile: Always include rules.mak
When config-host.mak is not found it is safe to assume SRC_PATH is ".".
So, it is okay to move inclusion of ruls.mak out of the ifeq condition.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-4-git-send-email-famz@redhat.com
2016-06-01 17:25:50 +08:00
Fam Zheng
2f4e4dc237 rules.mak: Add "COMMA" constant
Using "," literal in $(call quiet-command, ...) arguments is awkward.
Add this constant to make it at least doable.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-3-git-send-email-famz@redhat.com
2016-06-01 17:25:50 +08:00
Fam Zheng
4485b04be9 tests: Add utilities for docker testing
docker.py is added with a number of useful subcommands to manager docker
images and instances for QEMU docker testing. Subcommands are:

run: A wrapper of "docker run" (or "sudo -n docker run" if necessary),
which takes care of killing and removing the running container at
SIGINT.

clean: Tear down all the containers including inactive ones that are
started by docker_run.

build: Compare an image from given dockerfile and rebuild it if they're
different.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464755128-32490-2-git-send-email-famz@redhat.com
2016-06-01 17:25:50 +08:00
Zhang Chen
16a3df403b net/net: Add SocketReadState for reuse codes
This function is from net/socket.c, move it to net.c and net.h.
Add SocketReadState to make others reuse net_fill_rstate().
suggestion from jason.

v4:
 - move 'rs->finalize = finalize' to rs_init()

v3:
 - remove SocketReadState init callback
 - put finalize callback to net_fill_rstate()

v2:
 - rename ReadState to SocketReadState
 - add SocketReadState init and finalize callback

v1:
 - init patch

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-01 09:25:29 +08:00
Eduardo Habkost
d30300f771 net: vl: Move default_net to vl.c
All handling of defaults (default_* variables) is inside vl.c,
move default_net there too, so we can more easily refactor that
code later.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-01 09:25:29 +08:00
Peter Maydell
500acc9c41 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160531' into staging
ppc patch queue for 2016-05-31

Here's another ppc patch queue.  This batch is all preliminaries
towards two significant features:

1) Full hypervisor-mode support for POWER8
    Patches 1-8 start fixing various bugs with TCG's handling of
    hypervisor mode

2) CPU hotplug support
    Patches 9-12 make some preliminary fixes towards implementing CPU
    hotplug on ppc64 (and other non-x86 platforms).  These patches are
    actually to generic code, not ppc, but are included here with
    Paolo's ACK.

# gpg: Signature made Tue 31 May 2016 01:39:44 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160531:
  cpu: Add a sync version of cpu_remove()
  cpu: Reclaim vCPU objects
  exec: Do vmstate unregistration from cpu_exec_exit()
  exec: Remove cpu from cpus list during cpu_exec_exit()
  ppc: Add PPC_64H instruction flag to POWER7 and POWER8
  ppc: Get out of emulation on SMT "OR" ops
  ppc: Fix sign extension issue in mtmsr(d) emulation
  ppc: Change 'invalid' bit mask of tlbiel and tlbie
  ppc: tlbie, tlbia and tlbisync are HV only
  ppc: Do some batching of TCG tlb flushes
  ppc: Use split I/D mmu modes to avoid flushes on interrupts
  ppc: Remove MMU_MODEn_SUFFIX definitions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-31 10:37:22 +01:00
Peter Maydell
07e070aac4 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* docs/atomics fixes and atomic_rcu_* optimization (Emilio)
* NBD bugfix (Eric)
* Memory fixes and cleanups (Paolo, Paul)
* scsi-block support for SCSI status, including persistent
  reservations (Paolo)
* kvm_stat moves to the Linux repository
* SCSI bug fixes (Peter, Prasad)
* Killing qemu_char_get_next_serial, non-ARM parts (Xiaoqiang)

# gpg: Signature made Sun 29 May 2016 08:11:20 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (30 commits)
  exec: hide mr->ram_addr from qemu_get_ram_ptr users
  memory: split memory_region_from_host from qemu_ram_addr_from_host
  exec: remove ram_addr argument from qemu_ram_block_from_host
  memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr
  scsi-generic: Merge block max xfer len in INQUIRY response
  scsi-block: always use SG_IO
  scsi-disk: introduce scsi_disk_req_check_error
  scsi-disk: add need_fua_emulation to SCSIDiskClass
  scsi-disk: introduce dma_readv and dma_writev
  scsi-disk: introduce a common base class
  xen-hvm: ignore background I/O sections
  docs/atomics: update comparison with Linux
  atomics: do not emit consume barrier for atomic_rcu_read
  atomics: emit an smp_read_barrier_depends() barrier only for Alpha and Thread Sanitizer
  docs/atomics: update atomic_read/set comparison with Linux
  bt: rewrite csrhci_write to avoid out-of-bounds writes
  block/iscsi: avoid potential overflow of acb->task->cdb
  scsi: megasas: check 'read_queue_head' index value
  scsi: megasas: initialise local configuration data buffer
  scsi: megasas: use appropriate property buffer size
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-31 09:29:23 +01:00
Bharata B Rao
2c579042e3 cpu: Add a sync version of cpu_remove()
This sync API will be used by the CPU hotplug code to wait for the CPU to
completely get removed before flagging the failure to the device_add
command.

Sync version of this call is needed to correctly recover from CPU
realization failures when ->plug() handler fails.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 14:17:05 +10:00
Gu Zheng
4c055ab54f cpu: Reclaim vCPU objects
In order to deal well with the kvm vcpus (which can not be removed without any
protection), we do not close KVM vcpu fd, just record and mark it as stopped
into a list, so that we can reuse it for the appending cpu hot-add request if
possible. It is also the approach that kvm guys suggested:
https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
               [- Explicit CPU_REMOVE() from qemu_kvm/tcg_destroy_vcpu()
                  isn't needed as it is done from cpu_exec_exit()
                - Use iothread mutex instead of global mutex during
                  destroy
                - Don't cleanup vCPU object from vCPU thread context
                  but leave it to the callers (device_add/device_del)]
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 14:03:59 +10:00
Bharata B Rao
9dfeca7c6b exec: Do vmstate unregistration from cpu_exec_exit()
cpu_exec_init() does vmstate_register for the CPU device. This needs to be
undone from cpu_exec_exit(). This change is needed to support CPU hot
removal.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[dwg: added missing include to fix compile on some archs]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 14:03:29 +10:00
Bharata B Rao
1c59eb39cf exec: Remove cpu from cpus list during cpu_exec_exit()
CPUState *cpu gets added to the cpus list during cpu_exec_init(). It
should be removed from cpu_exec_exit().

cpu_exec_exit() is called from generic CPU::instance_finalize and some
archs like PowerPC call it from CPU unrealizefn. So ensure that we
dequeue the cpu only once.

Now -1 value for cpu->cpu_index indicates that we have already dequeued
the cpu for CONFIG_USER_ONLY case also.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:22:20 +10:00
Benjamin Herrenschmidt
4e0806110c ppc: Add PPC_64H instruction flag to POWER7 and POWER8
This will enable decoding of hrfid

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Benjamin Herrenschmidt
b68e60e6f0 ppc: Get out of emulation on SMT "OR" ops
Otherwise tight loops at smt_low for example, which OPAL does,
eat so much CPU that we can't boot a kernel anymore. With that,
I can boot 8 CPUs just fine with powernv.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Michael Neuling
c409bc5daf ppc: Fix sign extension issue in mtmsr(d) emulation
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Benjamin Herrenschmidt
f9ef0527ff ppc: Change 'invalid' bit mask of tlbiel and tlbie
Otherwise it will trip on the forms used in recent architecture.

Ideally, we should have different handlers for different architecture
levels but our current implementation of TLB flushing is dumb enough
that this will do for now.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Benjamin Herrenschmidt
74693da988 ppc: tlbie, tlbia and tlbisync are HV only
Not that anything remotely recent supports tlbia but ...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Benjamin Herrenschmidt
cd0c6f4735 ppc: Do some batching of TCG tlb flushes
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.

However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.

Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...

Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.

This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.

This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
      h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
      consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Benjamin Herrenschmidt
9fb0449114 ppc: Use split I/D mmu modes to avoid flushes on interrupts
We rework the way the MMU indices are calculated, providing separate
indices for I and D side based on MSR:IR and MSR:DR respectively,
and thus no longer need to flush the TLB on context changes. This also
adds correct support for HV as a separate address space.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Benjamin Herrenschmidt
5fd1111b20 ppc: Remove MMU_MODEn_SUFFIX definitions
We don't use the resulting accessors and this gets in the way of
the split I/D TLB work.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30 13:20:04 +10:00
Paolo Bonzini
0878d0e11b exec: hide mr->ram_addr from qemu_get_ram_ptr users
Let users of qemu_get_ram_ptr and qemu_ram_ptr_length pass in an
address that is relative to the MemoryRegion.  This basically means
what address_space_translate returns.

Because the semantics of the second parameter change, rename the
function to qemu_map_ram_ptr.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
07bdaa4196 memory: split memory_region_from_host from qemu_ram_addr_from_host
Move the old qemu_ram_addr_from_host to memory_region_from_host and
make it return an offset within the region.  For qemu_ram_addr_from_host
return the ram_addr_t directly, similar to what it was before
commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host",
2013-07-04).

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
f615f39616 exec: remove ram_addr argument from qemu_ram_block_from_host
Of the two callers, one does not use it, and the other can compute
it itself based on the other output argument (offset) and the RAMBlock.

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
4ff87573df memory: remove qemu_get_ram_fd, qemu_set_ram_fd, qemu_ram_block_host_ptr
Remove direct uses of ram_addr_t and optimize memory_region_{get,set}_fd
now that a MemoryRegion knows its RAMBlock directly.

Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Fam Zheng
063143d5b1 scsi-generic: Merge block max xfer len in INQUIRY response
The rationale is similar to the above mode sense response interception:
this is practically the only channel to communicate restraints from
elsewhere such as host and block driver.

The scsi bus we attach onto can have a larger max xfer len than what is
accepted by the host file system (guarding between the host scsi LUN and
QEMU), in which case the SG_IO we generate would get -EINVAL.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1464243305-10661-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:12 +02:00
Paolo Bonzini
8fdc7839e4 scsi-block: always use SG_IO
Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status.  This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT.  Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.

This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
5b956f415a scsi-disk: introduce scsi_disk_req_check_error
Commonize all the checks for canceled requests and errors.  The next patch
will add another case to check for, in order to handle passthrough commands.

There is no semantic change here; the only nontrivial modification is in
scsi_write_do_fua, where cancellation has been checked earlier by both
callers.  Thus, the check is replaced with an assertion.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
94f8ba1125 scsi-disk: add need_fua_emulation to SCSIDiskClass
scsi-block will be able to do FUA just by passing the request through
to the LUN (which is also more efficient); there is no need to emulate
it like we do for scsi-disk.

Add a new method to distinguish this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
fcaafb1001 scsi-disk: introduce dma_readv and dma_writev
These are replacements for blk_aio_readv and blk_aio_writev that allow
customization of the data path.  They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.

This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
993935f315 scsi-disk: introduce a common base class
This will be the place to add DMAIOFuncs in the next patch.  There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paul Durrant
a8ff431679 xen-hvm: ignore background I/O sections
Since Xen will correctly handle accesses to unimplemented I/O ports (by
returning all 1's for reads and ignoring writes) there is no need for
QEMU to register backgroud I/O sections.

This patch therefore adds checks to xen_io_add/del so that sections with
memory-region ops pointing at 'unassigned_io_ops' are ignored.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1462811480-16295-1-git-send-email-paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
a4a0e4b258 docs/atomics: update comparison with Linux
Over time, some differences between QEMU and Linux atomics are getting
smoothed.  In particular, Linux grew atomic_fetch_or (and in general
the differences regarding RMW operations were not described accurately)
and smp_load_acquire/smp_store_release.  Also, set_mb was renamed to
smp_store_mb().  Include these changes in the documentation.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Emilio G. Cota
15487aa132 atomics: do not emit consume barrier for atomic_rcu_read
Currently we emit a consume-load in atomic_rcu_read.  Because of
limitations in current compilers, this is overkill for non-Alpha hosts
and it is only useful to make Thread Sanitizer work.

This patch leaves the consume-load in atomic_rcu_read when
compiling with Thread Sanitizer enabled, and resorts to a
relaxed load + smp_read_barrier_depends otherwise.

On an RMO host architecture, such as aarch64, the performance
improvement of this change is easily measurable. For instance,
qht-bench performs an atomic_rcu_read on every lookup. Performance
before and after applying this patch:

$ tests/qht-bench -d 5 -n 1
Before: 9.78 MT/s
After:  10.96 MT/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-4-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Emilio G. Cota
c983895258 atomics: emit an smp_read_barrier_depends() barrier only for Alpha and Thread Sanitizer
For correctness, smp_read_barrier_depends() is only required to
emit a barrier on Alpha hosts. However, we are currently emitting
a consume fence unconditionally, and most compilers currently treat
consume and acquire fences as equivalent.

Fix it by keeping the consume fence if we're compiling with Thread
Sanitizer, since this might help prevent false warnings. Otherwise,
only emit the barrier for Alpha hosts. Note that we still guarantee
that smp_read_barrier_depends() is a compiler barrier.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-3-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Emilio G. Cota
56ebe02203 docs/atomics: update atomic_read/set comparison with Linux
Recently Linux did a mass conversion of its atomic_read/set calls
so that they at least are READ/WRITE_ONCE. See Linux's commit
62e8a325 ("atomic, arch: Audit atomic_{read,set}()"). It seems though
that their documentation hasn't been updated to reflect this.

The appended updates our documentation to reflect the change, which
means there is effectively no difference between our atomic_read/set
and the current Linux implementation.

While at it, fix the statement that a barrier is implied by
atomic_read/set, which is incorrect. Volatile/atomic semantics prevent
transformations pertaining the variable they apply to; this, however,
has no effect on surrounding statements like barriers do. For more
details on this, see:
  https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-2-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Paolo Bonzini
141af038dd bt: rewrite csrhci_write to avoid out-of-bounds writes
The usage of INT_MAX in this function confuses Coverity.  I think
the defect is bogus, however there is no protection against
getting more than sizeof(s->inpkt) bytes from the character device
backend.

Rewrite the function to only fill in as much data as needed from
buf into s->inpkt.  The plen variable is replaced by a simple
state machine and there is no need anymore to shift contents to
the beginning of s->inpkt.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Peter Lieven
a6b3167fa0 block/iscsi: avoid potential overflow of acb->task->cdb
at least in the path via virtio-blk the maximum size is not
restricted.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1464080368-29584-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Prasad J Pandit
b60bdd1f1e scsi: megasas: check 'read_queue_head' index value
While doing MegaRAID SAS controller command frame lookup, routine
'megasas_lookup_frame' uses 'read_queue_head' value as an index
into 'frames[MEGASAS_MAX_FRAMES=2048]' array. Limit its value
within array bounds to avoid any OOB access.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464179110-18593-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:11 +02:00
Prasad J Pandit
d37af74073 scsi: megasas: initialise local configuration data buffer
When reading MegaRAID SAS controller configuration via MegaRAID
Firmware Interface(MFI) commands, routine megasas_dcmd_cfg_read
uses an uninitialised local data buffer. Initialise this buffer
to avoid stack information leakage.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464178304-12831-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Prasad J Pandit
1b85898025 scsi: megasas: use appropriate property buffer size
When setting MegaRAID SAS controller properties via MegaRAID
Firmware Interface(MFI) commands, a user supplied size parameter
is used to set property value. Use appropriate size value to avoid
OOB access issues.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464172291-2856-2-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Prasad J Pandit
06630554cc scsi: mptsas: infinite loop while fetching requests
The LSI SAS1068 Host Bus Adapter emulator in Qemu, periodically
looks for requests and fetches them. A loop doing that in
mptsas_fetch_requests() could run infinitely if 's->state' was
not operational. Move check to avoid such a loop.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464077264-25473-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Prasad J Pandit
3e831b40e0 scsi: pvscsi: check command descriptor ring buffer size (CVE-2016-4952)
Vmware Paravirtual SCSI emulation uses command descriptors to
process SCSI commands. These descriptors come with their ring
buffers. A guest could set the ring buffer size to an arbitrary
value leading to OOB access issue. Add check to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464000485-27041-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Paolo Bonzini
60b412dd18 kvm_stat: Remove
The source has moved to the Linux kernel tree.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Eric Blake
353ab96973 nbd: Don't trim unrequested bytes
Similar to commit df7b97ff, we are mishandling clients that
give an unaligned NBD_CMD_TRIM request, and potentially
trimming bytes that occur before their request; which in turn
can cause potential unintended data loss (unlikely in
practice, since most clients are sane and issue aligned trim
requests).  However, while we fixed read and write by switching
to the byte interfaces of blk_, we don't yet have a byte
interface for discard.  On the other hand, trim is advisory, so
rounding the user's request to simply ignore the first and last
unaligned sectors (or the entire request, if it is sub-sector
in length) is just fine.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464173965-9694-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
e269fbe231 hw/char: QOM'ify milkymist-uart.c
drop the qemu_char_get_next_serial and use chardev prop instead

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-6-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
7aaefcaf66 hw/char: QOM'ify lm32_uart.c
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add lm32_uart_create function to create lm32 uart device

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-5-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
c2ddaa62b6 hw/char: QOM'ify lm32_juart.c
* Drop the old SysBus init function
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-4-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
8290de92b8 hw/char: QOM'ify etraxfs_ser.c
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add etraxfs_ser_create function to create etraxfs serial device

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-3-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
xiaoqiang zhao
e7c9136977 hw/char: QOM'ify escc.c
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-2-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Paolo Bonzini
b138e654a0 Revert "memory: Drop FlatRange.romd_mode"
This reverts commit 5b5660adf1,
as it breaks the UEFI guest firmware (known as ArmVirtPkg or AAVMF)
running in the "virt" machine type of "qemu-system-aarch64":

Contrary to the commit message, (a->mr == b->mr) does *not* imply
that (a->romd_mode == b->romd_mode): the pflash device model calls
memory_region_rom_device_set_romd() -- for switching between the above
modes --, and that function changes mr->romd_mode but the current
AddressSpaceDispatch's FlatRange keeps the old value.  Therefore
region_del/region_add are not called on the KVM MemoryListener.

Reported-by: Drew Jones <drjones@redhat.com>
Tested-by: Drew Jones <drjones@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-29 09:11:10 +02:00
Peter Maydell
d6550e9ed2 Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160527' into staging
linux-user pull request v2 for may 2016

# gpg: Signature made Fri 27 May 2016 12:51:10 BST using RSA key ID DE3C9BC0
# gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>"
# gpg:                 aka "Riku Voipio <riku.voipio@linaro.org>"

* remotes/riku/tags/pull-linux-user-20160527: (38 commits)
  linux-user,target-ppc: fix use of MSR_LE
  linux-user/signal.c: Use s390 target space address instead of host space
  linux-user/signal.c: Use target address instead of host address for microblaze restorer
  linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame
  linux-user: arm: Remove ARM_cpsr and similar #defines
  linux-user: Use direct syscalls for setuid(), etc
  linux-user: x86_64: Don't use 16-bit UIDs
  linux-user: Use g_try_malloc() in do_msgrcv()
  linux-user: Handle msgrcv error case correctly
  linux-user: Handle negative values in timespec conversion
  linux-user: Use safe_syscall for futex syscall
  linux-user: Use safe_syscall for pselect, select syscalls
  linux-user: Use safe_syscall for execve syscall
  linux-user: Use safe_syscall for wait system calls
  linux-user: Use safe_syscall for open and openat system calls
  linux-user: Use safe_syscall for read and write system calls
  linux-user: Provide safe_syscall for fixing races between signals and syscalls
  linux-user: Add debug code to exercise restarting system calls
  linux-user: Support for restarting system calls for Microblaze targets
  linux-user: Set r14 on exit from microblaze syscall
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-27 14:05:48 +01:00
Laurent Vivier
49e55cbacf linux-user,target-ppc: fix use of MSR_LE
setup_frame()/setup_rt_frame()/restore_user_regs() are using
MSR_LE as the similar kernel functions do: as a bitmask.

But in QEMU, MSR_LE is a bit position, so change this
accordingly.

The previous code was doing nothing as MSR_LE is 0,
and "env->msr &= ~MSR_LE" doesn't change the value of msr.

And yes, a user process can change its endianness,
see linux kernel commit:

    fab5db9 [PATCH] powerpc: Implement support for setting little-endian mode via prctl

and prctl(2): PR_SET_ENDIAN, PR_GET_ENDIAN

Reviewed-by: Thomas Huth <huth@tuxfamily.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:40 +03:00
Chen Gang
5b1d59d0bb linux-user/signal.c: Use s390 target space address instead of host space
The return address is in target space, so the restorer address needs to
be target space, too.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
2016-05-27 14:50:40 +03:00
Chen Gang
166c97edd6 linux-user/signal.c: Use target address instead of host address for microblaze restorer
The return address is in target space, so the restorer address needs to
be target space, too.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:40 +03:00
Chen Gang
f1d9d1071c linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame
Original implementation uses do_rt_sigreturn directly in host space,
when a guest program is in unwind procedure in guest space, it will get
an incorrect restore address, then causes unwind failure.

Also cleanup the original incorrect indentation.

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
167e4cdc29 linux-user: arm: Remove ARM_cpsr and similar #defines
The #defines of ARM_cpsr and friends in linux-user/arm/target-syscall.h
can clash with versions in the system headers if building on an
ARM or AArch64 build (though this seems to be dependent on the version
of the system headers). The QEMU defines are not very useful (it's
not clear that they're intended for use with the target_pt_regs struct
rather than (say) the CPUARMState structure) and we only use them in one
function in elfload.c anyway. So just remove the #defines and directly
access regs->uregs[].

Reported-by: Christopher Covington <cov@codeaurora.org>
Tested-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
fd6f7798ac linux-user: Use direct syscalls for setuid(), etc
On Linux the setuid(), setgid(), etc system calls have different semantics
from the libc functions. The libc functions follow POSIX and update the
credentials for all threads in the process; the system calls update only
the thread which makes the call. (This impedance mismatch is worked around
in libc by signalling all threads to tell them to do a syscall, in a
byzantine and fragile way; see http://ewontfix.com/17/.)

Since in linux-user we are trying to emulate the system call semantics,
we must implement all these syscalls to directly call the underlying
host syscall, rather than calling the host libc function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
716f3fbef2 linux-user: x86_64: Don't use 16-bit UIDs
The 64-bit x86 syscall ABI uses 32-bit UIDs; only define
USE_UID16 for 32-bit x86.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
415d847110 linux-user: Use g_try_malloc() in do_msgrcv()
In do_msgrcv() we want to allocate a message buffer, whose size
is passed to us by the guest. That means we could legitimately
fail, so use g_try_malloc() and handle the error case, in the same
way that do_msgsnd() does.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
99874f6552 linux-user: Handle msgrcv error case correctly
The msgrcv ABI is a bit odd -- the msgsz argument is a size_t, which is
unsigned, but it must fail EINVAL if the value is negative when cast
to a long. We were incorrectly passing the value through an
"unsigned int", which meant that if the guest was 32-bit longs and
the host was 64-bit longs an input of 0xffffffff (which should trigger
EINVAL) would simply be passed to the host msgrcv() as 0xffffffff,
where it does not cause the host kernel to reject it.
Follow the same approach as do_msgsnd() in using a ssize_t and
doing the check for negative values by hand, so we correctly fail
in this corner case.

This fixes the msgrcv03 Linux Test Project test case, which otherwise
hangs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
c7e35da348 linux-user: Handle negative values in timespec conversion
In a struct timespec, both fields are signed longs. Converting
them from guest to host with code like
    host_ts->tv_sec = tswapal(target_ts->tv_sec);
mishandles negative values if the guest has 32-bit longs and
the host has 64-bit longs because tswapal()'s return type is
abi_ulong: the assignment will zero-extend into the host long
type rather than sign-extending it.

Make the conversion routines use __get_user() and __set_user()
instead: this automatically picks up the signedness of the
field type and does the correct kind of sign or zero extension.
It also handles the possibility that the target struct is not
sufficiently aligned for the host's requirements.

In particular, this fixes a hang when running the Linux Test Project
mq_timedsend01 and mq_timedreceive01 tests: one of the test cases
sets the timeout to -1 and expects an EINVAL failure, but we were
setting a very long timeout instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
d509eeb13c linux-user: Use safe_syscall for futex syscall
Use the safe_syscall wrapper for the futex syscall.

In particular, this fixes hangs when using programs that link
against the Boehm garbage collector, including the Mono runtime.

(We don't change the sys_futex() call in the implementation of
the exit syscall, because as the FIXME comment there notes
that should be handled by disabling signals, since we can't
easily back out if the futex were to return ERESTARTSYS.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:39 +03:00
Peter Maydell
6df9d38d33 linux-user: Use safe_syscall for pselect, select syscalls
Use the safe_syscall wrapper for the pselect and select syscalls.
Since not every architecture has the select syscall, we now
have to implement select in terms of pselect, which means doing
timeval<->timespec conversion.

(Five years on from the initial patch that added pselect support
to QEMU and a decade after pselect6 went into the kernel, it seems
safe to not try to support hosts with header files which don't
define __NR_pselect6.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:38 +03:00
Timothy E Baldwin
ffdcbe223d linux-user: Use safe_syscall for execve syscall
Wrap execve() in the safe-syscall handling. Although execve() is not
an interruptible syscall, it is a special case: if we allow a signal
to happen before we make the host$ syscall then we will 'lose' it,
because at the point of execve the process leaves QEMU's control.  So
we use the safe syscall wrapper to ensure that we either take the
signal as a guest signal, or else it does not happen before the
execve completes and makes it the other program's problem.

The practical upshot is that without this SIGTERM could fail to
terminate the process.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-25-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: expanded commit message to explain in more detail why this is
 needed, and add comment about it too]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:38 +03:00
Timothy E Baldwin
4af80a3783 linux-user: Use safe_syscall for wait system calls
Use safe_syscall for waitpid, waitid and wait4 syscalls. Note that this
change allows us to implement support for waitid's fifth (rusage) argument
in future; for the moment we ignore it as we have done up til now.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-18-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Adjust to new safe_syscall convention. Add fifth waitid syscall argument
 (which isn't present in the libc interface but is in the syscall ABI)]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:38 +03:00
Timothy E Baldwin
c10a07387b linux-user: Use safe_syscall for open and openat system calls
Restart open() and openat() if signals occur before,
or during with SA_RESTART.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-17-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Adjusted to follow new -1-and-set-errno safe_syscall convention]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:38 +03:00
Timothy E Baldwin
50afd02b84 linux-user: Use safe_syscall for read and write system calls
Restart read() and write() if signals occur before, or during with SA_RESTART

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-15-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Update to new safe_syscall() convention of setting errno]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:50:38 +03:00
Timothy E Baldwin
4d330cee37 linux-user: Provide safe_syscall for fixing races between signals and syscalls
If a signal is delivered immediately before a blocking system call the
handler will only be called after the system call returns, which may be a
long time later or never.

This is fixed by using a function (safe_syscall) that checks if a guest
signal is pending prior to making a system call, and if so does not call the
system call and returns -TARGET_ERESTARTSYS. If a signal is received between
the check and the system call host_signal_handler() rewinds execution to
before the check. This rewinding has the effect of closing the race window
so that safe_syscall will reliably either (a) go into the host syscall
with no unprocessed guest signals pending or or (b) return
-TARGET_ERESTARTSYS so that the caller can deal with the signals.
Implementing this requires a per-host-architecture assembly language
fragment.

This will also resolve the mishandling of the SA_RESTART flag where
we would restart a host system call and not call the guest signal handler
until the syscall finally completed -- syscall restarting now always
happens at the guest syscall level so the guest signal handler will run.
(The host syscall will never be restarted because if the host kernel
rewinds the PC to point at the syscall insn for a restart then our
host_signal_handler() will see this and arrange the guest PC rewind.)

This commit contains the infrastructure for implementing safe_syscall
and the assembly language fragment for x86-64, but does not change any
syscalls to use it.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-14-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM:
 * Avoid having an architecture if-ladder in configure by putting
   linux-user/host/$(ARCH) on the include path and including
   safe-syscall.inc.S from it
 * Avoid ifdef ladder in signal.c by creating new hostdep.h to hold
   host-architecture-specific things
 * Added copyright/license header to safe-syscall.inc.S
 * Rewrote commit message
 * Added comments to safe-syscall.inc.S
 * Changed calling convention of safe_syscall() to match syscall()
   (returns -1 and host error in errno on failure)
 * Added a long comment in qemu.h about how to use safe_syscall()
   to implement guest syscalls.
]
RV: squashed Peters "fixup! linux-user: compile on non-x86-64 hosts"
patch
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-27 14:49:51 +03:00
Timothy E Baldwin
71a8f7fece linux-user: Add debug code to exercise restarting system calls
If DEBUG_ERESTARTSYS is set restart all system calls once. This
is pure debug code for exercising the syscall restart code paths
in the per-architecture cpu main loops.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-10-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Add comment and a commented-out #define next to the commented-out
 generic DEBUG #define; remove the check on TARGET_USE_ERESTARTSYS;
 tweak comment message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:51 +03:00
Timothy E Baldwin
4134ecfeb9 linux-user: Support for restarting system calls for Microblaze targets
Update the Microblaze main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Note that this in passing fixes a bug where we were corrupting
the guest r[3] on sigreturn with the guest's r[10] because
do_sigreturn() was returning env->regs[10] but the register for
syscall return values is env->regs[3].

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-11-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define;
 drop whitespace changes]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:51 +03:00
Peter Maydell
d7749ab770 linux-user: Set r14 on exit from microblaze syscall
All syscall exits on microblaze result in r14 being equal to the
PC we return to, because the kernel syscall exit instruction "rtbd"
does this. (This is true even for sigreturn(); note that r14 is
not a userspace-usable register as the kernel may clobber it at
any point.)

Emulate the setting of r14 on exit; this isn't really a guest
visible change for valid guest code because r14 isn't reliably
observable anyway. However having the code and the comment helps
to explain why it's ok for the ERESTARTSYS handling not to undo
the changes to r14 that happen on syscall entry.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Peter Maydell
a9175169cc linux-user: Support for restarting system calls for tilegx targets
Update the tilegx main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * return -TARGET_QEMU_ESIGRETURN from sigreturn rather than current R_RE
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Note that this fixes a bug where a sigreturn which happened to have
an errno value in TILEGX_R_RE would incorrectly cause TILEGX_R_ERR
to get set.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
6205086558 linux-user: Support for restarting system calls for CRIS targets
Update the CRIS main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-34-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
47405ab642 linux-user: Support for restarting system calls for S390 targets
Update the S390 main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-33-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; remove stray double semicolon; drop
 TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
7ccb84a916 linux-user: Support for restarting system calls for M68K targets
Update the M68K main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-32-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
7fe7231a49 linux-user: Support for restarting system calls for OpenRISC targets
Update the OpenRISC main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

(We don't implement sigreturn on this target so there is no
code there to update.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-31-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
256cb6af7f linux-user: Support for restarting system calls for UniCore32 targets
Update the UniCore32 main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

(We don't support signals on this target so there is no sigreturn code
to update.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-30-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
338c858c94 linux-user: Support for restarting system calls for Alpha targets
Update the Alpha main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-13-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define;
 PC is env->pc, not env->ir[IR_PV]]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:50 +03:00
Timothy E Baldwin
ba41249678 linux-user: Support for restarting system calls for SH4 targets
Update the SH4 main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-12-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
c0bea68f9e linux-user: Support for restarting system calls for SPARC targets
Update the SPARC main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-9-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
6db9d00e2f linux-user: Support for restarting system calls for PPC targets
Update the PPC main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn

(We already handle TARGET_QEMU_ESIGRETURN.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-8-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
2eb3ae27ec linux-user: Support for restarting system calls for MIPS targets
Update the MIPS main loop code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn

(We already handle TARGET_QEMU_ESIGRETURN.)

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-7-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
f0267ef711 linux-user: Support for restarting system calls for ARM targets
Update the 32-bit and 64-bit ARM main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code on sigreturn
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch any guest CPU state

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-6-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweak commit message; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
0284b03ba3 linux-user: Support for restarting system calls for x86 targets
Update the x86 main loop and sigreturn code:
 * on TARGET_ERESTARTSYS, wind guest PC backwards to repeat syscall insn
 * set all guest CPU state within signal.c code rather than passing it
   back out as the "return code" from do_sigreturn()
 * handle TARGET_QEMU_ESIGRETURN in the main loop as the indication
   that the main loop should not touch EAX

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-5-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Commit message tweaks; drop TARGET_USE_ERESTARTSYS define]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
499b5d176a linux-user: Renumber TARGET_QEMU_ESIGRETURN, make it not arch-specific
Currently we define a QEMU-internal errno TARGET_QEMU_ESIGRETURN
only on the MIPS and PPC targets; move this to errno_defs.h
so it is available for all architectures, and renumber it to 513.
We pick 513 because this is safe from future use as a system call return
value: Linux uses it as ERESTART_NOINTR internally and never allows that
errno to escape to userspace.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-4-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: TARGET_ERESTARTSYS split out into preceding patch, add comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
14896d3281 linux-user: Define TARGET_ERESTART* errno values
Define TARGET_ERESTARTSYS; like the kernel, we will use this to
indicate that a guest system call should be restarted. We use
the same value the kernel does for this, 512.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
[PMM: split out from the patch which moves and renumbers
 TARGET_QEMU_ESIGRETURN, add comment on usage]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:49 +03:00
Timothy E Baldwin
da7c8647e5 linux-user: Reindent signal handling
Some of the signal handling was a mess with a mixture of tabs and 8 space
indents.

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-3-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: just rebased]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-05-27 14:49:48 +03:00
Peter Maydell
a3ca7bb259 linux-user: Consistently return host errnos from do_openat()
The function do_openat() is not consistent about whether it is
returning a host errno or a guest errno in case of failure.
Standardise on returning -1 with errno set (ie caller has
to call get_errno()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reported-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
2016-05-27 14:49:48 +03:00
Timothy E Baldwin
2466119c95 linux-user: Check array bounds in errno conversion
Check array bounds in host_to_target_errno() and target_to_host_errno().

Signed-off-by: Timothy Edward Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Message-id: 1441497448-32489-2-git-send-email-T.E.Baldwin99@members.leeds.ac.uk
[PMM: Add a lower-bound check, use braces on if(), tweak commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
2016-05-27 14:49:48 +03:00
Peter Maydell
34c99d7b93 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160527' into staging
ppc patch queue for 2016-05-27 (first pull for qemu-2.7)

I'm back from holidays now, and have re-collated the ppc patch queue.
This is a first pull request against the qemu-2.7 branch, mostly
consisting of patches which were posted before the 2.6 freeze, but
weren't suitable for late inclusion in the 2.6 branch.

 * Assorted bugfixes and cleanups
 * Some preliminary patches towards dynamic DMA windows and CPU hotplug
 * Significant performance impovement for the spapr-llan device
 * Added myself to MAINTAINERS for ppc (overdue)

# gpg: Signature made Fri 27 May 2016 04:04:15 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160527:
  MAINTAINERS: Add David Gibson as ppc maintainer
  spapr_iommu: Move table allocation to helpers
  spapr_iommu: Finish renaming vfio_accel to need_vfio
  spapr_pci: Use correct DMA LIOBN when composing the device tree
  spapr: ensure device trees are always associated with DRC
  PPC/KVM: early validation of vcpu id
  Added negative check for get_image_size()
  hw/net/spapr_llan: Provide counter with dropped rx frames to the guest
  hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers
  target-ppc: Cleanups to rldinm, rldnm, rldimi
  target-ppc: Use 32-bit rotate instead of deposit + 64-bit rotate
  target-ppc: Use movcond in isel
  target-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-27 10:11:11 +01:00
David Gibson
b4daafbd13 MAINTAINERS: Add David Gibson as ppc maintainer
I've been de facto co-maintainer of all ppc target related code for some
time.  Alex Graf isworking on other things and doesn't have a whole lot of
time for qemu ppc maintainership.  So, update the MAINTAINERS file to
reflect this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-05-27 12:59:41 +10:00
Alexey Kardashevskiy
fec5d3a1cd spapr_iommu: Move table allocation to helpers
At the moment presence of vfio-pci devices on a bus affect the way
the guest view table is allocated. If there is no vfio-pci on a PHB
and the host kernel supports KVM acceleration of H_PUT_TCE, a table
is allocated in KVM. However, if there is vfio-pci and we do yet not
KVM acceleration for these, the table has to be allocated by
the userspace. At the moment the table is allocated once at boot time
but next patches will reallocate it.

This moves kvmppc_create_spapr_tce/g_malloc0 and their counterparts
to helpers.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Alexey Kardashevskiy
f94819d601 spapr_iommu: Finish renaming vfio_accel to need_vfio
6a81dd17 "spapr_iommu: Rename vfio_accel parameter" renamed vfio_accel
flag everywhere but one spot was missed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Alexey Kardashevskiy
eded5bac3b spapr_pci: Use correct DMA LIOBN when composing the device tree
The user could have picked LIOBN via the CLI but the device tree
rendering code would still use the value derived from the PHB index
(which is the default fallback if LIOBN is not set in the CLI).

This replaces SPAPR_PCI_LIOBN() with the actual DMA LIOBN value.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Jianjun Duan
5dd5238c0b spapr: ensure device trees are always associated with DRC
There are possible racing situations involving hotplug events and
guest migration. For cases where a hotplug event is migrated, or
the guest is in the process of fetching device tree at the time of
migration, we need to ensure the device tree is created and
associated with the corresponding DRC for devices that were
hotplugged on the source, but 'coldplugged' on the target.

Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Greg Kurz
41264b385c PPC/KVM: early validation of vcpu id
The KVM API restricts vcpu ids to be < KVM_CAP_MAX_VCPUS. On PowerPC
targets, depending on the number of threads per core in the host and
in the guest, some topologies do generate higher vcpu ids actually.
When this happens, QEMU bails out with the following error:

kvm_init_vcpu failed: Invalid argument

The KVM_CREATE_VCPU ioctl has several EINVAL return paths, so it is
not possible to fully disambiguate.

This patch adds a check in the code that computes vcpu ids, so that
we can detect the error earlier, and print a friendlier message instead
of calling KVM_CREATE_VCPU with an obviously bogus vcpu id.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Zhou Jie
8afc22a20f Added negative check for get_image_size()
This patch adds check for negative return value from get_image_size(),
where it is missing. It avoids unnecessary two function calls.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Thomas Huth
5c29dd8c28 hw/net/spapr_llan: Provide counter with dropped rx frames to the guest
The last 8 bytes of the receive buffer list page (that has been supplied
by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter
for frames that have been dropped because there was no suitable receive
buffer available. This patch introduces code to use this field to
provide the information about dropped rx packets to the guest.
There it can be queried with "ethtool -S eth0 | grep rx_no_buffer".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:23 +10:00
Thomas Huth
8836630f5d hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffers
Currently, the spapr-vlan device is trying to flush the RX queue
after each RX buffer that has been added by the guest via the
H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool
was empty before, we only pass single packets to the guest this
way. This can cause very bad performance if a sender is trying
to stream fragmented UDP packets to the guest. For example when
using the UDP_STREAM test from netperf with UDP packets that are
much bigger than the MTU size, almost all UDP packets are dropped
in the guest since the chances are quite high that at least one of
the fragments got lost on the way.

When flushing the receive queue, it's much better if we'd have
a bunch of receive buffers available already, so that fragmented
packets can be passed to the guest in one go. To do this, the
spapr_vlan_receive() function should return 0 instead of -1 if there
are no more receive buffers available, so that receive_disabled = 1
gets temporarily set for the receive queue, and we have to delay
the queue flushing at the end of h_add_logical_lan_buffer() a little
bit by using a timer, so that the guest gets a chance to add multiple
RX buffers before we flush the queue again.

This improves the UDP_STREAM test with the spapr-vlan device a lot:
Running
 netserver -p 44444 -L <guestip> -f -D -4
in the guest, and
 netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384
in the host, I get the following values _without_ this patch:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1738970      0    3798.83
229376           60.00          23              0.05

That "0.05" means that almost all UDP packets got lost/discarded
at the receiving side.
With this patch applied, the value look much better:

Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

229376   16384   60.00     1789104      0    3908.35
229376           60.00       22818             49.85

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
Richard Henderson
a7b2c8b90a target-ppc: Cleanups to rldinm, rldnm, rldimi
Mirror the cleanups just done to rlwinm, rlwnm and rlwimi.
This adds use of deposit to rldimi.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
Richard Henderson
63ae0915f8 target-ppc: Use 32-bit rotate instead of deposit + 64-bit rotate
A 32-bit rotate insn is more common on hosts than a deposit insn,
and if the host has neither the result is truely horrific.

At the same time, tidy up the temporaries within these functions,
drop the over-use of "likely", drop some checks for identity that
will also be checked by tcg-op.c functions, and special case mask
without rotate within rlwinm.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
Richard Henderson
24f9cd951d target-ppc: Use movcond in isel
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
David Gibson
319de6fe6e target-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()
ppc_hash64_set_external_hpt() was added in e5c0d3c "target-ppc: Add helpers
for updating a CPU's SDR1 and external HPT".  This helper contains a
cpu_synchronize_state() since it may need to push state back to KVM
afterwards.

This turns out to break things when it is used in the reset path, which is
the only current user.  It appears that kvm_vcpu_dirty is not being set
early in the reset path, so the cpu_synchronize_state() is clobbering state
set up by the early part of the cpu reset path with stale state from KVM.

This may require some changes to the generic cpu reset path to fix
properly, but as a short term fix we can just remove the
cpu_synchronize_state() from ppc_hash64_set_external_hpt(), and require any
non-reset path callers to do that manually.

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27 09:40:22 +10:00
Peter Maydell
84cfc756d1 Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160526.1' into staging
VFIO updates 2016-05-26

 - Infrastructure and quirks to support IGD assignment (Alex Williamson)
 - Fixes to 128bit handling, IOMMU replay, IOMMU translation sanity
   checking (Alexey Kardashevskiy)

# gpg: Signature made Thu 26 May 2016 18:50:29 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20160526.1:
  vfio: Check that IOMMU MR translates to system address space
  memory: Fix IOMMU replay base address
  vfio: Fix 128 bit handling when deleting region
  vfio/pci: Add IGD documentation
  vfio/pci: Add a separate option for IGD OpRegion support
  vfio/pci: Intel graphics legacy mode assignment
  vfio/pci: Setup BAR quirks after capabilities probing
  vfio/pci: Consolidate VGA setup
  vfio/pci: Fix return of vfio_populate_vga()
  vfio: Create device specific region info helper
  vfio: Enable sparse mmap capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 19:18:08 +01:00
Alexey Kardashevskiy
f1f9365019 vfio: Check that IOMMU MR translates to system address space
At the moment IOMMU MR only translate to the system memory.
However if some new code changes this, we will need clear indication why
it is not working so here is the check.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-26 11:12:09 -06:00
Alexey Kardashevskiy
d78c19b5cf memory: Fix IOMMU replay base address
Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
when new VFIO listener is added, all existing IOMMU mappings are
replayed. However there is a problem that the base address of
an IOMMU memory region (IOMMU MR) is ignored which is not a problem
for the existing user (which is pseries) with its default 32bit DMA
window starting at 0 but it is if there is another DMA window.

This stores the IOMMU's offset_within_address_space and adjusts
the IOVA before calling vfio_dma_map/vfio_dma_unmap.

As the IOMMU notifier expects IOVA offset rather than the absolute
address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before
calling notifier(s).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-26 11:12:08 -06:00
Alexey Kardashevskiy
7a057b4fb9 vfio: Fix 128 bit handling when deleting region
7532d3cbf "vfio: Fix 128 bit handling" added support for 64bit IOMMU
memory regions when those are added to VFIO address space; however
removing code cannot cope with these as int128_get64() will fail on
1<<64.

This copies 128bit handling from region_add() to region_del().

Since the only machine type which is actually going to use 64bit IOMMU
is pseries and it never really removes them (instead it will dynamically
add/remove subregions), this should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-26 11:12:07 -06:00
Alex Williamson
0eb7342417 vfio/pci: Add IGD documentation
Document the usage modes, host primary graphics considerations, usage,
and fw_cfg ABI required for IGD assignment with vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:12:05 -06:00
Alex Williamson
6ced0bba70 vfio/pci: Add a separate option for IGD OpRegion support
The IGD OpRegion is enabled automatically when running in legacy mode,
but it can sometimes be useful in universal passthrough mode as well.
Without an OpRegion, output spigots don't work, and even though Intel
doesn't officially support physical outputs in UPT mode, it's a
useful feature.  Note that if an OpRegion is enabled but a monitor is
not connected, some graphics features will be disabled in the guest
versus a headless system without an OpRegion, where they would work.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:12:03 -06:00
Alex Williamson
c4c45e943e vfio/pci: Intel graphics legacy mode assignment
Enable quirks to support SandyBridge and newer IGD devices as primary
VM graphics.  This requires new vfio-pci device specific regions added
in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config
space access to the PCI host bridge and LPC/ISA bridge.  VM firmware
support, SeaBIOS only so far, is also required for reserving memory
regions for IGD specific use.  In order to enable this mode, IGD must
be assigned to the VM at PCI bus address 00:02.0, it must have a ROM,
it must be able to enable VGA, it must have or be able to create on
its own an LPC/ISA bridge of the proper type at PCI bus address
00:1f.0 (sorry, not compatible with Q35 yet), and it must have the
above noted vfio-pci kernel features and BIOS.  The intention is that
to enable this mode, a user simply needs to assign 00:02.0 from the
host to 00:02.0 in the VM:

  -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0

and everything either happens automatically or it doesn't.  In the
case that it doesn't, we leave error reports, but assume the device
will operate in universal passthrough mode (UPT), which doesn't
require any of this, but has a much more narrow window of supported
devices, supported use cases, and supported guest drivers.

When using IGD in this mode, the VM firmware is required to reserve
some VM RAM for the OpRegion (on the order or several 4k pages) and
stolen memory for the GTT (up to 8MB for the latest GPUs).  An
additional option, x-igd-gms allows the user to specify some amount
of additional memory (value is number of 32MB chunks up to 512MB) that
is pre-allocated for graphics use.  TBH, I don't know of anything that
requires this or makes use of this memory, which is why we don't
allocate any by default, but the specification suggests this is not
actually a valid combination, so the option exists as a workaround.
Please report if it's actually necessary in some environment.

See code comments for further discussion about the actual operation
of the quirks necessary to assign these devices.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:12:01 -06:00
Alex Williamson
581406e0e3 vfio/pci: Setup BAR quirks after capabilities probing
Capability probing modifies wmask, which quirks may be interested in
changing themselves.  Apply our BAR quirks after the capability scan
to make this possible.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:12:00 -06:00
Alex Williamson
182bca4592 vfio/pci: Consolidate VGA setup
Combine VGA discovery and registration.  Quirks can have dependencies
on BARs, so the quirks push out until after we've scanned the BARs.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:11:58 -06:00
Alex Williamson
4225f2b670 vfio/pci: Fix return of vfio_populate_vga()
This function returns success if either we setup the VGA region or
the host vfio doesn't return enough regions to support the VGA index.
This latter case doesn't make any sense.  If we're asked to populate
VGA, fail if it doesn't exist and let the caller decide if that's
important.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:11:56 -06:00
Alex Williamson
e61a424f05 vfio: Create device specific region info helper
Given a device specific region type and sub-type, find it.  Also
cleanup return point on error in vfio_get_region_info() so that we
always return 0 with a valid pointer or -errno and NULL.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 11:04:50 -06:00
Alex Williamson
b53b0f696b vfio: Enable sparse mmap capability
The sparse mmap capability in a vfio region info allows vfio to tell
us which sub-areas of a region may be mmap'd.  Thus rather than
assuming a single mmap covers the entire region and later frobbing it
ourselves for things like the PCI MSI-X vector table, we can read that
directly from vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-26 09:43:20 -06:00
Peter Maydell
aef11b8d33 Merge remote-tracking branch 'remotes/amit-migration/tags/migration-2.7-2' into staging
migration: add TLS support to the migration data channel

This is a big refactoring of the migration backend code - moving away from
QEMUFile to the new QIOChannel framework introduced here.  This brings a
good level of abstraction and reduction of many lines of code.

This series also adds the ability for many backends (all except RDMA) to
use TLS for encrypting the migration data between the endpoints.

# gpg: Signature made Thu 26 May 2016 07:07:08 BST using RSA key ID 657EF670
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-migration/tags/migration-2.7-2: (28 commits)
  migration: remove qemu_get_fd method from QEMUFile
  migration: remove support for non-iovec based write handlers
  migration: add support for encrypting data with TLS
  migration: define 'tls-creds' and 'tls-hostname' migration parameters
  migration: don't use an array for storing migrate parameters
  migration: move definition of struct QEMUFile back into qemu-file.c
  migration: delete QEMUFile stdio implementation
  migration: delete QEMUFile sockets implementation
  migration: delete QEMUSizedBuffer struct
  migration: delete QEMUFile buffer implementation
  migration: convert savevm to use QIOChannel for writing to files
  migration: convert RDMA to use QIOChannel interface
  migration: convert exec socket protocol to use QIOChannel
  migration: convert fd socket protocol to use QIOChannel
  migration: convert tcp socket protocol to use QIOChannel
  migration: rename unix.c to socket.c
  migration: convert unix socket protocol to use QIOChannel
  migration: convert post-copy to use QIOChannelBuffer
  migration: add reporting of errors for outgoing migration
  migration: add helpers for creating QEMUFile from a QIOChannel
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 16:09:27 +01:00
Peter Maydell
2c56d06baf Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Wed 25 May 2016 18:32:40 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (31 commits)
  blockjob: Remove BlockJob.bs
  commit: Use BlockBackend for I/O
  backup: Use BlockBackend for I/O
  backup: Remove bs parameter from backup_do_cow()
  backup: Pack Notifier within BackupBlockJob
  backup: Don't leak BackupBlockJob in error path
  mirror: Use BlockBackend for I/O
  mirror: Allow target that already has a BlockBackend
  stream: Use BlockBackend for I/O
  block: Make blk_co_preadv/pwritev() public
  block: Convert block job core to BlockBackend
  block: Default to enabled write cache in blk_new()
  block: Cancel jobs first in bdrv_close_all()
  block: keep a list of block jobs
  block: Rename blk_write_zeroes()
  dma-helpers: change BlockBackend to opaque value in DMAIOFunc
  dma-helpers: change interface to byte-based
  block: Propagate .drained_begin/end callbacks
  block: Fix reconfiguring graph with drained nodes
  block: Make bdrv_drain() use bdrv_drained_begin/end()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 14:29:30 +01:00
Andreas Färber
a62c89117f qdev: Start disentangling bus from device
Move bus type and related APIs to a separate file bus.c.
This is a first step in breaking up qdev.c into more manageable chunks.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[AF: Rebased onto osdep.h]
Signed-off-by: Andreas Färber <afaerber@suse.de>
[PMM: added bus.o to link line for test-qdev-global-props]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 14:06:41 +01:00
Sergey Fedorov
c88c67e58b cpu-exec: Fix direct jump to TB spanning page
It is not safe to make a direct jump to a TB spanning two pages in
system emulation because the mapping for the second page can get changed
but we don't take care of direct jumps in this case.

However in user mode emulation, this is not the case because there's
only static address translation and TBs are always invalidated properly.

Fixes: 5b053a4a28 ("tcg: Clean up direct block chaining safety checks")

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Message-id: 1463404380-29302-1-git-send-email-sergey.fedorov@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 13:14:29 +01:00
Peter Maydell
0533d3de60 Merge remote-tracking branch 'remotes/afaerber/tags/maintainers-for-peter' into staging
Andreas stepping down from most maintainer positions

# gpg: Signature made Wed 25 May 2016 16:53:45 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/maintainers-for-peter:
  MAINTAINERS: Drop Andreas as CPU maintainer
  MAINTAINERS: Drop Andreas as 0.15 maintainer
  MAINTAINERS: Drop Andreas as PReP maintainer
  MAINTAINERS: Drop Andreas as Cocoa maintainer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-26 12:41:12 +01:00
Daniel P. Berrange
12992c16d9 migration: remove qemu_get_fd method from QEMUFile
Now that there is a set_blocking callback in QEMUFileOps,
and all users needing non-blocking support have been
converted to QIOChannel, there is no longer any codepath
requiring the qemu_get_fd() method for QEMUFile. Remove it
to avoid further code being introduced with an expectation
of direct file handle access.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-29-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:21 +05:30
Daniel P. Berrange
11808bb0c4 migration: remove support for non-iovec based write handlers
All the remaining QEMUFile implementations provide an iovec
based write handler, so the put_buffer callback can be removed
to simplify the code.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-28-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:18 +05:30
Daniel P. Berrange
e122636562 migration: add support for encrypting data with TLS
This extends the migration_set_incoming_channel and
migration_set_outgoing_channel methods so that they
will automatically wrap the QIOChannel in a
QIOChannelTLS instance if TLS credentials are configured
in the migration parameters.

This allows TLS to work for tcp, unix, fd and exec
migration protocols. It does not (currently) work for
RDMA since it does not use these APIs, but it is
unlikely that TLS would be desired with RDMA anyway
since it would degrade the performance to that seen
with TCP defeating the purpose of using RDMA.

On the target host, QEMU would be launched with a set
of TLS credentials for a server endpoint

 $ qemu-system-x86_64 -monitor stdio -incoming defer \
    -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=server,id=tls0 \
    ...other args...

To enable incoming TLS migration 2 monitor commands are
then used

  (qemu) migrate_set_str_parameter tls-creds tls0
  (qemu) migrate_incoming tcp:myhostname:9000

On the source host, QEMU is launched in a similar
manner but using client endpoint credentials

 $ qemu-system-x86_64 -monitor stdio \
    -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=client,id=tls0 \
    ...other args...

To enable outgoing TLS migration 2 monitor commands are
then used

  (qemu) migrate_set_str_parameter tls-creds tls0
  (qemu) migrate tcp:otherhostname:9000

Thanks to earlier improvements to error reporting,
TLS errors can be seen 'info migrate' when doing a
detached migration. For example:

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: TLS handshake failed: The TLS connection was non-properly terminated.

Or

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: Certificate does not match the hostname localhost

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-27-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:13 +05:30
Daniel P. Berrange
69ef1f36b0 migration: define 'tls-creds' and 'tls-hostname' migration parameters
Define two new migration parameters to be used with TLS encryption.
The 'tls-creds' parameter provides the ID of an instance of the
'tls-creds' object type, or rather a subclass such as 'tls-creds-x509'.
Providing these credentials will enable use of TLS on the migration
data stream.

If using x509 certificates, together with a migration URI that does
not include a hostname, the 'tls-hostname' parameter provides the
hostname to use when verifying the server's x509 certificate. This
allows TLS to be used in combination with fd: and exec: protocols
where a TCP connection is established by a 3rd party outside of
QEMU.

NB, this requires changing the migrate_set_parameter method in the
HMP to accept a 's' (string) value instead of 'i' (integer). This
is backwards compatible, because the parsing of strings allows the
quotes to be optional, thus any integer is also a valid string.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-26-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:10 +05:30
Daniel P. Berrange
2594f56d4c migration: don't use an array for storing migrate parameters
The MigrateState struct uses an array for storing migration
parameters. This presumes that all future parameters will
be integers too, which is not going to be the case. There
is no functional reason why an array is used, if anything
it makes the code less clear. The QAPI schema already
defines a struct - MigrationParameters - capable of storing
all the individual parameters, so just use that instead of
an array.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-25-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:07 +05:30
Daniel P. Berrange
a24939f279 migration: move definition of struct QEMUFile back into qemu-file.c
Now that the memory buffer based QEMUFile impl is gone, there
is no need for any backend to be accessing internals of the
QEMUFile struct, so it can be moved back into qemu-file.c

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-24-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:05 +05:30
Daniel P. Berrange
7fdc61c75d migration: delete QEMUFile stdio implementation
Now that the exec migration backend and savevm have converted
to use the QIOChannel based QEMUFile, there is no user remaining
for the stdio based QEMUFile impl and it can be deleted.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-23-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:03 +05:30
Daniel P. Berrange
40946ae40b migration: delete QEMUFile sockets implementation
Now that the tcp, unix and fd migration backends have converted
to use the QIOChannel based QEMUFile, there is no user remaining
for the sockets based QEMUFile impl and it can be deleted.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-22-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:32:00 +05:30
Daniel P. Berrange
2a22b4f370 migration: delete QEMUSizedBuffer struct
Now that we don't have have a buffer based QemuFile
implementation, the QEMUSizedBuffer code is also
unused and can be deleted. A simpler buffer class
also exists in util/buffer.c which other code can
used as needed.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-21-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:58 +05:30
Daniel P. Berrange
8b7c5c0f52 migration: delete QEMUFile buffer implementation
The qemu_bufopen() method is no longer used, so the memory
buffer based QEMUFile backend can be deleted entirely.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-20-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:55 +05:30
Daniel P. Berrange
8925839f00 migration: convert savevm to use QIOChannel for writing to files
Convert the exec savevm code to use QIOChannel and QEMUFileChannel,
instead of the stdio APIs.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-19-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:53 +05:30
Daniel P. Berrange
6ddd2d76ca migration: convert RDMA to use QIOChannel interface
This converts the RDMA code to provide a subclass of QIOChannel
that uses RDMA for the data transport.

This implementation of RDMA does not correctly handle non-blocking
mode. Reads might block if there was not already some pending data
and writes will block until all data is sent. This flawed behaviour
was already present in the existing impl, so appears to not be a
critical problem at this time. It should be on the list of things
to fix in the future though.

The RDMA code would be much better off it it could be split up in
a generic RDMA layer, a QIOChannel impl based on RMDA, and then
the RMDA migration glue. This is left as a future exercise for
the brave.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-18-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:50 +05:30
Daniel P. Berrange
527792fae6 migration: convert exec socket protocol to use QIOChannel
Convert the exec socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of the stdio
popen APIs. It can be unconditionally built because the
QIOChannelCommand class can report suitable error messages
on platforms which can't fork processes.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-17-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:47 +05:30
Daniel P. Berrange
64802ee57f migration: convert fd socket protocol to use QIOChannel
Convert the fd socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of plain sockets
APIs. It can be unconditionally built because the
QIOChannel APIs it uses will take care to report suitable
error messages if needed.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-16-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:45 +05:30
Daniel P. Berrange
e65c67e4da migration: convert tcp socket protocol to use QIOChannel
Drop the current TCP socket migration driver and extend
the new generic socket driver to cope with the TCP address
format

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-15-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:42 +05:30
Daniel P. Berrange
6f860ae755 migration: rename unix.c to socket.c
The unix.c file will be nearly the same as the tcp.c file,
only differing in the initial SocketAddress creation code.
Rename unix.c to socket.c and refactor it a little to
prepare for merging the TCP code.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-14-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:40 +05:30
Daniel P. Berrange
d984464eb9 migration: convert unix socket protocol to use QIOChannel
Convert the unix socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of plain sockets
APIs. It can be unconditionally built, since the socket
impl of QIOChannel will report a suitable error on platforms
where UNIX sockets are unavailable.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-13-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:37 +05:30
Daniel P. Berrange
61b67d473d migration: convert post-copy to use QIOChannelBuffer
The post-copy code does some I/O to/from an intermediate
in-memory buffer rather than direct to the underlying
I/O channel. Switch this code to use QIOChannelBuffer
instead of QEMUSizedBuffer.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-12-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:34 +05:30
Daniel P. Berrange
d59ce6f344 migration: add reporting of errors for outgoing migration
Currently if an application initiates an outgoing migration,
it may or may not, get an error reported back on failure. If
the error occurs synchronously to the 'migrate' command
execution, the client app will see the error message. This
is the case for DNS lookup failures. If the error occurs
asynchronously to the monitor command though, the error
will be thrown away and the client left guessing about
what went wrong. This is the case for failure to connect
to the TCP server (eg due to wrong port, or firewall
rules, or other similar errors).

In the future we'll be adding more scope for errors to
happen asynchronously with the TLS protocol handshake.
TLS errors are hard to diagnose even when they are well
reported, so discarding errors entirely will make it
impossible to debug TLS connection problems.

Management apps which do migration are already using
'query-migrate' / 'info migrate' to check up on progress
of background migration operations and to see their end
status. This is a fine place to also include the error
message when things go wrong.

This patch thus adds an 'error-desc' field to the
MigrationInfo struct, which will be populated when
the 'status' is set to 'failed':

(qemu) migrate -d tcp:localhost:9001
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: failed (Error connecting to socket: Connection refused)
total time: 0 milliseconds

In the HMP, when doing non-detached migration, it is
also possible to display this error message directly
to the app.

(qemu) migrate tcp:localhost:9001
Error connecting to socket: Connection refused

Or with QMP

  {
    "execute": "query-migrate",
    "arguments": {}
  }
  {
    "return": {
      "status": "failed",
      "error-desc": "address resolution failed for myhost:9000: No address associated with hostname"
    }
  }

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-11-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:30 +05:30
Daniel P. Berrange
48f07489ed migration: add helpers for creating QEMUFile from a QIOChannel
Currently creating a QEMUFile instance from a QIOChannel is
quite simple only requiring a single call to
qemu_fopen_channel_input or  qemu_fopen_channel_output
depending on the end of migration connection.

When QEMU gains TLS support, however, there will need to be
a TLS negotiation done inbetween creation of the QIOChannel
and creation of the final QEMUFile. Introduce some helper
methods that will encapsulate this logic, isolating the
migration protocol drivers from knowledge about TLS.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Acked-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-10-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:27 +05:30
Daniel P. Berrange
a9cfeb33bb migration: introduce a new QEMUFile impl based on QIOChannel
Introduce a new QEMUFile implementation that is based on
the QIOChannel objects. This impl is different from existing
impls in that there is no file descriptor that can be made
available, as some channels may be based on higher level
protocols such as TLS.

Although the QIOChannel based implementation can trivially
provide a bi-directional stream, initially we have separate
functions for opening input & output directions to fit with
the expectation of the current QEMUFile interface.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1461751518-12128-9-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:24 +05:30
Daniel P. Berrange
9e4d2b98ee migration: force QEMUFile to blocking mode for outgoing migration
Instead of relying on the default QEMUFile I/O blocking flag
state, explicitly turn on blocking I/O for outgoing migration
since it takes place in a background thread.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-8-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:21 +05:30
Daniel P. Berrange
06ad513532 migration: introduce set_blocking function in QEMUFileOps
Remove the assumption that every QEMUFile implementation has
a file descriptor available by introducing a new function
in QEMUFileOps to change the blocking state of a QEMUFile.

If not set, it will fallback to the original code using
the get_fd method.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-7-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:19 +05:30
Daniel P. Berrange
0436e09f96 migration: split migration hooks out of QEMUFileOps
The QEMUFileOps struct contains the I/O subsystem callbacks
and the migration stage hooks. Split the hooks out into a
separate QEMUFileHooks struct to make it easier to refactor
the I/O side of QEMUFile without affecting the hooks.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-6-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:16 +05:30
Daniel P. Berrange
baf51e7739 migration: ensure qemu_fflush() always writes full data amount
The QEMUFile writev_buffer / put_buffer functions are expected
to write out the full set of requested data, blocking until
complete. The qemu_fflush() caller does not expect to deal with
partial writes. Clarify the function comments and add a sanity
check to the code to catch mistaken implementations.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-5-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:14 +05:30
Daniel P. Berrange
a8ec4437cd migration: remove use of qemu_bufopen from vmstate tests
Some of the test-vmstate.c test cases use a temporary file
while others use a memory buffer. To facilitate the future
removal of the qemu_bufopen() function, convert all the tests
to use a temporary file.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-4-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:11 +05:30
Daniel P. Berrange
d656ec5ea8 io: avoid double-free when closing QIOChannelBuffer
The QIOChannelBuffer's close implementation will free
the internal data buffer. It failed to reset the pointer
to NULL though, so when the object is later finalized
it will free it a second time with predictable crash.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-3-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:09 +05:30
Daniel P. Berrange
1fd791f007 s390: use FILE instead of QEMUFile for creating text file
The s390 skeys monitor command needs to write out a plain text
file. Currently it is using the QEMUFile class for this, but
work is ongoing to refactor QEMUFile and eliminate much code
related to it. The only feature qemu_fopen() gives over fopen()
is support for QEMU FD passing, but this can be achieved with
qemu_open() + fdopen() too. Switching to regular stdio FILE
APIs avoids the need to sprintf via an intermedia buffer which
slightly simplifies the code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1461751518-12128-2-git-send-email-berrange@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-26 11:31:05 +05:30
Kevin Wolf
b75536c9fa blockjob: Remove BlockJob.bs
There is a single remaining user in qemu-img, and another one in a test
case, both of which can be trivially converted to using BlockJob.blk
instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
4653456a5f commit: Use BlockBackend for I/O
This changes the commit block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the commit code any more
afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
5c438bc68c backup: Use BlockBackend for I/O
This changes the backup block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the backup code any more
afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
8543c27414 backup: Remove bs parameter from backup_do_cow()
Now that we pass the job to the function, bs is implied by that.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
John Snow
12b3e52e48 backup: Pack Notifier within BackupBlockJob
Instead of relying on peeking at bs->job, we want to explicitly get
a reference to the job that was involved in this notifier callback.

Pack the Notifier inside of the BackupBlockJob so we can use
container_of to get a reference back to the BackupBlockJob object.

This cuts out one more case where we rely unnecessarily on bs->job.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
91ab688379 backup: Don't leak BackupBlockJob in error path
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
e253f4b897 mirror: Use BlockBackend for I/O
This changes the mirror block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the mirroring code any more
afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
b880481579 mirror: Allow target that already has a BlockBackend
We had to forbid mirroring to a target BDS that already had a BB
attached because the node swapping at job completion would add a second
BB and we didn't support multiple BBs on a single BDS at the time. Now
we do, so we can lift the restriction.

As we allow additional BlockBackends for the target, we must expect
other users to be sending requests. There may no requests be in flight
during the graph modification, so we have to drain those users now.

The core part of this patch is a revert of commit 40365552.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
03e35d820d stream: Use BlockBackend for I/O
This changes the streaming block job to use the job's BlockBackend for
performing the COR reads. job->bs isn't used by the streaming code any
more afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
1e98fefd95 block: Make blk_co_preadv/pwritev() public
Also add trace points now that the function can be directly called.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
b6d2e59995 block: Convert block job core to BlockBackend
This adds a new BlockBackend field to the BlockJob struct, which
coexists with the BlockDriverState while converting the individual jobs.

When creating a block job, a new BlockBackend is created on top of the
given BlockDriverState, and it is destroyed when the BlockJob ends. The
reference to the BDS is now held by the BlockBackend instead of calling
bdrv_ref/unref manually.

We have to be careful when we use bdrv_replace_in_backing_chain() in
block jobs because this changes the BDS that job->blk points to. At the
moment block jobs are too tightly coupled with their BDS, so that moving
a job to another BDS isn't easily possible; therefore, we need to just
manually undo this change afterwards.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
0c3169dffa block: Default to enabled write cache in blk_new()
The existing users of the function are:

1. blk_new_open(), which already enabled the write cache
2. Some test cases that don't care about the setting
3. blockdev_init() for empty drives, where the cache mode is overridden
   with the value from the options when a medium is inserted

Therefore, this patch doesn't change the current behaviour. It will be
convenient, however, for additional users of blk_new() (like block
jobs) if the most sensible WCE setting is the default.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-05-25 19:04:21 +02:00
Kevin Wolf
a1a2af0756 block: Cancel jobs first in bdrv_close_all()
So far, bdrv_close_all() first removed all root BlockDriverStates of
BlockBackends and monitor owned BDSes, and then assumed that the
remaining BDSes must be related to jobs and cancelled these jobs.

This order doesn't work that well any more when block jobs use
BlockBackends internally because then they will lose their BDS before
being cancelled.

This patch changes bdrv_close_all() to first cancel all jobs and then
remove all root BDSes from the remaining BBs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Alberto Garcia
a7112795c1 block: keep a list of block jobs
The current way to obtain the list of existing block jobs is to
iterate over all root nodes and check which ones own a job.

Since we want to be able to support block jobs in other nodes as well,
this patch keeps a list of jobs that is updated every time one is
created or destroyed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Eric Blake
d004bd52aa block: Rename blk_write_zeroes()
Commit 983a1600 changed the semantics of blk_write_zeroes() to
be byte-based rather than sector-based, but did not change the
name, which is an open invitation for other code to misuse the
function.  Renaming to pwrite_zeroes() makes it more in line
with other byte-based interfaces, and will help make it easier
to track which remaining write_zeroes interfaces still need
conversion.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-25 19:04:21 +02:00
Paolo Bonzini
8a8e63ebdd dma-helpers: change BlockBackend to opaque value in DMAIOFunc
Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
because the original callback and opaque are gone by the time DMAIOFunc
is called.  On the other hand, the BlockBackend is usually derived
from those extra data that you could pass to the DMAIOFunc (in the
next patch, that would be the SCSIRequest).

So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
and blk_aio_writev's.  The new prototype loses the BlockBackend
and gains an extra opaque value which, in the case of dma_blk_readv
and dma_blk_writev, is of course used for the BlockBackend.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:11 +02:00
Paolo Bonzini
cbe0ed6247 dma-helpers: change interface to byte-based
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:11 +02:00
Kevin Wolf
20018e12cf block: Propagate .drained_begin/end callbacks
When draining intermediate nodes (i.e. nodes that aren't the root node
for at least one of their parents; with node references, the user can
always configure the graph to create this situation), we need to
propagate the .drained_begin/end callbacks all the way up to the root
for the drain to be effective.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:11 +02:00
Kevin Wolf
36fe13317b block: Fix reconfiguring graph with drained nodes
When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().

bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf
6820643fdb block: Make bdrv_drain() use bdrv_drained_begin/end()
Until now, bdrv_drained_begin() used bdrv_drain() internally to drain
the queue. This is kind of backwards and caused quiescing code to be
duplicated because bdrv_drained_begin() had to ensure that no new
requests come in even after bdrv_drain() returns, whereas bdrv_drain()
had to have them because it could be called from other places.

Instead move the bdrv_drain() code to bdrv_drained_begin() and make
bdrv_drain() a simple wrapper around bdrv_drained_begin/end().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf
e9740bc6d4 block: Introduce bdrv_replace_child()
This adds a common function that is called when attaching a new child to
a parent, removing a child from a parent and when reconfiguring the
graph so that an existing child points to a different node now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
109525ad6a block: Drop errp parameter from blk_new()
blk_new() cannot fail so its Error ** parameter has become superfluous.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
6b574e09b3 block: Drop bdrv_parent_cb_...() from bdrv_close()
bdrv_close() now asserts that the BDS's refcount is 0, therefore it
cannot have any parents and the bdrv_parent_cb_change_media() call is a
no-op.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
30f55fb81f block: Assert !bs->refcnt in bdrv_close()
The only caller of bdrv_close() left is bdrv_delete(). We may as well
assert that, in a way (there are some things in bdrv_close() that make
more sense under that assumption, such as the call to
bdrv_release_all_dirty_bitmaps() which in turn assumes that no frozen
bitmaps are attached to the BDS).

In addition, being called only in bdrv_delete() means that we can drop
bdrv_close()'s forward declaration at the top of block.c.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
5b3639371c block: Make bdrv_open() return a BDS
There are no callers to bdrv_open() or bdrv_open_inherit() left that
pass a pointer to a non-NULL BDS pointer as the first argument of these
functions, so we can finally drop that parameter and just make them
return the new BDS.

Generally, the following pattern is applied:

    bs = NULL;
    ret = bdrv_open(&bs, ..., &local_err);
    if (ret < 0) {
        error_propagate(errp, local_err);
        ...
    }

by

    bs = bdrv_open(..., errp);
    if (!bs) {
        ret = -EINVAL;
        ...
    }

Of course, there are only a few instances where the pattern is really
pure.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
9bddf75979 block: Drop bdrv_new_root()
It is unused now, so we may just as well drop it.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
28eb9b12f7 block: Drop blk_new_with_bs()
Its only caller is blk_new_open(), so we can just inline it there.

The bdrv_new_root() call is dropped in the process because we can just
let bdrv_open() create the BDS.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
21a699afc8 tests: Drop BDS from test-throttle.c
Now that throttling has been moved to the BlockBackend level, we do not
need to create a BDS along with the BB in the I/O throttling test.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
668361898e block: Let bdrv_open_inherit() return the snapshot
If bdrv_open_inherit() creates a snapshot BDS and *pbs is NULL, that
snapshot BDS should be returned instead of the BDS under it.

This has worked so far because (nearly) all users of BDRV_O_SNAPSHOT use
blk_new_open() to create the BDS tree. bdrv_append() (which is called by
bdrv_append_temp_snapshot()) redirects pointers from parents (i.e. the
BB in this case) to the newly appended child (i.e. the overlay),
therefore, while bdrv_open_inherit() did not return the root BDS, the BB
still pointed to it.

The only instance where BDRV_O_SNAPSHOT is used but blk_new_open() is
not is in blockdev_init() if no BDS tree is created, and instead
blk_new() is used and the flags are stored in the BB root state.
However, qmp_blockdev_change_medium() filters the BDRV_O_SNAPSHOT flag
before invoking bdrv_open(), so it will not have any effect.

In any case, it would be nicer if bdrv_open_inherit() could just always
return the root of the BDS tree that has been created.

To this end, bdrv_append_temp_snapshot() now returns the snapshot BDS
instead of just appending it on top of the snapshotted BDS. Also, it
calls bdrv_ref() before bdrv_append() (which bdrv_open_inherit() has to
undo if not returning the overlay).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Max Reitz
506f8709ce block: Drop useless bdrv_new() call
bdrv_append_temp_snapshot() uses bdrv_new() to create an empty BDS
before invoking bdrv_open() on that BDS. This is probably a relict from
when it used to do some modifications on that empty BDS, but now that is
unnecessary, so we can just set bs_snapshot to NULL and let bdrv_open()
do the rest.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-25 19:04:10 +02:00
Kevin Wolf
88be7b4be4 block: Fix bdrv_next() memory leak
The bdrv_next() users all leaked the BdrvNextIterator after completing
the iteration. Simply changing bdrv_next() to free the iterator before
returning NULL at the end of list doesn't work because some callers exit
the loop before looking at all BDSes.

This patch moves the BdrvNextIterator from the heap to the stack of
the caller and switches to a bdrv_first()/bdrv_next() interface for
initialising the iterator.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-25 19:04:10 +02:00
Andreas Färber
12b0e69cd7 MAINTAINERS: Drop Andreas as CPU maintainer
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-05-25 17:44:15 +02:00
Andreas Färber
211b76d1db MAINTAINERS: Drop Andreas as 0.15 maintainer
Downgrade to orphan status, like all other remaining stable entries.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-05-25 17:44:15 +02:00
Andreas Färber
9f38774da2 MAINTAINERS: Drop Andreas as PReP maintainer
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
2016-05-25 17:44:15 +02:00
Andreas Färber
aa373a1ec8 MAINTAINERS: Drop Andreas as Cocoa maintainer
Peter has taken over Cocoa maintainership.

Signed-off-by: Andreas Färber <andreas.faerber@web.de>
2016-05-25 17:14:56 +02:00
Prasad J Pandit
3af9187fc6 net: mipsnet: check packet length against buffer
When receiving packets over MIPSnet network device, it uses
receive buffer of size 1514 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.

Reported by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-05-25 15:46:07 +08:00
Zhou Jie
11196e95f0 net/tap: Allocating Large sized arrays to heap
net_init_tap has a huge stack usage of 8192 bytes approx.
Moving large arrays to heap to reduce stack usage.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-05-25 15:46:07 +08:00
Peter Maydell
287db79df8 Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 queue, 2016-05-23

# gpg: Signature made Mon 23 May 2016 23:48:27 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  target-i386: kvm: Eliminate kvm_msr_entry_set()
  target-i386: kvm: Simplify MSR setting functions
  target-i386: kvm: Simplify MSR array construction
  target-i386: kvm: Increase MSR_BUF_SIZE
  target-i386: kvm: Allocate kvm_msrs struct once per VCPU
  target-i386: Call cpu_exec_init() on realize
  target-i386: Move TCG initialization to realize time
  target-i386: Move TCG initialization check to tcg_x86_init()
  cpu: Eliminate cpudef_init(), cpudef_setup()
  target-i386: Set constant model_id for qemu64/qemu32/athlon
  pc: Set CPU model-id on compat_props for pc <= 2.4
  osdep: Move default qemu_hw_version() value to a macro
  target-i386: kvm: Use X86XSaveArea struct for xsave save/load
  target-i386: Use xsave structs for ext_save_area
  target-i386: Define structs for layout of xsave area

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-24 13:06:33 +01:00
Peter Maydell
99694362ee Merge remote-tracking branch 'remotes/amit-migration/tags/migration-2.7-1' into staging
migration fixes:

- ensure src block devices continue fine after a failed migration
- fail on migration blockers; helps 9p savevm/loadvm
- move autoconverge commands out of experimental state
- move the migration-specific qjson in migration/

# gpg: Signature made Mon 23 May 2016 18:15:09 BST using RSA key ID 657EF670
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-migration/tags/migration-2.7-1:
  migration: regain control of images when migration fails to complete
  savevm: fail if migration blockers are present
  migration: Promote improved autoconverge commands out of experimental state
  migration/qjson: Drop gratuitous use of QOM
  migration: Move qjson.[ch] to migration/

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-24 12:21:07 +01:00
Peter Maydell
b0f6ef8915 Merge remote-tracking branch 'remotes/amit-virtio-rng/tags/rng-2.7-1' into staging
rng: rename RndRandom to RndRandom

# gpg: Signature made Mon 23 May 2016 16:44:58 BST using RSA key ID 657EF670
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-virtio-rng/tags/rng-2.7-1:
  rng-random: rename RndRandom to RngRandom

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-24 11:38:22 +01:00
Peter Maydell
4c63a818de Merge remote-tracking branch 'remotes/xtensa/tags/20160523-opencores_eth' into staging
opencores_eth cleanups:
- use mii.h
- reduce stack usage in open_eth_start_xmit.

# gpg: Signature made Mon 23 May 2016 20:14:20 BST using RSA key ID F83FA044
# gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"

* remotes/xtensa/tags/20160523-opencores_eth:
  hw/net/opencores_eth: Allocating Large sized arrays to heap
  hw/net/opencores_eth: use mii.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-24 10:19:45 +01:00
Eduardo Habkost
1abc2cae46 target-i386: kvm: Eliminate kvm_msr_entry_set()
Inline the function inside kvm_msr_entry_add().

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
e25ffda7bd target-i386: kvm: Simplify MSR setting functions
Simplify kvm_put_tscdeadline_msr() and
kvm_put_msr_feature_control() using kvm_msr_buf and the
kvm_msr_entry_add() helper.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
9c600a8454 target-i386: kvm: Simplify MSR array construction
Add a helper function that appends new entries to the MSR buffer
and checks for the buffer size limit.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
d1138251bf target-i386: kvm: Increase MSR_BUF_SIZE
We are dangerously close to the array limits in kvm_put_msrs()
and kvm_get_msrs(): with the default mcg_cap configuration, we
can set up to 148 MSRs in kvm_put_msrs(), and if we allow mcg_cap
to be changed, we can write up to 236 MSRs.

Use 4096 bytes for the buffer, that can hold 255 kvm_msr_entry
structs.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
d71b62a165 target-i386: kvm: Allocate kvm_msrs struct once per VCPU
Instead of using 2400 bytes in the stack for 150 MSR entries in
kvm_get_msrs() and kvm_put_msrs(), allocate a buffer once for
each VCPU.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
42ecabaae1 target-i386: Call cpu_exec_init() on realize
QOM instance_init functions are not supposed to have any side-effects,
as new objects may be created at any moment for querying property
information (see qmp_device_list_properties()).

Calling cpu_exec_init() also affects QEMU's ability to handle errors
during CPU creation, as some actions done by cpu_exec_init() can't be
reverted.

Move cpu_exec_init() call to realize so a simple object_new() won't
trigger it, and so that it is called after some basic validation of CPU
parameters.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
57f2453ab4 target-i386: Move TCG initialization to realize time
QOM instance_init functions are not supposed to have any side-effects,
as new objects may be created at any moment for querying property
information (see qmp_device_list_properties()).

Move TCG initialization to realize time so it won't be called when just
doing object_new() on a X86CPU subclass.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
4fe15cdedf target-i386: Move TCG initialization check to tcg_x86_init()
Instead of requiring cpu.c to check if TCG was already initialized,
simply let the function be called multiple times.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
3e2c0e062f cpu: Eliminate cpudef_init(), cpudef_setup()
x86_cpudef_init() doesn't do anything anymore, cpudef_init(),
cpudef_setup(), and x86_cpudef_init() can be finally removed.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:37 -03:00
Eduardo Habkost
9cf2cc3d82 target-i386: Set constant model_id for qemu64/qemu32/athlon
Newer PC machines don't set hw_version, and older machines set
model-id on compat_props explicitly, so we don't need the
x86_cpudef_setup() code that sets model_id using
qemu_hw_version() anymore.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 19:47:32 -03:00
Zhou Jie
ea4d824168 hw/net/opencores_eth: Allocating Large sized arrays to heap
open_eth_start_xmit has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.

Reduce size of a buffer allocated on stack to 0x600 bytes, which is the
maximal frame length when HUGEN bit is not set in MODER, only allocate
buffer on heap when that is too small. Thus heap is not used in typical
use case.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-05-23 22:10:16 +03:00
Max Filippov
aa8e0ab975 hw/net/opencores_eth: use mii.h
Drop local definitions of MII registers and use constants from mii.h for
registers and register bits. No functional changes.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-05-23 22:10:16 +03:00
Greg Kurz
fe904ea824 migration: regain control of images when migration fails to complete
We currently have an error path during migration that can cause
the source QEMU to abort:

migration_thread()
  migration_completion()
    runstate_is_running() ----------------> true if guest is running
    bdrv_inactivate_all() ----------------> inactivate images
    qemu_savevm_state_complete_precopy()
     ... qemu_fflush()
           socket_writev_buffer() --------> error because destination fails
         qemu_fflush() -------------------> set error on migration stream
  migration_completion() -----------------> set migrate state to FAILED
migration_thread() -----------------------> break migration loop
  vm_start() -----------------------------> restart guest with inactive
                                            images

and you get:

qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted (core dumped)

If we try postcopy with a similar scenario, we also get the writev error
message but QEMU leaves the guest paused because entered_postcopy is true.

We could possibly do the same with precopy and leave the guest paused.
But since the historical default for migration errors is to restart the
source, this patch adds a call to bdrv_invalidate_cache_all() instead.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Message-Id: <146357896785.6003.11983081732454362715.stgit@bahia.huguette.org>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 22:19:36 +05:30
Eduardo Habkost
cd6c1b7057 pc: Set CPU model-id on compat_props for pc <= 2.4
Instead of relying on x86_cpudef_setup() calling
qemu_hw_version(), just make old machines set model-id explicitly
on compat_props for qemu64, qemu32, and athlon. This will allow
us to eliminate x86_cpudef_setup() later.

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 13:19:36 -03:00
Eduardo Habkost
d494352c2f osdep: Move default qemu_hw_version() value to a macro
The macro will be used by code that will stop calling
qemu_hw_version() at runtime and just need a constant value.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 13:19:36 -03:00
Eduardo Habkost
86cd2ea071 target-i386: kvm: Use X86XSaveArea struct for xsave save/load
Instead of using offset macros and bit operations in a uint32_t
array, use the X86XSaveArea struct to perform the loading/saving
operations in kvm_put_xsave() and kvm_get_xsave().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 13:19:36 -03:00
Eduardo Habkost
ee1b09f695 target-i386: Use xsave structs for ext_save_area
This doesn't introduce any change in the code, as the offsets and
struct sizes match what was present in the table. This can be
validated by the QEMU_BUILD_BUG_ON lines on target-i386/cpu.h,
which ensures the struct sizes and offsets match the existing
values in ext_save_area.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 13:19:36 -03:00
Eduardo Habkost
b503717d28 target-i386: Define structs for layout of xsave area
Add structs that define the layout of the xsave areas used by
Intel processors. Add some QEMU_BUILD_BUG_ON lines to ensure the
structs match the XSAVE_* macros in target-i386/kvm.c and the
offsets and sizes at target-i386/cpu.c:ext_save_areas.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-23 13:19:36 -03:00
Greg Kurz
24f3902b08 savevm: fail if migration blockers are present
QEMU has currently two ways to prevent migration to occur:
- migration blocker when it depends on runtime state
- VMStateDescription.unmigratable when migration is not supported at all

This patch gathers all the logic into a single function to be called from
both the savevm and the migrate paths.

This fixes a bug with 9p, at least, where savevm would succeed and the
following would happen in the guest after loadvm:

$ ls /host
ls: cannot access /host: Protocol error

With this patch:

(qemu) savevm foo
Migration is disabled when VirtFS export path '/' is mounted in the guest
using mount_tag 'host'

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <146239057139.11271.9011797645454781543.stgit@bahia.huguette.org>

[Update subject according to Paolo's suggestion - Amit]

Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 21:44:08 +05:30
Peter Maydell
c915854761 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* NMI cleanups (Bandan)
* RAMBlock/Memory cleanups and fixes (Dominik, Gonglei, Fam, me)
* first part of linuxboot support for fw_cfg DMA (Richard)
* IOAPIC fix (Peter Xu)
* iSCSI SG_IO fix (Vadim)
* Various infrastructure bug fixes (Zhijian, Peter M., Stefan)
* CVE fixes (Prasad)

# gpg: Signature made Mon 23 May 2016 16:06:18 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (24 commits)
  cpus: call the core nmi injection function
  nmi: remove x86 specific nmi handling
  target-i386: add a generic x86 nmi handler
  coccinelle: add g_assert_cmp* to macro file
  iscsi: pass SCSI status back for SG_IO
  esp: check dma length before reading scsi command(CVE-2016-4441)
  esp: check command buffer length before write(CVE-2016-4439)
  scripts/signrom.py: Check for magic in option ROMs.
  scripts/signrom.py: Allow option ROM checksum script to write the size header.
  Remove config-devices.mak on 'make clean'
  cpus.c: Use pthread_sigmask() rather than sigprocmask()
  memory: remove unnecessary masking of MemoryRegion ram_addr
  memory: Drop FlatRange.romd_mode
  memory: Remove code for mr->may_overlap
  exec: adjust rcu_read_lock requirement
  memory: drop find_ram_block()
  vl: change runstate only if new state is different from current state
  ioapic: clear remote irr bit for edge-triggered interrupts
  ioapic: keep RO bits for IOAPIC entry
  target-i386: key sfence availability on CPUID_SSE, not CPUID_SSE2
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-23 16:15:52 +01:00
Bandan Das
1453e6627d cpus: call the core nmi injection function
We can call the common function here directly since
x86 specific actions will be taken care of by the arch
specific nmi handler

Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <1463761717-26558-4-git-send-email-bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:47 +02:00
Bandan Das
f7e981f295 nmi: remove x86 specific nmi handling
nmi_monitor_handle is wired to call the x86 nmi
handler. So, we can directly use it at call sites.

Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <1463761717-26558-3-git-send-email-bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:46 +02:00
Bandan Das
1255166b99 target-i386: add a generic x86 nmi handler
Instead of having x86 ifdefs in core nmi code, this
change adds a arch specific handler that the nmi common
code can call.

Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <1463761717-26558-2-git-send-email-bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:46 +02:00
Paolo Bonzini
6ad978e9f4 coccinelle: add g_assert_cmp* to macro file
This helps applying semantic patches to unit tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:46 +02:00
Vadim Rozenfeld
644c6869d3 iscsi: pass SCSI status back for SG_IO
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:46 +02:00
Prasad J Pandit
6c1fef6b59 esp: check dma length before reading scsi command(CVE-2016-4441)
The 53C9X Fast SCSI Controller(FSC) comes with an internal 16-byte
FIFO buffer. It is used to handle command and data transfer.
Routine get_cmd() uses DMA to read scsi commands into this buffer.
Add check to validate DMA length against buffer size to avoid any
overrun.

Fixes CVE-2016-4441.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Cc: qemu-stable@nongnu.org
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1463654371-11169-3-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:46 +02:00
Prasad J Pandit
c98c6c105f esp: check command buffer length before write(CVE-2016-4439)
The 53C9X Fast SCSI Controller(FSC) comes with an internal 16-byte
FIFO buffer. It is used to handle command and data transfer. While
writing to this command buffer 's->cmdbuf[TI_BUFSZ=16]', a check
was missing to validate input length. Add check to avoid OOB write
access.

Fixes CVE-2016-4439.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Cc: qemu-stable@nongnu.org
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1463654371-11169-2-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Richard W.M. Jones
fd28938b7a scripts/signrom.py: Check for magic in option ROMs.
Because of the risk that compilers might not emit the asm() block at
the beginning of the option ROM, check that the ROM contains the
required magic signature.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <1463000807-18015-3-git-send-email-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Richard W.M. Jones
6f71b779c8 scripts/signrom.py: Allow option ROM checksum script to write the size header.
Modify the signrom.py script so that if the size byte in the header is
0 (ie. not set) then the script will set the size.  If the size byte
is non-zero then we do the same as before, so this doesn't require
changes to any existing ROM sourcecode.

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <1463000807-18015-2-git-send-email-rjones@redhat.com>
2016-05-23 16:53:45 +02:00
Peter Maydell
168340b6ba Remove config-devices.mak on 'make clean'
Our dependency mechanism works like this:
 * on first build there is neither a .o nor a .d
 * we create the .d as a side effect of creating the .o
 * for rebuilds we know when we need to update the .o,
   which also updates the .d

This system requires that you're never in a situation where there is
a .o file but no .d (because then we will never realise we need to
build the .d, and we will not have the dependency information about
when to rebuild the .o).

This is working fine for our object files, but we also try to use it
for $TARGET/config-devices.mak (where the dependency file is
in $TARGET-config-devices.mak.d). Unfortunately "make clean" doesn't
remove config-devices.mak, which means that it puts us in the
forbidden situation of "object file exists but not its .d file".
This in turn means that we will fail to notice when we need to rebuild:
  mkdir build/depbug
  (cd build/depbug && '../../configure')
  make -C build/depbug -j8
  make -C build/depbug clean
  echo "CONFIG_CANARY = y" >> default-configs/arm-softmmu.mak
  make -C build/depbug
  grep CANARY build/depbug/aarch64-softmmu/config-devices.mak

The CANARY token should show up in config-devices.mak but does not.

Fix this bug by making "make clean" delete the config-devices.mak files.
config-all-devices.mak doesn't have the same problem since it has
no .d file, but delete it too, since it is created by "make" and
logically should be removed by "make clean".

(Note that it is important not to remove config-devices.mak until
after we have recursively run 'make clean' in the subdirectories.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1463484451-22979-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Peter Maydell
a2d1761da1 cpus.c: Use pthread_sigmask() rather than sigprocmask()
On Linux, sigprocmask() and pthread_sigmask() are in practice the
same thing (they only set the signal mask for the calling thread),
but the documentation states that the behaviour of sigprocmask() in a
multithreaded process is undefined. Use pthread_sigmask() instead
(which is what we do in almost all places in QEMU that alter the
signal mask already).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1463420039-29761-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Paolo Bonzini
e4e697940d memory: remove unnecessary masking of MemoryRegion ram_addr
mr->ram_block->offset is already aligned to both host and target size
(see qemu_ram_alloc_internal).  Remove further masking as it is
unnecessary.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:45 +02:00
Fam Zheng
5b5660adf1 memory: Drop FlatRange.romd_mode
Its value is alway set to mr->romd_mode, so the removed comparisons are
fully superseded by "a->mr == b->mr".

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1458900629-2334-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Fam Zheng
b613597819 memory: Remove code for mr->may_overlap
The collision check does nothing and hasn't been used. Remove the
variable together with related code.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1458900629-2334-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Gonglei
ab0a995608 exec: adjust rcu_read_lock requirement
qemu_ram_unset_idstr() doesn't need rcu lock anymore,
meanwhile make the range of rcu lock in
qemu_ram_set_idstr() as small as possible.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1462845901-89716-3-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Gonglei
fa53a0e53e memory: drop find_ram_block()
On the one hand, we have already qemu_get_ram_block() whose function
is similar. On the other hand, we can directly use mr->ram_block but
searching RAMblock by ram_addr which is a kind of waste.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1462845901-89716-2-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Li Zhijian
e92a2d9cb3 vl: change runstate only if new state is different from current state
Previously, qemu will abort at following scenario:
(qemu) stop
(qemu) system_reset
(qemu) system_reset
(qemu) 2016-04-13T20:54:38.979158Z qemu-system-x86_64: invalid runstate transition: 'prelaunch' -> 'prelaunch'

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1460604352-18630-1-git-send-email-lizhijian@cn.fujitsu.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:44 +02:00
Peter Xu
ed1263c363 ioapic: clear remote irr bit for edge-triggered interrupts
This is to better emulate IOAPIC version 0x1X hardware. Linux kernel
leveraged this "feature" to do explicit EOI since EOI register is still
not introduced at that time. This will also fix the issue that level
triggered interrupts failed to work when IR enabled (tested with Linux
kernel version 4.5).

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1462875682-1349-3-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:43 +02:00
Peter Xu
479c2a1cb7 ioapic: keep RO bits for IOAPIC entry
Currently IOAPIC RO bits can be written. To be better aligned with
hardware, we should let them read-only.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1462875682-1349-2-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:43 +02:00
Paolo Bonzini
14cb949a3e target-i386: key sfence availability on CPUID_SSE, not CPUID_SSE2
sfence was introduced before lfence and mfence.  This fixes Linux
2.4's measurement of checksumming speeds for the pIII_sse
algorithm:

md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
   8regs     :   384.400 MB/sec
   32regs    :   259.200 MB/sec
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0240b2a>]    Not tainted
EFLAGS: 00000246
eax: c15d8000   ebx: 00000000   ecx: 00000000   edx: c15d5000
esi: 8005003b   edi: 00000004   ebp: 00000000   esp: c15bdf50
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 1, stackpage=c15bd000)
Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
       00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
       00000000 00000206 c0241c6c 00001000 c15d4000 c15d7000 c15d4000
c15d4000
Call Trace:    [<c0241c6c>] [<c0105000>] [<c0241db4>] [<c010503b>]
[<c0105000>]
  [<c0107416>] [<c0105030>]

Code: 0f ae f8 0f 10 04 24 0f 10 4c 24 10 0f 10 54 24 20 0f 10 5c
 <0>Kernel panic: Attempted to kill init!

Reported-by: Stefan Weil <sw@weilnetz.de>
Fixes: 121f315788
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:43 +02:00
Stefan Weil
5919e0328b configure: Allow builds with extra warnings
The clang compiler supports a useful compiler option -Weverything,
and GCC also has other warnings not enabled by -Wall.

If glib header files trigger a warning, however, testing glib with
-Werror will always fail. A size mismatch is also detected without
-Werror, so simply remove it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <1461879221-13338-1-git-send-email-sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:43 +02:00
Prasad J Pandit
691a02e2ce i386: kvmvapic: initialise imm32 variable
When processing Task Priorty Register(TPR) access, it could leak
automatic stack variable 'imm32' in patch_instruction().
Initialise the variable to avoid it.

Reported by: Donghai Zdh <donghai.zdh@alibaba-inc.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1460013608-16670-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:43 +02:00
Pranith Kumar
dfc007f7f7 docs/atomics.txt: Update pointer to linux macro
Add a missing end brace and update doc to point to the latest access
macro. ACCESS_ONCE() is deprecated.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <1462198852-28694-1-git-send-email-bobby.prani@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:43 +02:00
Dominik Dingel
d2f39add72 exec.c: Ensure right alignment also for file backed ram
While in the anonymous ram case we already take care of the right alignment
such an alignment gurantee does not exist for file backed ram allocation.

Instead, pagesize is used for alignment. On s390 this is not enough for gmap,
as we need to satisfy an alignment up to segments.

Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>

Message-Id: <1461585338-45863-1-git-send-email-dingel@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-23 16:53:42 +02:00
Peter Maydell
2b5f477789 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160523-1' into staging
usb: add xen pvUSB backend, add num-ports check to ohci.

# gpg: Signature made Mon 23 May 2016 14:02:25 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20160523-1:
  usb/ohci: Fix crash with when specifying too many num-ports
  xen: add pvUSB backend
  xen: write information about supported backends
  xen: introduce dummy system device

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-23 15:53:02 +01:00
Peter Maydell
38629bf5e4 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160523-1' into staging
vga: fix CVE-2016-3712 regression, misc virtio-gpu fixes.

# gpg: Signature made Mon 23 May 2016 13:30:26 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vga-20160523-1:
  vga: add sr_vbe register set
  virtio-gpu: fix ui idx check
  virtio-gpu: use VIRTIO_GPU_MAX_SCANOUTS
  virtio-gpu: check max_outputs only
  virtio-gpu: check max_outputs value
  virtio-vga: propagate on gpu realized error
  virtio-gpu: check early scanout id

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-23 14:50:40 +01:00
Thomas Huth
d400fc018b usb/ohci: Fix crash with when specifying too many num-ports
QEMU currently crashes when an OHCI controller is instantiated with
too many ports, e.g. "-device pci-ohci,num-ports=100,masterbus=1".
Thus add a proper check in usb_ohci_init() to make sure that we
do not use more than OHCI_MAX_PORTS = 15 ports here.

Ticket: https://bugs.launchpad.net/qemu/+bug/1581308
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1463995387-11710-1-git-send-email-thuth@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 14:59:40 +02:00
Gerd Hoffmann
94ef4f337f vga: add sr_vbe register set
Commit "fd3c136 vga: make sure vga register setup for vbe stays intact
(CVE-2016-3712)." causes a regression.  The win7 installer is unhappy
because it can't freely modify vga registers any more while in vbe mode.

This patch introduces a new sr_vbe register set.  The vbe_update_vgaregs
will fill sr_vbe[] instead of sr[].  Normal vga register reads and
writes go to sr[].  Any sr register read access happens through a new
sr() helper function which will read from sr_vbe[] with vbe active and
from sr[] otherwise.

This way we can allow guests update sr[] registers as they want, without
allowing them disrupt vbe video modes that way.

Cc: qemu-stable@nongnu.org
Reported-by: Thomas Lamprecht <thomas@lamprecht.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1463475294-14119-1-git-send-email-kraxel@redhat.com
2016-05-23 14:28:25 +02:00
Juergen Gross
816ac92ef7 xen: add pvUSB backend
Add a backend for para-virtualized USB devices for xen domains.

The backend is using host-libusb to forward USB requests from a
domain via libusb to the real device(s) passed through.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Message-id: 1463062421-613-4-git-send-email-jgross@suse.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Marc-André Lureau
6b860806c0 virtio-gpu: fix ui idx check
Fix off-by-one value check (0 is the first scanout).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1463653560-26958-7-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Juergen Gross
637c53ffcb xen: write information about supported backends
Add a Xenstore directory for each supported pv backend. This will allow
Xen tools to decide which backend type to use in case there are
multiple possibilities.

The information is added under
/local/domain/<backend-domid>/device-model/<domid>/backends
before the "running" state is written to Xenstore. Using a directory
for each backend enables us to add parameters for specific backends
in the future.

This interface is documented in the Xen source repository in the file
docs/misc/qemu-backends.txt

In order to reuse the Xenstore directory creation already present in
hw/xen/xen_devconfig.c move the related functions to
hw/xen/xen_backend.c where they fit better.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Message-id: 1463062421-613-3-git-send-email-jgross@suse.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Marc-André Lureau
acfc484650 virtio-gpu: use VIRTIO_GPU_MAX_SCANOUTS
The value is defined in virtio_gpu.h already (changing from 4 to 16).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1463653560-26958-6-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Juergen Gross
9432e53a5b xen: introduce dummy system device
Introduce a new dummy system device serving as parent for virtual
buses. This will enable new pv backends to introduce virtual buses
which are removable again opposed to system buses which are meant
to stay once added.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Message-id: 1463062421-613-2-git-send-email-jgross@suse.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Marc-André Lureau
2fe760554e virtio-gpu: check max_outputs only
The scanout id should not be above the configured num_scanouts.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1463653560-26958-5-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Marc-André Lureau
5e3d741c6a virtio-gpu: check max_outputs value
The value must be less than VIRTIO_GPU_MAX_SCANOUT.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1463653560-26958-4-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Marc-André Lureau
d0f0c8654a virtio-vga: propagate on gpu realized error
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1463653560-26958-3-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Marc-André Lureau
fe89fdebca virtio-gpu: check early scanout id
Before accessing the g->scanout array, in order to avoid potential
out-of-bounds access.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1463653560-26958-2-git-send-email-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-23 13:30:03 +02:00
Jason J. Herne
d85a31d1f4 migration: Promote improved autoconverge commands out of experimental state
The new autoconverge throttling commands have been tested for a release now. It
is time to move them out of the experimental state.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Message-Id: <1461262038-8197-1-git-send-email-jjherne@linux.vnet.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 16:05:09 +05:30
Peter Maydell
e081c24d30 Merge remote-tracking branch 'remotes/ehabkost/tags/machine-pull-request' into staging
Machine Core queue, 2016-05-20

# gpg: Signature made Fri 20 May 2016 21:26:49 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/machine-pull-request: (21 commits)
  Use &error_fatal when initializing crypto on qemu-{img,io,nbd}
  vl: Use &error_fatal when parsing monitor options
  vl: Use &error_fatal when parsing VNC options
  machine: add properties to compat_props incrementaly
  vl: Simplify global property registration
  vl: Make display_remote a local variable
  vl: Move DisplayType typedef to vl.c
  vl: Make display_type a local variable
  vl: Replace DT_NOGRAPHIC with machine option
  milkymist: Move DT_NOGRAPHIC check outside milkymist_tmu2_create()
  spice: Initialization stubs on qemu-spice.h
  gtk: Initialization stubs
  cocoa: cocoa_display_init() stub
  sdl: Initialization stubs
  curses: curses_display_init() stub
  vnc: Initialization stubs
  vl: Add DT_COCOA DisplayType value
  vl: Replace *_vga_available() functions with class_names field
  vl: Table-based select_vgahw()
  vl: Use exit(1) when requested VGA interface is unavailable
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-23 10:30:41 +01:00
Markus Armbruster
b72fe9e690 migration/qjson: Drop gratuitous use of QOM
All the use of QOM buys us here is the ability to destroy the thing
with object_unref(OBJECT(vmdesc)).  Not worth the notational overhead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1462380558-2030-3-git-send-email-armbru@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 14:16:12 +05:30
Markus Armbruster
17b74b9867 migration: Move qjson.[ch] to migration/
Type QJSON lets you build JSON text.  Its interface mirrors (a subset
of) abstract JSON syntax.

QAPI output visitors also produce JSON text.  They assert their
preconditions and invariants, and therefore abort on incorrect use.

Contrastingly, QJSON does *not* detect incorrect use.  It happily
produces invalid JSON then.  This is what migration wants.

QJSON was designed for migration, and migration is its only user.
Move it to migration/ for proper coverage by MAINTAINERS, and to deter
accidental use outside migration.

[Pointed out by Eric: QJSON was added in commits 0457d07..b174257
 -- Amit]

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1462380558-2030-2-git-send-email-armbru@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 14:16:09 +05:30
Wei Jiangang
cde6361534 rng-random: rename RndRandom to RngRandom
Usually, Random Number Generator is abbreviated to RNG/rng.
so replacing RndRandom with RngRandom seems more reasonable
and keep consistent with RngBackend.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Message-Id: <1460684168-5403-1-git-send-email-weijg.fnst@cn.fujitsu.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-05-23 12:18:43 +05:30
Eduardo Habkost
e8f2d2722e Use &error_fatal when initializing crypto on qemu-{img,io,nbd}
In addition to making the code simpler, this will replace the
long error messages:
  cannot initialize crypto: Unable to initialize GNUTLS library: [...]
  cannot initialize crypto: Unable to initialize gcrypt
with shorter messages:
  Unable to initialize GNUTLS library: [...]
  Unable to initialize gcrypt

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:55 -03:00
Eduardo Habkost
822ac12df0 vl: Use &error_fatal when parsing monitor options
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:55 -03:00
Eduardo Habkost
7b1ee0f2b7 vl: Use &error_fatal when parsing VNC options
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:55 -03:00
Igor Mammedov
bacc344c54 machine: add properties to compat_props incrementaly
Switch to adding compat properties incrementaly instead of
completly overwriting compat_props per machine type.
That removes data duplication which we have due to nested
[PC|SPAPR]_COMPAT_* macros.

It also allows to set default device properties from
default foo_machine_options() hook, which will be used
in following patch for putting VMGENID device as
a function if ISA bridge on pc/q35 machines.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
[ehabkost: Fixed CCW_COMPAT_* and PC_COMPAT_0_* defines]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
16714b1680 vl: Simplify global property registration
There's no need to use qdev_prop_register_global_list() and an
array, if we are registering a single GlobalProperty struct. Use
qdev_prop_register_global() instead.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
1f0dfe02d4 vl: Make display_remote a local variable
The variable is used only inside main(), so it can be local.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
0cb48c4678 vl: Move DisplayType typedef to vl.c
Now the type is only used inside vl.c and doesn't need to be in a
header file.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
d29345d011 vl: Make display_type a local variable
Now display_type is only used inside main(), and don't need to be a
global variable.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
cfc58cf373 vl: Replace DT_NOGRAPHIC with machine option
All DisplayType values are just UI options that don't affect any
hardware emulation code, except for DT_NOGRAPHIC. Replace
DT_NOGRAPHIC with DT_NONE plus a new "-machine graphics=on|off"
option, so hardware emulation code don't need to use the
display_type variable.

Cc: Michael Walle <michael@walle.cc>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
cf3dc71eb5 milkymist: Move DT_NOGRAPHIC check outside milkymist_tmu2_create()
DT_NOGRAPHIC handling will be moved to a MachineState field, and
it will be easier to change milkymist_init() to check that field.

Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:54 -03:00
Eduardo Habkost
6f0c894c25 spice: Initialization stubs on qemu-spice.h
This reduces the number of CONFIG_SPICE #ifdefs in vl.c.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:53 -03:00
Eduardo Habkost
19a2c6269f gtk: Initialization stubs
This reduces the number of CONFIG_GTK #ifdefs in vl.c.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:53 -03:00
Eduardo Habkost
e35ee7c1aa cocoa: cocoa_display_init() stub
One less #ifdef in vl.c.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:53 -03:00
Eduardo Habkost
476db0814d sdl: Initialization stubs
This reduces the number of CONFIG_SDL #ifdefs in vl.c.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:53 -03:00
Eduardo Habkost
674ec68693 curses: curses_display_init() stub
One less #ifdef in vl.c.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:53 -03:00
Eduardo Habkost
f8c75b2486 vnc: Initialization stubs
This reduces the number of CONFIG_VNC #ifdefs in the vl.c code.

The only user-visible difference is that this will make QEMU
complain about syntax when using "-display vnc" ("VNC requires a
display argument vnc=<display>") even if CONFIG_VNC is disabled.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:52 -03:00
Eduardo Habkost
7b7d2be50c vl: Add DT_COCOA DisplayType value
Instead of reusing DT_SDL for Cocoa, use DT_COCOA to indicate
that a Cocoa display was requested.

configure already ensures CONFIG_COCOA and CONFIG_SDL are never
set at the same time. The only case where DT_SDL is used outside
a #ifdef CONFIG_SDL block is in the no_frame/alt_grab/ctrl_grab
check. That means the only user-visible change is that we will
start printing a warning if the SDL-specific options are used in
Cocoa mode. This is a bugfix, because no_frame/alt_grab/ctrl_grab
are not used by Cocoa code.

Cc: Andreas Färber <andreas.faerber@web.de>
Cc: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:52 -03:00
Eduardo Habkost
c2c7b22db1 vl: Replace *_vga_available() functions with class_names field
Instead of requiring a separate function for each VGA interface,
just enumerate the corresponding class names on struct
VGAInterfaceInfo.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:52 -03:00
Eduardo Habkost
8c9a2b71de vl: Table-based select_vgahw()
Instead of implementing separate check functions for each vga
interface type, add a table enumerating the possible VGA
interfaces.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:52 -03:00
Eduardo Habkost
4aeae8768a vl: Use exit(1) when requested VGA interface is unavailable
Instead of using exit(0), use exit(1) when an unavailable VGA
interface is used in the command-line to indicate it's an error.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:52 -03:00
Cao jin
07fcd59de6 pc-dimm: correct comment of MemoryHotplugState
correct comment and remove an unused macro. commit adcb4ee6
already correct its type

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-05-20 14:28:52 -03:00
Paolo Bonzini
65603e2fc1 tci: do not include exec/exec-all.h
TCI does not need the runtime definition in exec-all.h.  It only needs the
host-side definitions in tcg/tcg.h.  Now that cpu.h is not included
everywhere, this caused a failure because exec-all.h does need cpu.h
but does not include it itself.

Fix by including the intended header.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1463745452-25831-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-20 15:07:46 +01:00
Paolo Bonzini
22b31af26f aspeed: include qemu/log.h
This is not visible with the default "log" trace backend.  With other
backends however trace.h does not include qemu/log.h, resulting in
build failures.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1463745452-25831-2-git-send-email-pbonzini@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-20 13:09:22 +01:00
Peter Maydell
6bd8ab6889 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Thu 19 May 2016 16:09:27 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (31 commits)
  qemu-iotests: Fix regression in 136 on aio_read invalid
  qemu-iotests: Simplify 109 with unaligned qemu-img compare
  qemu-io: Fix recent UI updates
  block: clarify error message for qmp-eject
  qemu-iotests: Some more write_zeroes tests
  qcow2: Fix write_zeroes with partially allocated backing file cluster
  qcow2: fix condition in is_zero_cluster
  block: Propagate AioContext change to all children
  block: Remove BlockDriverState.blk
  block: Don't return throttling info in query-named-block-nodes
  block: Avoid bs->blk in bdrv_next()
  block: Add bdrv_has_blk()
  block: Remove bdrv_aio_multiwrite()
  blockjob: Don't touch BDS iostatus
  blockjob: Don't set iostatus of target
  block: User BdrvChild callback for device name
  block: Use BdrvChild callbacks for change_media/resize
  block: Don't check throttled reqs in bdrv_requests_pending()
  Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6"
  block: Remove bdrv_move_feature_fields()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-19 16:54:12 +01:00
Kevin Wolf
7753da2351 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-05-19' into queue-block
Block patches

# gpg: Signature made Thu May 19 16:58:53 2016 CEST using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-05-19:
  qemu-iotests: Fix regression in 136 on aio_read invalid
  qemu-iotests: Simplify 109 with unaligned qemu-img compare
  qemu-io: Fix recent UI updates

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:59:46 +02:00
Eric Blake
37546ff28f qemu-iotests: Fix regression in 136 on aio_read invalid
Commit 093ea232 removed the ability for aio_read and aio_write
to artificially inflate the invalid statistics counters for
block devices, since it no longer flags unaligned offset or
length.  Add 'aio_read -i' and 'aio_write -i' to restore
the ability, and update test 136 to use it.

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1463416983-28318-4-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:56:58 +02:00
Eric Blake
9e28bb26c2 qemu-iotests: Simplify 109 with unaligned qemu-img compare
For some time now, qemu-img compare has been able to compare
unaligned images.  So we no longer need test 109's hack of
resizing to sector boundaries before invoking compare.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1463416983-28318-3-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:56:58 +02:00
Eric Blake
4ca1d3401b qemu-io: Fix recent UI updates
Commit 770e0e0e [*] tried to add 'writev -f', but didn't tweak
the getopt() call to actually let it work.  Likewise, commit
c2e001c missed implementing 'aio_write -u -z'.  The latter commit
also introduced a leak of ctx.

[*] does it sound "ech0e" in here? :)

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1463416983-28318-2-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:56:58 +02:00
Peter Maydell
776efef324 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
NEED_CPU_H cleanups, big enough to deserve their own pull request.

# gpg: Signature made Thu 19 May 2016 15:42:37 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (52 commits)
  hw: clean up hw/hw.h includes
  hw: remove pio_addr_t
  cpu: move exec-all.h inclusion out of cpu.h
  exec: extract exec/tb-context.h
  hw: explicitly include qemu/log.h
  mips: move CP0 functions out of cpu.h
  arm: move arm_log_exception into .c file
  qemu-common: push cpu.h inclusion out of qemu-common.h
  acpi: do not use TARGET_PAGE_SIZE
  s390x: reorganize CSS bits between cpu.h and other headers
  dma: do not depend on kvm_enabled()
  gdbstub: remove unnecessary includes from gdbstub-xml.c
  qemu-common: stop including qemu/host-utils.h from qemu-common.h
  qemu-common: stop including qemu/bswap.h from qemu-common.h
  cpu: move endian-dependent load/store functions to cpu-all.h
  hw: cannot include hw/hw.h from user emulation
  hw: move CPU state serialization to migration/cpu.h
  hw: do not use VMSTATE_*TL
  include: poison symbols in osdep.h
  apic: move target-dependent definitions to cpu.h
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-19 15:55:08 +01:00
John Snow
3a3086b72a block: clarify error message for qmp-eject
If you use HMP's eject but the CDROM tray is locked, you may get a
confusing error message informing you that the "tray isn't open."

As this is the point of eject, we can do a little better and help
clarify that the tray was locked and that it (might) open up later,
so try again.

It's not ideal, but it makes the semantics of the (legacy) eject
command more understandable to end users when they try to use it.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
1ef7d01021 qemu-iotests: Some more write_zeroes tests
This covers some more write_zeroes cases which are relevant for the
recent qcow2 optimisations that check the allocation status of the
backing file for partial cluster write_zeroes requests.

This needs to be separate from 034 because we can only support qcow2 in
this test case for multiple reasons: We check the allocation status
after write_zeroes with 'qemu-img map' and the optimised behaviour that
produces zero clusters is only implemented in qcow2; second, the map
command returns offsets that are qcow2 specific; and finally, we also
use 512 byte clusters which aren't supported for formats like qed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
5efdf53227 qcow2: Fix write_zeroes with partially allocated backing file cluster
In order to correctly check whether a given cluster is read as zero, we
don't only need to check whether bdrv_get_block_status_above() sets
BDRV_BLOCK_ZERO, but also if all sectors for the whole cluster have the
same status.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
2016-05-19 16:45:31 +02:00
Denis V. Lunev
f575f145f4 qcow2: fix condition in is_zero_cluster
We should check for (res & BDRV_BLOCK_ZERO) only. The situation when we
will have !(res & BDRV_BLOCK_DATA) and will not have BDRV_BLOCK_ZERO is
not possible for images with bdi.unallocated_blocks_are_zero == true.

For those images where it's false, however, it can happen and we must
not consider the data zeroed then or we would corrupt the image.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:45:31 +02:00
Max Reitz
b97511c7bc block: Propagate AioContext change to all children
Instead of propagating any change of a BDS's AioContext only to its file
and backing children and letting driver-specific code do the rest, just
propagate it to all and drop the thus superfluous implementations of
bdrv_{at,de}tach_aio_context() in Quorum, blkverify and VMDK.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
1f0c461b82 block: Remove BlockDriverState.blk
This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
79c719b755 block: Don't return throttling info in query-named-block-nodes
query-named-block-nodes should not return information that is related
to the attached BlockBackend rather than the node itself, so throttling
information needs to be removed from it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
7c8eece45b block: Avoid bs->blk in bdrv_next()
We need to introduce a separate BdrvNextIterator struct that can keep
more state than just the current BDS in order to avoid using the bs->blk
pointer.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
dde33812a8 block: Add bdrv_has_blk()
In many cases we just want to know whether a BDS has at least one BB
attached, without needing to know the exact BB that is attached. In
contrast to bs->blk, this is still a valid question when more than one
BB can be attached, so just answer it by checking the parents list.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
91c6e4b7bb block: Remove bdrv_aio_multiwrite()
Since virtio-blk implements request merging itself these days, the only
remaining users are test cases for the function. That doesn't make the
function exactly useful any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
66a0fae438 blockjob: Don't touch BDS iostatus
Block jobs don't actually make use of the iostatus for their BDSes, but
they manage a separate block job iostatus. Still, they require that it
is enabled for the source BDS and they enable it automatically for the
target and set the error handling mode - which ends up never being used
by the job.

This patch removes all of the BDS iostatus handling from the block job,
which removes another few bs->blk accesses.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
81e254dc83 blockjob: Don't set iostatus of target
When block job errors were introduced, we assigned the iostatus of the
target BDS "just in case". The field has never been accessible for the
user because the target isn't listed in query-block.

Before we can allow the user to have a second BlockBackend on the
target, we need to clean this up. If anything, we would want to set the
iostatus for the internal BB of the job (which we can always do later),
but certainly not for a separate BB which the job doesn't even use.

As a nice side effect, this gets us rid of another bs->blk use.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
4c265bf9f4 block: User BdrvChild callback for device name
In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
5c8cab4808 block: Use BdrvChild callbacks for change_media/resize
We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
cbe1beb7a1 block: Don't check throttled reqs in bdrv_requests_pending()
Checking whether there are throttled requests requires going to the
associated BlockBackend, which we want to avoid.

All users of bdrv_requests_pending() in block/io.c already call
bdrv_parent_drained_begin() first, which restarts all throttled
requests, so no throttled requests can be left here and this is removal
of dead code.

The remaining users (assertions during graph manipulation in block.c)
don't care about requests that are still queued in the BlockBackend and
haven't been issued for a BlockDriverState yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
b26ded9a7d Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6"
This reverts commit 76b223200e.

Now that I/O throttling is fully done on the BlockBackend level, there
is no reason any more to block I/O throttling for nodes with multiple
parents as the parents don't influence each other any more.

Conflicts:
	block.c

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
08e83aabe4 block: Remove bdrv_move_feature_fields()
bdrv_move_feature_fields() and swap_feature_fields() are empty now, they
can be removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:31 +02:00
Kevin Wolf
7ca7f0f6db block: Decouple throttling from BlockDriverState
This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.

With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
bb9aaecaf1 block/io: Quiesce parents between drained_begin/end
So far, bdrv_parent_drained_begin/end() was called for the duration of
the actual bdrv_drain() at the beginning of a drained section, but we
really should keep parents quiesced until the end of the drained
section.

This does not actually change behaviour at this point because the only
user of the .drained_begin/end BdrvChildRole callback is I/O throttling,
which already doesn't send any new requests after flushing its queue in
.drained_begin. The patch merely removes a trap for future users.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
c2066af051 block: Drain throttling queue with BdrvChild callback
This removes the last part of I/O throttling from block/io.c and moves
it to the BlockBackend.

Instead of having knowledge about throttling inside io.c, we can call a
BdrvChild callback .drained_begin/end, which happens to drain the
throttled requests for BlockBackend parents.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
22aa8b246a block: Introduce BdrvChild.opaque
BlockBackends use it to get a back pointer from BdrvChild to
BlockBackend in any BdrvChildRole callbacks.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
97148076e8 block: Move I/O throttling configuration functions to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
441565b279 block: Move actual I/O throttling to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
27ccdd5259 block: Move throttling fields from BDS to BB
This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:30 +02:00
Kevin Wolf
49d2165d7d block: Convert throttle_group_get_name() to BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Kevin Wolf
31dce3ccca block: throttle-groups: Use BlockBackend pointers internally
As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.

This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Kevin Wolf
f2cd875d54 block: Introduce BlockBackendPublic
Some features, like I/O throttling, are implemented outside
block-backend.c, but still want to keep information in BlockBackend,
e.g. list entries that allow keeping a list of BlockBackends.

In order to avoid exposing the whole struct layout in the public header
file, this patch introduces an embedded public struct where such
information can be added and a pair of functions to convert between
BlockBackend and BlockBackendPublic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Kevin Wolf
a5614993d7 block: Make sure throttled BDSes always have a BB
It was already true in principle that a throttled BDS always has a BB
attached, except that the order of operations while attaching or
detaching a BDS to/from a BB wasn't careful enough.

This commit breaks graph manipulations while I/O throttling is enabled.
It would have been possible to keep things working with some temporary
hacks, but quite cumbersome, so it's not worth the hassle. We'll fix
things again in a minute.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-19 16:45:29 +02:00
Paolo Bonzini
df43d49cb8 hw: clean up hw/hw.h includes
Include qom/object.h and exec/memory.h instead of exec/ioport.h;
exec/ioport.h was almost everywhere required only for those two
includes, not for the content of the header itself.

Remove block/aio.h, everybody is already including it through
another path.

With this change, include/hw/hw.h is freed from qemu-common.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:30 +02:00
Paolo Bonzini
89a80e7400 hw: remove pio_addr_t
pio_addr_t is almost unused, because these days I/O ports are simply
accessed through the address space.  cpu_{in,out}[bwl] themselves are
almost unused; monitor.c and xen-hvm.c could use address_space_read/write
directly, since they have an integer size at hand.  This leaves qtest as
the only user of those functions.

On the other hand even portio_* functions use this type; the only
interesting use of pio_addr_t thus is include/hw/sysbus.h.  I guess I
could move it there, but I don't see much benefit in that either.  Using
uint32_t is enough and avoids the need to include ioport.h everywhere.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:30 +02:00
Paolo Bonzini
63c915526d cpu: move exec-all.h inclusion out of cpu.h
exec-all.h contains TCG-specific definitions.  It is not needed outside
TCG-specific files such as translate.c, exec.c or *helper.c.

One generic function had snuck into include/exec/exec-all.h; move it to
include/qom/cpu.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
00f6da6a1a exec: extract exec/tb-context.h
TCG backends do not need most of exec-all.h; extract what they actually
need to a separate file or move it directly to tcg.h.  The next patch
will stop including exec-all.h from everywhere.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
03dd024ff5 hw: explicitly include qemu/log.h
Move the inclusion out of hw/hw.h, most files do not need it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
e6623d88f4 mips: move CP0 functions out of cpu.h
These are here for historical reasons: they are needed from both gdbstub.c
and op_helper.c, and the latter was compiled with fixed AREG0.  It is
not needed anymore, so uninline them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
27a7ea8a1f arm: move arm_log_exception into .c file
Avoid need for qemu/log.h inclusion, and make the function static too.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
33c11879fd qemu-common: push cpu.h inclusion out of qemu-common.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:29 +02:00
Paolo Bonzini
35c5a52d1d acpi: do not use TARGET_PAGE_SIZE
This is a #define used by the CPU.  NVDIMM can just use 4K
unconditionally.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
bd3f16ac30 s390x: reorganize CSS bits between cpu.h and other headers
Move cpu_inject_* to the only C file where they are used.

Move ioinst.h declarations that need S390CPU to cpu.h, to make
ioinst.h independent of cpu.h.

Move channel declarations that only need SubchDev from cpu.h
to css.h, to make more channel users independent of cpu.h.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
77ac58ddc6 dma: do not depend on kvm_enabled()
Memory barriers are needed also by Xen and, when the ioeventfd
bugs are fixed, by TCG as well.

sysemu/kvm.h is not anymore needed in sysemu/dma.h, move it to
the actual users.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
da16384560 gdbstub: remove unnecessary includes from gdbstub-xml.c
gdbstub-xml.c defines a bunch of arrays of strings; there is no
need to include anything.  Keep osdep.h for consistency, but remove
the rest.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
87776ab72b qemu-common: stop including qemu/host-utils.h from qemu-common.h
Move it to the actual users.  There are some inclusions of
qemu/host-utils.h in headers, but they are all necessary.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
58369e22cf qemu-common: stop including qemu/bswap.h from qemu-common.h
Move it to the actual users.  There are still a few includes of
qemu/bswap.h in headers; removing them is left for future work.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
a7d6039cb3 cpu: move endian-dependent load/store functions to cpu-all.h
Disentangle cpu-common.h and memory.h from NEED_CPU_H.  Prototypes are
not defined for !NEED_CPU_H, so remove them from poison.h too.  Only
macros need poisoning.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
741da0d38b hw: cannot include hw/hw.h from user emulation
All qdev definitions are available from other headers, user-mode
emulation does not need hw/hw.h.

By considering system emulation only, it is simpler to disentangle
hw/hw.h from NEED_CPU_H.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
1e00b8d57a hw: move CPU state serialization to migration/cpu.h
Remove usage of NEED_CPU_H from hw/hw.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
cbd62f8616 hw: do not use VMSTATE_*TL
Reserve this to CPU state serialization.

Luckily, they were only used by sPAPR devices and these are ppc64
only.  So there is no change to migration format.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
bdd902277c include: poison symbols in osdep.h
Ensure that all target-independent files ignore poisoned symbols,
and fix the fallout.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
d613f8cc33 apic: move target-dependent definitions to cpu.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:28 +02:00
Paolo Bonzini
e81096b1c8 explicitly include linux/kvm.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
3b3d264888 explicitly include hw/qdev-core.h
exec/cpu-all.h includes qom/cpu.h, which includes hw/qdev-core.h.
Explicit inclusion will keep things working when cpu.h will not be
included indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
7d0c99a9d8 explicitly include qom/cpu.h
exec/cpu-all.h includes qom/cpu.h.  Explicit inclusion
will keep things working when cpu.h will not be included
indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
8ea952d679 arm: remove useless cpu.h inclusion
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
aa5a9e2484 ppc: use PowerPCCPU instead of CPUPPCState
This changes a cpu.h dependency for hw/ppc/ppc.h into a cpu-qom.h
dependency.  For it to compile we also need to clean up a few unused
definitions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
5a975d435a mips: use MIPSCPU instead of CPUMIPSState
This changes a cpu.h dependency into a cpu-qom.h dependency.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
0774831d08 alpha: include cpu-qom.h in files that require AlphaCPU
This will keep things working when cpu.h will not be included
indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
b4c1c6fc61 sh4: include cpu-qom.h in files that require SuperHCPU
This will keep things working when cpu.h will not be included
indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
4669fcc7fa m68k: include cpu-qom.h in files that require M68KCPU
This will keep things working when cpu.h will not be included
indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
16fd646182 arm: include cpu-qom.h in files that require ARMCPU
This will keep things working when cpu.h will not be included
indirectly almost everywhere (either directly or through
qemu-common.h).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:27 +02:00
Paolo Bonzini
da37426169 target-xtensa: make cpu-qom.h not target specific
Make XtensaCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  Conversely, move all definitions needed to
define a class to cpu-qom.h.  This helps making files independent of
NEED_CPU_H if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:42:26 +02:00
Paolo Bonzini
55b1142259 target-unicore32: make cpu-qom.h not target specific
Make UniCore32CPU an opaque type within cpu-qom.h, and move all
definitions of private methods, as well as all type definitions that
require knowledge of the layout to cpu.h.  This helps making files
independent of NEED_CPU_H if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:41:34 +02:00
Paolo Bonzini
fc111b107a target-tricore: make cpu-qom.h not target specific
Make TriCoreCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:41:34 +02:00
Paolo Bonzini
d61d1b2061 target-sparc: make cpu-qom.h not target specific
Make SPARCCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:41:34 +02:00
Paolo Bonzini
e6005f66f9 target-sh4: make cpu-qom.h not target specific
Make SuperHCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:41:34 +02:00
Paolo Bonzini
a4a02f99ff target-s390x: make cpu-qom.h not target specific
Make S390XCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:41:34 +02:00
Paolo Bonzini
2d34fe392c target-ppc: make cpu-qom.h not target specific
Make PowerPCCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  Conversely, move all definitions needed to define
a class to cpu-qom.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 16:41:33 +02:00
Paolo Bonzini
c771dabf55 target-ppc: do not make PowerPCCPUClass depend on target-specific symbols
Just leave some members in even if they are unused on e.g.
32-bit PPC or user-mode emulation.  This avoids complications
when using PowerPCCPUClass in code that is compiled just
once (because it applies to both 32-bit and 64-bit PPC
for example) but still needs to peek at PPC-specific members.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:05 +02:00
Paolo Bonzini
b2305601d3 target-ppc: do not use target_ulong in cpu-qom.h
Bring the PowerPCCPUClass handle_mmu_fault method type into line with
the one in CPUClass.

Using vaddr also makes the cpu-qom.h file target independent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:05 +02:00
Paolo Bonzini
416bf93686 target-mips: make cpu-qom.h not target specific
Make MIPSCPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:05 +02:00
Paolo Bonzini
ffa3a3c6c1 target-microblaze: make cpu-qom.h not target specific
Make MicroBlazeCPU an opaque type within cpu-qom.h, and move all
definitions of private methods, as well as all type definitions that
require knowledge of the layout to cpu.h.  This helps making files
independent of NEED_CPU_H if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:05 +02:00
Paolo Bonzini
a836b8fa00 target-m68k: make cpu-qom.h not target specific
Make M68KCPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:05 +02:00
Paolo Bonzini
6adb9c5474 target-lm32: make cpu-qom.h not target specific
Make LM32CPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
4da6f8d954 target-i386: make cpu-qom.h not target specific
Make X86CPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
28618ac652 target-cris: make cpu-qom.h not target specific
Make CRISCPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
74e755647c target-arm: make cpu-qom.h not target specific
Make ARMCPU an opaque type within cpu-qom.h, and move all definitions of
private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
1dc8e6b758 target-alpha: make cpu-qom.h not target specific
Make AlphaCPU an opaque type within cpu-qom.h, and move all definitions
of private methods, as well as all type definitions that require knowledge
of the layout to cpu.h.  This helps making files independent of NEED_CPU_H
if they only need to pass around CPU pointers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
347b1a5cc6 cpu: make cpu-qom.h only include-able from cpu.h
Make cpu-qom.h so that it is only included from cpu.h.  Then there
is no need for it to include cpu.h again.

Later we will make cpu-qom.h target independent and we will _want_
to include it from elsewhere, but for now reduce the number of cases
to handle.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
f2937a33a5 log: do not use CONFIG_USER_ONLY
This decouples logging further from config-target.h

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
4b4629d9d2 include: move CPU-related definitions out of qemu-common.h
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:08:04 +02:00
Paolo Bonzini
b01501db18 s390x: move .needed functions for subsections to machine.c
These functions are only used when defining subsections, so move
them there.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 13:07:34 +02:00
Paolo Bonzini
f115a19c40 scripts: add script to build QEMU and analyze inclusions
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19 12:09:28 +02:00
Peter Maydell
8ec4fe0a4b Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2016-05-18' into staging
trivial patches for 2016-05-18

# gpg: Signature made Wed 18 May 2016 13:04:43 BST using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2016-05-18:
  Fix some typos found by codespell
  9p: drop unused declaration from coth.h
  smbios: fix typo
  accel: make configure_accelerator return void
  configure: Use uniform description for devel packages
  ipack: Update e-mail address
  util: fix comment typos
  qdict: fix unbounded stack warning for qdict_array_entries
  Fix typo in variable name (found and fixed by codespell)
  vl: fix comment about when parsing cpu definitions
  loader: fix potential memory leak
  remove comment for nonexistent structure member
  s390: remove misleading comment

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-19 09:27:28 +01:00
Stefan Weil
cb8d4c8f54 Fix some typos found by codespell
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Greg Kurz
d506dc87b9 9p: drop unused declaration from coth.h
Commit "ebac1202c95a virtio-9p: use QEMU thread pool" dropped function
v9fs_init_worker_threads.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Cao jin
cc2324d03d smbios: fix typo
The spec says: "on paragraph (16-byte) boundaries"

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Wei Jiangang
bdc3f61dec accel: make configure_accelerator return void
Return the negated value of accel_initialised is meaningless,
and the caller vl doesn't check it.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Stefan Weil
3f3b5388d4 configure: Use uniform description for devel packages
As all other devel packages are written in the form "name devel",
use this form for libcap devel and libattr devel, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Alberto Garcia
b996aed510 ipack: Update e-mail address
I'm not really using the old one anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Wei Jiangang
d43eda3d19 util: fix comment typos
Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:27 +03:00
Peter Xu
de4905f4bc qdict: fix unbounded stack warning for qdict_array_entries
Here we use one g_strdup_printf() to replace the two stack allocated
array, considering it's more convenient, safe, and as long as it's
called rarely only when quorum device opens. This will remove the
unbound stack warning when compiling with "-Wstack-usage=1000000".

Reviewed-by:   Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:26 +03:00
Stefan Weil
1d817db3a0 Fix typo in variable name (found and fixed by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:26 +03:00
Wei Jiangang
37a3e630d9 vl: fix comment about when parsing cpu definitions
machine->init() was replaced with machine_class->init()
in 958db90cd5.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:26 +03:00
Cao jin
ed2f3bc1fa loader: fix potential memory leak
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:26 +03:00
Cao jin
ec609656fc remove comment for nonexistent structure member
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:26 +03:00
Michael Tokarev
f35c1f66ad s390: remove misleading comment
The comment talks about a non-ELF object while the
example gives ELF object.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-05-18 15:04:26 +03:00
Peter Maydell
a257c74149 Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160517' into staging
First batch of s390x patches for 2.7:
- The new machine for 2.7
- Make use of the runtime instrumentation support introduced in
  the kernel
- Enhance our ipl (boot) process: We can now start from devices
  in subchannel sets > 0 as well. As a bonus, the conversion to
  diag308 in the bios allows us to get rid of the gr7 hack.
- Xiaoqiang Zhao's SCLP qomification patches
- Several fixes in the s390x pci implementation

# gpg: Signature made Tue 17 May 2016 15:35:32 BST using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20160517:
  s390x/pci: remove whitespace
  s390x/pci: add length checking for pci sclp handlers
  s390x/pci: enhance mpcifc_service_call
  s390x/pci: fix s390_pci_sclp_deconfigure
  s390x/pci: introduce S390PCIBusDevice.iommu_enabled
  s390x/pci: export pci_dereg_ioat and pci_dereg_irqs
  s390x/pci: separate s390_pcihost_iommu_configure function
  s390x/pci: separate s390_sclp_configure function
  s390x/pci: fix reg_irqs()
  hw/char: QOM'ify sclpconsole.c
  hw/char: QOM'ify sclpconsole-lm.c
  s390x/ipl: Remove redundant usage of gr7
  s390-ccw.img: rebuild image
  pc-bios/s390-ccw: Get device address via diag 308/6
  s390x/ipl: Add ssid field to IplParameterBlock
  s390x/ipl: Provide ipl parameter block
  s390x/ipl: Add type and length checks for IplParameterBlock values
  s390x/ipl: Extend the IplParameterBlock struct
  s390x: enable runtime instrumentation
  s390x: add compat machine for 2.7

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-17 16:49:11 +01:00
Yi Min Zhao
c26916942a s390x/pci: remove whitespace
Fix indentation of PciCfgSccb struct.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
3b40ea2957 s390x/pci: add length checking for pci sclp handlers
The configure/deconfigure sclp commands need a SCCB with a length of
at least 16. Indicate in the response code if this is not fulfilled.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
a6d9d4f26a s390x/pci: enhance mpcifc_service_call
Enhance error handling for mpcifc_service_call() to propagate errors
to guest by setting status codes or triggering program interrupts.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
259a4f0a76 s390x/pci: fix s390_pci_sclp_deconfigure
When deconfiguring a s390 pci device, we should deconfigure the
corresponding IOMMU memory region and the IRQs for the device.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
df6a050c82 s390x/pci: introduce S390PCIBusDevice.iommu_enabled
We introduce iommu_enabled field for S390PCIBusDevice struct to
track whether the iommu has been enabled for the device. This allows
us to stop temporarily changing ->configured while en/disabling the
iommu and to do conditional cleanup later.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
e141dbadfa s390x/pci: export pci_dereg_ioat and pci_dereg_irqs
dereg_irqs and dereg_ioat are needed by external functions. Let's
rename and export both of them in s390-pci-inst.h.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
715838881f s390x/pci: separate s390_pcihost_iommu_configure function
Split s390_pcihost_iommu_configure() into separate functions for
configuring and deconfiguring in order to make the code more readable.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
8f5cb69313 s390x/pci: separate s390_sclp_configure function
Split s390_sclp_configure() into separate functions for sclp
configuring and deconfiguring in order to make the code more readable.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Yi Min Zhao
bac45d5147 s390x/pci: fix reg_irqs()
In reg_irqs(), present code assumes that map_indicator() always issues
successfully. Let's check it and return the error to caller in order to
inform guest.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
xiaoqiang zhao
3f6ec642ae hw/char: QOM'ify sclpconsole.c
Drop the DO_UPCAST macro

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1459237645-17227-7-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
xiaoqiang zhao
e563c59b6a hw/char: QOM'ify sclpconsole-lm.c
Drop the DO_UPCAST macro

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1459237645-17227-6-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Alexander Yarygin
010d45d279 s390x/ipl: Remove redundant usage of gr7
We don't need to pass device address for pc-bios using gr7 anymore as
the pcbios completely relies on diag308 now, so we can remove it from
qemu. devno, ssid and cssid are migrated but the value was never reused,
so we can safely ignore these fields and migrate 0.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Cornelia Huck
a388ac74de s390-ccw.img: rebuild image
Contains the following change:

pc-bios/s390-ccw: Get device address via diag 308/6

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Alexander Yarygin
d046c51dad pc-bios/s390-ccw: Get device address via diag 308/6
To IPL from a device, pc-bios receives from qemu a device address via
general register 7. The better way to do it is to use diag308/6
instruction which returns so called
"IplParameterBlock". IplParameterBlock contains the device address for
IPL and additional parameters that can be used by pc-bios.

This patch allows pc-bios to get device address via diag308/6 and
doesn't use gr7 passed boot information anymore.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Alexander Yarygin
3041e3bead s390x/ipl: Add ssid field to IplParameterBlock
Add the ssid field to the ipl parameter block struct and fill it when
necessary so the guest can use it.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Alexander Yarygin
6aed958978 s390x/ipl: Provide ipl parameter block
Right now we return the ipl parameter block only if the guest
specified one. Let's fill in the parameter block when bootindex
parameter is available and not booting from an external kernel.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Alexander Yarygin
9946a9113c s390x/ipl: Add type and length checks for IplParameterBlock values
We can check for valid type and lengths of the IplParameterBlock fields
when receiving the struct from the guest.

Length of the IplParameterBlock can be less than 4K. To play safe we can
read and write only required amount of data.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenband <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Alexander Yarygin
04ca4b92ec s390x/ipl: Extend the IplParameterBlock struct
The IplParameterBlock struct currently has only 200 bytes filled, but it
can be up to 4K.

This patch converts the struct to union with a fully populated struct
inside it and second struct with old values.

For compatibility reasons we disable migration of the extended iplb
field for pre-2.7 machines. Also a guest still can read/write only the
first 200 bytes of IPLB for now.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Fan Zhang
9700230b0d s390x: enable runtime instrumentation
Introduce run-time-instrumentation support when running under kvm for
virtio-ccw 2.7 machine and make sure older machines can not enable it.

The new ri_allowed field in the s390MachineClass serves as an indicator
whether the feature can be used by the machine and should therefore be
activated if available.

riccb_needed() is used to check whether riccb is needed or not in live
migration.

Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Cornelia Huck
946e55f3c7 s390x: add compat machine for 2.7
Also add some of the option cascading we were missing.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-05-17 15:50:29 +02:00
Peter Maydell
5a3fd960f3 Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Tue 17 May 2016 14:06:54 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/tracing-pull-request:
  hw/intc/arm_gic: add tracepoints

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-17 14:07:25 +01:00
Peter Maydell
3f5e34a45c Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Tue 17 May 2016 01:19:39 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  rfifolock: no need to get thread identifier when nesting

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-17 10:35:50 +01:00
Peter Maydell
c98e793711 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates

# gpg: Signature made Mon 16 May 2016 20:22:36 BST using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  slirp: Clean up osdep.h related header inclusions
  slirp: Remove some unused code from slirp.h
  slirp: Remove obsolete backward-compatibility cruft
  slirp: Clean up slirp_config.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-17 09:16:00 +01:00
Hollis Blanchard
2531088f6c hw/intc/arm_gic: add tracepoints
These are obviously critical to understanding interrupt delivery:
gic_enable_irq
gic_disable_irq
gic_set_irq (inbound irq from device models)
gic_update_set_irq (outbound irq to CPU)
gic_acknowledge_irq

The only one that I think might raise eyebrows is gic_update_bestirq, but I've
(sadly) debugged problems that ended up being caused by unexpected priorities.
Knowing that the GIC has an irq ready, but doesn't deliver to the CPU due to
priority, has also proven important.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1461252281-22399-1-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-16 17:20:41 -07:00
Changlong Xie
de3e15a705 rfifolock: no need to get thread identifier when nesting
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-id: 1462874348-32396-1-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-05-16 15:29:44 -07:00
Thomas Huth
9892663dc4 slirp: Clean up osdep.h related header inclusions
qemu/osdep.h is included in some headers twice - one time
should be sufficient.
Also remove the inclusion of time.h since that is already
done by osdep.h, too (this makes scripts/clean-includes
happy again).

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-05-16 21:01:16 +02:00
Thomas Huth
2cdc848eb5 slirp: Remove some unused code from slirp.h
These hunks are apparently not used anymore, so let's delete them.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-05-16 21:00:31 +02:00
Thomas Huth
5469feadb1 slirp: Remove obsolete backward-compatibility cruft
The slirp code does not use index() and gethostid() anymore,
so these parts can be removed without problems.
memmove() and strerror() should be available on each of the
supported platforms nowadays, too, so these wrappers are also
not needed anymore.
And we certainly also do not support Ultrix anymore, so no
need to keep the code for this platform anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-05-16 20:58:47 +02:00
Thomas Huth
cebee21aca slirp: Clean up slirp_config.h
There are a lot of unused #defines / #undefs in slirp_config.h,
which are apparently left-overs from the very early slirp code.
Since there is no more code that uses them, let's simply remove
them from our version of slirp.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-05-16 20:57:00 +02:00
Peter Maydell
70f87e0f0a Merge remote-tracking branch 'remotes/kraxel/tags/pull-ui-20160513-1' into staging
gtk/sdl build tweaks
fix gtk 3.20 warnings
gtk clipboard support
spice-gl monitor config support
fix coverity warnings

# gpg: Signature made Fri 13 May 2016 13:30:39 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-ui-20160513-1:
  gtk: don't leak the GtkBorder with VTE 0.36
  gtk: update grab code for gtk 3.20
  spice: fix coverity complains
  egl-helpers: fix possible resource leak
  Changed malloc to g_malloc, free to g_free in ui/shader.c
  spice/gl: add & use qemu_spice_gl_monitor_config
  ui/gtk: copy to clipboard support
  ui: gtk: Fix some deprecation warnings
  ui: gtk: Fix a runtime warning on vte >= 0.37
  configure: support vte-2.91
  configure: report SDL version
  configure: report GTK version
  configure: add echo_version helper
  configure: error on unknown --with-sdlabi value
  configure: build SDL if only SDL2 available
  ui: sdl2: Release grab before opening console window
  ui: gtk: fix crash when terminal inner-border is NULL

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-13 13:39:38 +01:00
Peter Maydell
14fccfa91e Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160513' into staging
MIPS patches 2016-05-13

Changes:
* fix zeroing CP0.WatchLo registers in soft reset
* QOMify Jazz led

# gpg: Signature made Fri 13 May 2016 11:04:04 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20160513:
  hw/display: QOM'ify jazz_led.c
  target-mips: fix call to memset in soft reset code

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-13 11:50:42 +01:00
Alberto Garcia
6978dc4adc gtk: don't leak the GtkBorder with VTE 0.36
When gtk_widget_style_get() is used to get the "inner-border" style
property, it returns a copy of the GtkBorder which must be freed by
the caller.

This patch also fixes a warning about the unused 'padding' structure
with VTE 0.36.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1463127654-5171-1-git-send-email-berto@igalia.com
Cc: Cole Robinson <crobinso@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>

[ kraxel: adapted to changes in ui patch queue ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-13 12:40:12 +02:00
Peter Maydell
20c20318f9 Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20160512' into staging
queued 2.7 patches

# gpg: Signature made Fri 13 May 2016 01:08:20 BST using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20160512: (39 commits)
  cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt()
  cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()
  cpu-exec: Move TB execution stuff out of cpu_exec()
  cpu-exec: Move interrupt handling out of cpu_exec()
  cpu-exec: Move exception handling out of cpu_exec()
  cpu-exec: Move halt handling out of cpu_exec()
  cpu-exec: Remove relic orphaned comment
  tcg: Remove needless CPUState::current_tb
  cpu-exec: Move TB chaining into tb_find_fast()
  tcg: Rework tb_invalidated_flag
  tcg: Clean up from 'next_tb'
  cpu-exec: elide more icount code if CONFIG_USER_ONLY
  tcg: reorganize tb_find_physical loop
  tcg: code_bitmap and code_write_count are not used by user-mode emulation
  tcg: Allow goto_tb to any target PC in user mode
  tcg: Clean up direct block chaining safety checks
  tcg: Clean up tb_jmp_unlink()
  tcg: Extract removing of jumps to TB from tb_phys_invalidate()
  tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()
  tcg: Clarify thread safety check in tb_add_jump()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-13 10:42:40 +01:00
xiaoqiang.zhao
7fe91a5b33 hw/display: QOM'ify jazz_led.c
* Drop the old SysBus init function and use instance_init
* Move graphic_console_init into realize stage

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-05-13 09:33:38 +01:00
Sergey Fedorov
8b1fe3f439 cpu-exec: Clean up 'interrupt_request' reloading in cpu_handle_interrupt()
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1463071937-26607-1-git-send-email-sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:07:16 -10:00
Sergey Fedorov
ba048a4ae1 cpu-exec: Remove unused 'x86_cpu' and 'env' from cpu_exec()
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-6-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
928de9ee14 cpu-exec: Move TB execution stuff out of cpu_exec()
Simplify cpu_exec() by extracting TB execution code outside of
cpu_exec() into a new static inline function cpu_loop_exec_tb().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-5-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
c385e6e497 cpu-exec: Move interrupt handling out of cpu_exec()
Simplify cpu_exec() by extracting interrupt handling code outside of
cpu_exec() into a new static inline function cpu_handle_interrupt().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Message-Id: <1462962111-32237-4-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
ea284766ec cpu-exec: Move exception handling out of cpu_exec()
Simplify cpu_exec() by extracting exception handling code out of
cpu_exec() into a new static inline function cpu_handle_exception().
Also make cpu_handle_debug_exception() inline as it is used only once.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-3-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
8b2d34e997 cpu-exec: Move halt handling out of cpu_exec()
Simplify cpu_exec() by extracting CPU halt state handling code out of
cpu_exec() into a new static inline function cpu_handle_halt().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1462962111-32237-2-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
c6f0d9f84c cpu-exec: Remove relic orphaned comment
This comment should have been deleted by commit 0ac087f1f3 ("removed
unused code") but somehow it is still here. There's no point to keep it.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1462286050-21778-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
3213525f8a tcg: Remove needless CPUState::current_tb
This field was used for telling cpu_interrupt() to unlink a chain of TBs
being executed when it worked that way. Now, cpu_interrupt() don't do
this anymore. So we don't need this field anymore.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1462273462-14036-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
a0522c7a55 cpu-exec: Move TB chaining into tb_find_fast()
Move tb_add_jump() call and surrounding code from cpu_exec() into
tb_find_fast(). That simplifies cpu_exec() a little by hiding the direct
chaining optimization details into tb_find_fast(). It also allows to
move tb_lock()/tb_unlock() pair into tb_find_fast(), putting it closer
to tb_find_slow() which also manipulates the lock.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Fixed rebase typo in nochain test.]
2016-05-12 14:06:42 -10:00
Sergey Fedorov
6f789be56d tcg: Rework tb_invalidated_flag
'tb_invalidated_flag' was meant to catch two events:
 * some TB has been invalidated by tb_phys_invalidate();
 * the whole translation buffer has been flushed by tb_flush().

Then it was checked:
 * in cpu_exec() to ensure that the last executed TB can be safely
   linked to directly call the next one;
 * in cpu_exec_nocache() to decide if the original TB should be provided
   for further possible invalidation along with the temporarily
   generated TB.

It is always safe to patch an invalidated TB since it is not going to be
used anyway. It is also safe to call tb_phys_invalidate() for an already
invalidated TB. Thus, setting this flag in tb_phys_invalidate() is
simply unnecessary. Moreover, it can prevent from pretty proper linking
of TBs, if any arbitrary TB has been invalidated. So just don't touch it
in tb_phys_invalidate().

If this flag is only used to catch whether tb_flush() has been called
then rename it to 'tb_flushed'. Declare it as 'bool' and stick to using
only 'true' and 'false' to set its value. Also, instead of setting it in
tb_gen_code(), just after tb_flush() has been called, do it right inside
of tb_flush().

In cpu_exec(), this flag is used to track if tb_flush() has been called
and have made 'next_tb' (a reference to the last executed TB) invalid
for linking it to directly call the next TB. tb_flush() can be called
during the CPU execution loop from tb_gen_code(), during TB execution or
by another thread while 'tb_lock' is released. Catch for translation
buffer flush reliably by resetting this flag once before first TB lookup
and each time we find it set before trying to add a direct jump. Don't
touch in in tb_find_physical().

Each vCPU has its own execution loop in multithreaded mode and thus
should have its own copy of the flag to be able to reset it with its own
'next_tb' and don't affect any other vCPU execution thread. So make this
flag per-vCPU and move it to CPUState.

In cpu_exec_nocache(), we only need to check if tb_flush() has been
called from tb_gen_code() called by cpu_exec_nocache() itself. To do
this reliably, preserve the old value of the flag, reset it before
calling tb_gen_code(), check afterwards, and combine the saved value
back to the flag.

This patch is based on the patch "tcg: move tb_invalidated_flag to
CPUState" from Paolo Bonzini <pbonzini@redhat.com>.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
819af24b9c tcg: Clean up from 'next_tb'
The value returned from tcg_qemu_tb_exec() is the value passed to the
corresponding tcg_gen_exit_tb() at translation time of the last TB
attempted to execute. It is a little confusing to store it in a variable
named 'next_tb'. In fact, it is a combination of 4-byte aligned pointer
and additional information in its two least significant bits. Break it
down right away into two variables named 'last_tb' and 'tb_exit' which
are a pointer to the last TB attempted to execute and the TB exit
reason, correspondingly. This simplifies the code and improves its
readability.

Correct a misleading documentation comment for tcg_qemu_tb_exec() and
fix logging in cpu_tb_exec(). Also rename a misleading 'next_tb' in
another couple of places.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Paolo Bonzini
7687bf52e5 cpu-exec: elide more icount code if CONFIG_USER_ONLY
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Alex Bennée: #ifndef replay code to match elided functions]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Alex Bennée
1279f323d6 tcg: reorganize tb_find_physical loop
Put some comments and improve code structure. This should help reading
the code.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov: provide commit message; bring back resetting of
tb_invalidated_flag]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Paolo Bonzini
6fad459c91 tcg: code_bitmap and code_write_count are not used by user-mode emulation
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Sergey Fedorov: eliminate the field entirely in user-mode]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[rth: merged followup fixup]
Message-Id: <1462982777-4513-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
90aa39a1cc tcg: Allow goto_tb to any target PC in user mode
In user mode, there's only a static address translation, TBs are always
invalidated properly and direct jumps are reset when mapping change.
Thus the destination address is always valid for direct jumps and
there's no need to restrict it to the pages the TB resides in.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Blue Swirl <blauwirbel@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:42 -10:00
Sergey Fedorov
5b053a4a28 tcg: Clean up direct block chaining safety checks
We don't take care of direct jumps when address mapping changes. Thus we
must be sure to generate direct jumps so that they always keep valid
even if address mapping changes. Luckily, we can only allow to execute a
TB if it was generated from the pages which match with current mapping.

Document tcg_gen_goto_tb() declaration and note the reason for
destination PC limitations.

Some targets with variable length instructions allow TB to straddle a
page boundary. However, we make sure that both of TB pages match the
current address mapping when looking up TBs. So it is safe to do direct
jumps into the both pages. Correct the checks for some of those targets.

Given that, we can safely patch a TB which spans two pages. Remove the
unnecessary check in cpu_exec() and allow such TBs to be patched.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
f9c5b66f48 tcg: Clean up tb_jmp_unlink()
Unify the code of this function with tb_jmp_remove_from_list(). Making
these functions similar improves their readability. Also this could be a
step towards making this function thread-safe.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
89bba49632 tcg: Extract removing of jumps to TB from tb_phys_invalidate()
Move the code for removing jumps to a TB out of tb_phys_invalidate() to
a separate static inline function tb_jmp_unlink(). This simplifies
tb_phys_invalidate() and improves code structure.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
133626783a tcg: Rename tb_jmp_remove() to tb_remove_from_jmp_list()
tb_jmp_remove() was only used to remove the TB from a list of all TBs
jumping to the same TB which is n-th jump destination of the given TB.
Put a comment briefly describing the function behavior and rename it to
better reflect its purpose.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
9962c478b1 tcg: Clarify thread safety check in tb_add_jump()
The check is to make sure that another thread hasn't already done the
same while we were outside of tb_lock. Mention this in a comment.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
901bc3deb4 tcg: Init TB's direct jumps before making it visible
Initialize TB's direct jump list data fields and reset the jumps before
tb_link_page() puts it into the physical hash table and the physical
page list. So TB is completely initialized before it becomes visible.

This is pure rearrangement of code to a more suitable place, though it
could be a preparation for relaxing the locking scheme in future.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
e90d96b158 tcg: Rearrange tb_link_page() to avoid forward declaration
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
c37e6d7e35 tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB
These fields do not contain pure pointers to a TranslationBlock
structure. So uintptr_t is the most appropriate type for them.
Also put some asserts to assure that the two least significant bits of
the pointer are always zero before assigning it to jmp_list_first.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
f309101c26 tcg: Clean up direct block chaining data fields
Briefly describe in a comment how direct block chaining is done. It
should help in understanding of the following data fields.

Rename some fields in TranslationBlock and TCGContext structures to
better reflect their purpose (dropping excessive 'tb_' prefix in
TranslationBlock but keeping it in TCGContext):
   tb_next_offset  =>  jmp_reset_offset
   tb_jmp_offset   =>  jmp_insn_offset
   tb_next         =>  jmp_target_addr
   jmp_next        =>  jmp_list_next
   jmp_first       =>  jmp_list_first

Avoid using a magic constant as an invalid offset which is used to
indicate that there's no n-th jump generated.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Richard Henderson
7ba6a512ae translate-all: Adjust 256mb testing for mips64
Make sure we preserve the high 32-bits when masking for mips64.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Emilio G. Cota
8bdf499782 translate-all: add missing munmap of the code_gen guard page for MIPS
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1461283314-2353-2-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Emilio G. Cota
835154b6e2 translate-all: remove redundant setting of tcg_ctx.code_gen_buffer_size
The setting of tcg_ctx.code_gen_buffer_size is done by the only caller of
size_code_gen_buffer(), which is code_gen_alloc():

  $ git grep size_code_gen_buffer
  translate-all.c:static inline size_t size_code_gen_buffer(size_t tb_size)
  translate-all.c:    tcg_ctx.code_gen_buffer_size = size_code_gen_buffer(tb_size);

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1461283314-2353-1-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
10b4f48555 tcg: Note requirement on atomic direct jump patching
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1461341333-19646-12-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
c82460a560 tcg/mips: Make direct jump patching thread-safe
Ensure direct jump patching in MIPS is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-11-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Merged the deposit32 followup.]
[rth: Merged the following followup.]
Message-Id: <1462210518-26522-1-git-send-email-sergey.fedorov@linaro.org>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
84f79fb7c6 tcg/sparc: Make direct jump patching thread-safe
Ensure direct jump patching in SPARC is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1461341333-19646-10-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
9e26911295 tcg/aarch64: Make direct jump patching thread-safe
Ensure direct jump patching in AArch64 is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-9-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
7d14e0e2d6 tcg/arm: Make direct jump patching thread-safe
Ensure direct jump patching in ARM is atomic by using
atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-8-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
ed3d51ecd7 tcg/s390: Make direct jump patching thread-safe
Ensure direct jump patching in s390 is atomic by:
 * naturally aligning a location of direct jump address;
 * using atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-7-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
0d07abf05e tcg/i386: Make direct jump patching thread-safe
Ensure direct jump patching in i386 is atomic by:
 * naturally aligning a location of direct jump address;
 * using atomic_read()/atomic_set() for code patching.

tcg_out_nopn() implementation:
Suggested-by: Richard Henderson <rth@twiddle.net>.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-6-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:41 -10:00
Sergey Fedorov
399f164857 tcg/ppc: Make direct jump patching thread-safe
Ensure direct jump patching in PPC is atomic by:
 * limiting translation buffer size in 32-bit mode to be addressable by
   Branch I-form instruction;
 * using atomic_read()/atomic_set() for code patching.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1461341333-19646-5-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:40 -10:00
Sergey Fedorov
76442a939e tci: Make direct jump patching thread-safe
Ensure direct jump patching in TCI is atomic by:
 * naturally aligning a location of direct jump address;
 * using atomic_read()/atomic_set() to load/store the address.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-4-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:40 -10:00
Sergey Fedorov
6b587d3cda include/qemu/osdep.h: Add macros for pointer alignment
These macros provide a convenient way to n-byte align pointers up and
down and check if a pointer is n-byte aligned.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-3-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:40 -10:00
Sergey Fedorov
18a60a7614 include/qemu/osdep.h: Add a macro to check for alignment
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1461341333-19646-2-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-05-12 14:06:40 -10:00
Emilio G. Cota
89fee74a0f tb: consistently use uint32_t for tb->flags
We are inconsistent with the type of tb->flags: usage varies loosely
between int and uint64_t. Settle to uint32_t everywhere, which is
superior to both: at least one target (aarch64) uses the most significant
bit in the u32, and uint64_t is wasteful.

Compile-tested for all targets.

Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1460049562-23517-1-git-send-email-cota@braap.org>
2016-05-12 14:06:40 -10:00
Peter Maydell
f68419eee9 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Thu 12 May 2016 14:37:05 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (69 commits)
  qemu-iotests: iotests: fail hard if not run via "check"
  block: enable testing of LUKS driver with block I/O tests
  block: add support for encryption secrets in block I/O tests
  block: add support for --image-opts in block I/O tests
  qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
  qemu-io: Add 'write -f' to test FUA flag
  qemu-io: Allow unaligned access by default
  qemu-io: Use bool for command line flags
  qemu-io: Make 'open' subcommand more like command line
  qemu-io: Add missing option documentation
  qmp: add monitor command to add/remove a child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  Add new block driver interface to add/delete a BDS's child
  qemu-img: check block status of backing file when converting.
  iotests: fix the redirection order in 083
  block: Inactivate all children
  block: Drop superfluous invalidating bs->file from drivers
  block: Invalidate all children
  nbd: Simplify client FUA handling
  block: Honor BDRV_REQ_FUA during write_zeroes
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 16:33:40 +01:00
Peter Maydell
e4f70d6358 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160512' into staging
target-arm queue:
 * blizzard, omap_lcdc: code cleanup to remove DEPTH != 32 dead code
 * QOMify various ARM devices
 * bcm2835_property: use cached values when querying framebuffer
 * hw/arm/nseries: don't allocate large sized array on the stack
 * fix LPAE descriptor address masking (only visible for EL2)
 * fix stage 2 exec permission handling for AArch32
 * first part of supporting syndrome info for data aborts to EL2
 * virt: NUMA support
 * work towards i.MX6 support
 * avoid unnecessary TLB flush on TCR_EL2, TCR_EL3 writes

# gpg: Signature made Thu 12 May 2016 14:29:14 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20160512: (43 commits)
  hw/arm: QOM'ify versatilepb.c
  hw/arm: QOM'ify strongarm.c
  hw/arm: QOM'ify stellaris.c
  hw/arm: QOM'ify spitz.c
  hw/arm: QOM'ify pxa2xx_pic.c
  hw/arm: QOM'ify pxa2xx.c
  hw/arm: QOM'ify integratorcp.c
  hw/arm: QOM'ify highbank.c
  hw/arm: QOM'ify armv7m.c
  target-arm: Avoid unnecessary TLB flush on TCR_EL2, TCR_EL3 writes
  hw/display/blizzard: Remove blizzard_template.h
  hw/display/blizzard: Expand out macros
  i.MX: Add sabrelite i.MX6 emulation.
  i.MX: Add i.MX6 SOC implementation.
  i.MX: Add the Freescale SPI Controller
  FIFO: Add a FIFO32 implementation
  i.MX: Add i.MX6 System Reset Controller device.
  ARM: Factor out ARM on/off PSCI control functions
  ACPI: Virt: Generate SRAT table
  ACPI: move acpi_build_srat_memory to common place
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 15:55:45 +01:00
Gerd Hoffmann
a69fc693e9 gtk: update grab code for gtk 3.20
Fixes the remaining gtk 3.20 warnings.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Cole Robinson <crobinso@redhat.com>
Tested-by: Cole Robinson <crobinso@redhat.com>
Message-id: 1463038146-13939-1-git-send-email-kraxel@redhat.com
2016-05-12 16:41:46 +02:00
Gonglei
28f4a7083d spice: fix coverity complains
Remove the unnecessary NULL check.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1463047028-123868-3-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-12 16:41:46 +02:00
Gonglei
f454f49c42 egl-helpers: fix possible resource leak
CID 1352419, using g_strdup_printf instead of asprintf.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1463047028-123868-2-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-12 16:41:46 +02:00
Md Haris Iqbal
42ddb8aa7c Changed malloc to g_malloc, free to g_free in ui/shader.c
Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
Message-id: 1459862499-4768-1-git-send-email-haris.phnx@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-12 16:41:46 +02:00
Gerd Hoffmann
39414ef4e9 spice/gl: add & use qemu_spice_gl_monitor_config
Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-05-12 16:41:46 +02:00
Michael S. Tsirkin
44b31e0bc4 ui/gtk: copy to clipboard support
This adds a menu item to copy current selection to clipboard.
Seems handy for copying out guest error messages.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1460924740-24513-1-git-send-email-mst@redhat.com

[ kraxel: fix build with CONFIG_VTE=n ]
[ kraxel: fix build with CONFIG_VTE=n, now for real ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-12 16:41:18 +02:00
Peter Maydell
6ddeeffffe Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-05-12' into staging
QAPI patches for 2016-05-12

# gpg: Signature made Thu 12 May 2016 08:49:04 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2016-05-12: (23 commits)
  qapi: Change visit_type_FOO() to no longer return partial objects
  qapi: Simplify semantics of visit_next_list()
  qapi: Fix string input visitor handling of invalid list
  tests/string-input-visitor: Add negative integer tests
  qapi: Split visit_end_struct() into pieces
  qmp: Tighten output visitor rules
  qmp: Don't reuse qmp visitor after grabbing output
  spapr_drc: Expose 'null' in qom-get when there is no fdt
  qmp: Support explicit null during visits
  qapi: Add visit_type_null() visitor
  tests: Add check-qnull
  qapi: Document visitor interfaces, add assertions
  qmp-input: Refactor when list is advanced
  qmp-input: Require struct push to visit members of top dict
  qom: Wrap prop visit in visit_start_struct
  qapi-commands: Wrap argument visit in visit_start_struct
  qmp-input: Don't consume input when checking has_member
  qapi: Use strict QMP input visitor in more places
  qapi: Consolidate QMP input visitor creation
  qmp-input: Clean up stack handling
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 15:06:38 +01:00
Kevin Wolf
efc2645f71 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-05-12' into queue-block
Block patches for 2.7

# gpg: Signature made Thu May 12 15:34:13 2016 CEST using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-05-12:
  qemu-iotests: iotests: fail hard if not run via "check"
  block: enable testing of LUKS driver with block I/O tests
  block: add support for encryption secrets in block I/O tests
  block: add support for --image-opts in block I/O tests
  qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
  qemu-io: Add 'write -f' to test FUA flag
  qemu-io: Allow unaligned access by default
  qemu-io: Use bool for command line flags
  qemu-io: Make 'open' subcommand more like command line
  qemu-io: Add missing option documentation
  qmp: add monitor command to add/remove a child
  quorum: implement bdrv_add_child() and bdrv_del_child()
  Add new block driver interface to add/delete a BDS's child
  qemu-img: check block status of backing file when converting.
  iotests: fix the redirection order in 083

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:35:20 +02:00
Peter Maydell
f83b70f701 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160511-1' into staging
usb: misc fixes

# gpg: Signature made Wed 11 May 2016 12:18:25 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20160511-1:
  usb: Support compilation without poll.h
  usb-mtp: fix usb_mtp_get_device_info so that libmtp on the guest doesn't complain
  usb:xhci: no DMA on HC reset

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 14:34:35 +01:00
Sascha Silbe
5a8fabf333 qemu-iotests: iotests: fail hard if not run via "check"
Running an iotests-based Python test directly might appear to work,
but may fail in subtle ways and is insecure:

- It creates files with predictable file names in a world-writable
  location (/var/tmp).

- Tests expect the environment to be set up by check. E.g. 041 and 055
  may take the wrong code paths if QEMU_DEFAULT_MACHINE is not
  set. This can lead to false negatives.

Instead fail hard and tell the user we want to be run via "check".

The actual environment expected by the tests is currently only defined
by the implementation of "check". We use two of the environment
variables set by "check" as indication of whether we're being run via
"check". Anyone writing their own test runner (replacing "check") will
need to replicate the full environment (in a broader sense, not just
environment variables) provided by "check" anyway, including setting
the two environment variables we check. Whereas a regular developer
just trying to invoke the tests usually won't have both of these
defined in their environment so we can catch their mistake and give
out useful advice.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1461094442-16014-1-git-send-email-silbe@linux.vnet.ibm.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Daniel P. Berrange
4e9b25fb05 block: enable testing of LUKS driver with block I/O tests
This adds support for testing the LUKS driver with the block
I/O test framework.

   cd tests/qemu-io-tests
   ./check -luks

A handful of test cases are modified to work with luks

 - 004 - whitelist luks format
 - 012 - use TEST_IMG_FILE instead of TEST_IMG for file ops
 - 048 - use TEST_IMG_FILE instead of TEST_IMG for file ops.
         don't assume extended image contents is all zeros,
         explicitly initialize with zeros
         Make file size smaller to avoid having to decrypt
         1 GB of data.
 - 052 - don't assume initial image contents is all zeros,
         explicitly initialize with zeros
 - 100 - don't assume initial image contents is all zeros,
         explicitly initialize with zeros

With this patch applied, the results are as follows:

  Passed: 001 002 003 004 005 008 009 010 011 012 021 032 043
          047 048 049 052 087 100 134 143
  Failed: 033 120 140 145
 Skipped: 007 013 014 015 017 018 019 020 022 023 024 025 026
          027 028 029 030 031 034 035 036 037 038 039 040 041
          042 043 044 045 046 047 049 050 051 053 054 055 056
          057 058 059 060 061 062 063 064 065 066 067 068 069
          070 071 072 073 074 075 076 077 078 079 080 081 082
          083 084 085 086 087 088 089 090 091 092 093 094 095
          096 097 098 099 101 102 103 104 105 107 108 109 110
          111 112 113 114 115 116 117 118 119 121 122 123 124
          128 129 130 131 132 133 134 135 136 137 138 139 141
          142 144 146 148 150 152

The reasons for the failed tests are:

 - 033 - needs adapting to use image opts syntax with blkdebug
         and test image in order to correctly set align property
 - 120 - needs adapting to use correct -drive syntax for luks
 - 140 - needs adapting to use correct -drive syntax for luks
 - 145 - needs adapting to use correct -drive syntax for luks

The vast majority of skipped tests are exercising code that is
qcow2 specific, though a couple could probably be usefully
enabled for luks too.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1462896689-18450-4-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Daniel P. Berrange
b7e875b2f9 block: add support for encryption secrets in block I/O tests
The LUKS block driver tests will require the ability to specify
encryption secrets with block devices. This requires using the
--object argument to qemu-img/qemu-io to create a 'secret'
object.

When the IMGKEYSECRET env variable is set, it provides the
password to be associated with a secret called 'keysec0'

The _qemu_img_wrapper function isn't modified as that needs
to cope with differing syntax for subcommands, so can't be
made to use the image opts syntax unconditionally.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1462896689-18450-3-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Daniel P. Berrange
076003f526 block: add support for --image-opts in block I/O tests
Currently all block tests use the traditional syntax for images
just specifying a filename. To support the LUKS driver without
resorting to JSON, the tests need to be able to use the new
--image-opts argument to qemu-img and qemu-io.

This introduces a new env variable IMGOPTSSYNTAX. If this is
set to 'true', then qemu-img/qemu-io should use --image-opts.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1462896689-18450-2-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Eric Blake
c2e001cc82 qemu-io: Add 'write -z -u' to test MAY_UNMAP flag
Make it easier to control whether the BDRV_REQ_MAY_UNMAP flag
can be passed through a write_zeroes command, by adding the '-u'
flag to qemu-io 'write -z' and 'aio_write -z'.  To be useful,
the device has to be opened with BDRV_O_UNMAP (done by default
in qemu-io, but can be made explicit with '-d unmap').

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1462677405-4752-7-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Eric Blake
770e0e0e80 qemu-io: Add 'write -f' to test FUA flag
Make it easier to test block drivers with BDRV_REQ_FUA in
.supported_write_flags, by adding the '-f' flag to qemu-io to
conditionally pass the flag through to specific writes ('write',
'write -z', 'writev', 'aio_write', 'aio_write -z'). You'll want
to use 'qemu-io -t none' to actually make -f useful (as
otherwise, the default writethrough mode automatically sets the
FUA bit on every write).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-6-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Eric Blake
093ea232b0 qemu-io: Allow unaligned access by default
There's no reason to require the user to specify a flag just so
they can pass in unaligned numbers.  Keep 'read -p' and 'write -p'
as no-ops so that I don't have to hunt down and update all users
of qemu-io, but otherwise make their behavior default as 'read' and
'write'.  Also fix 'write -z', 'readv', 'writev', 'writev',
'aio_read', 'aio_write', and 'aio_write -z'.  For now, 'read -b',
'write -b', and 'write -c' still require alignment (and 'multiwrite',
but that's slated to die soon).

qemu-iotest 23 is updated to match, as the only test that was
previously explicitly expecting an error on an unaligned request.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-5-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Eric Blake
dc38852aaa qemu-io: Use bool for command line flags
We require a C99 compiler; let's use it to express what we
really mean.

(Yes, we now have an instance of 'if (bool + bool + bool > 1)',
which, although semantically valid C, looks ugly; it gets
cleaned up later.)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1462677405-4752-4-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Eric Blake
b8d970f1a9 qemu-io: Make 'open' subcommand more like command line
The command line defaults to BDRV_O_UNMAP, but can use
-d to reset it.  Meanwhile, the 'open' subcommand was
defaulting to no discards, with no way to set it.

The command line has both -n and -tMODE to set a variety
of cache modes, but the 'open' subcommand had only -n.

The 'open' subcommand had no way to set BDRV_O_NATIVE_AIO.

Note that the 'reopen' subcommand uses '-c' where the
command line and 'open' use -t.  Making that consistent
would be a separate patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-3-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:24 +02:00
Eric Blake
e4e12bb26d qemu-io: Add missing option documentation
The Usage: summary is missing several options, but rather than
having to maintain it, it's simpler to just state [OPTIONS],
since the options are spelled out below.

Commit 499afa2 added --image-opts, but forgot to document it in
--help.  Likewise for commit 9e8f183 and -d/--discard.

Commit e3aff4f6 put "-o/--offset" in the long opts, but it has
never been honored.

Add a note that '-n' is short for '-t none'.

Commit 9a2d77ad killed the -C option, but forgot to undocument
it for the 'open' subcommand.

Finally, commit 10d9d75 removed -g/--growable, but forgot to
cull it from the valid short options.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1462677405-4752-2-git-send-email-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Wen Congyang
7f82159769 qmp: add monitor command to add/remove a child
The new QMP command name is x-blockdev-change. It's just for adding/removing
quorum's child now, and doesn't support all kinds of children, all kinds of
operations, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-4-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Wen Congyang
98292c61bc quorum: implement bdrv_add_child() and bdrv_del_child()
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Message-id: 1462865799-19402-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Wen Congyang
e06018ad28 Add new block driver interface to add/delete a BDS's child
In some cases, we want to take a quorum child offline, and take
another child online.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Ren Kimura
263a6f4c3a qemu-img: check block status of backing file when converting.
When converting images, check the block status of its backing file chain
to avoid needlessly reading zeros.

Signed-off-by: Ren Kimura <rkx1209dev@gmail.com>
Message-id: 1461773098-20356-1-git-send-email-rkx1209dev@gmail.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Wei Jiangang
9036e87c74 iotests: fix the redirection order in 083
It should redirect stdout to /dev/null first,
then redirect stderr to whatever stdout currently points at.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Message-id: 1461665601-14908-1-git-send-email-weijg.fnst@cn.fujitsu.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-05-12 15:33:23 +02:00
Fam Zheng
aad0b7a0bf block: Inactivate all children
Currently we only inactivate the top BDS. Actually bdrv_inactivate
should be the opposite of bdrv_invalidate_cache.

Recurse into the whole subtree instead.

Because a node may have multiple parents, and because once
BDRV_O_INACTIVE is set for a node, further writes are not allowed, we
cannot interleave flag settings and .bdrv_inactivate calls (that may
submit write to other nodes in a graph) within a single pass. Therefore
two passes are used here.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Fam Zheng
c9e9e9c66c block: Drop superfluous invalidating bs->file from drivers
Now they are invalidated by the block layer, so it's not necessary to
do this in block drivers' implementations of .bdrv_invalidate_cache.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Fam Zheng
0d1c5c9160 block: Invalidate all children
Currently we only recurse to bs->file, which will miss the children in quorum
and VMDK.

Recurse into the whole subtree to avoid that.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
52a4650574 nbd: Simplify client FUA handling
Now that the block layer honors per-bds FUA support, we don't
have to duplicate the fallback flush at the NBD layer.  The
static function nbd_co_writev_flags() is no longer needed, and
the driver can just directly use nbd_client_co_writev().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
465fe887cc block: Honor BDRV_REQ_FUA during write_zeroes
The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.

SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16).  But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.

Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.

The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag.  It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags.  Compare to commit
9eeb6dd, which got flag manipulation right in
bdrv_co_do_zero_pwritev().

Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.

The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing).  Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).

Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.

Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags.  But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
4df863f336 block: Make supported_write_flags a per-bds property
Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags as a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Denis V. Lunev
2928abce6d qcow2: improve qcow2_co_write_zeroes()
There is a possibility that qcow2_co_write_zeroes() will be called
with the partial block. This could be synthetically triggered with
    qemu-io -c "write -z 32k 4k"
and can happen in the real life in qemu-nbd. The latter happens under
the following conditions:
    (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
        kernel NBD client
    (2) third party program opens kernel NBD device with O_DIRECT
    (3) third party program performs write operation with memory buffer
        not aligned to the page
In this case qcow2_co_write_zeroes() is unable to perform the operation
and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller
switches to non-optimized version and writes real zeroes to the disk.

The patch creates a shortcut. If the block is read as zeroes, f.e. if
it is unallocated, the request is extended to cover full block.
User-visible situation with this block is not changed. Before the patch
the block is filled in the image with real zeroes. After that patch the
block is marked as zeroed in metadata. Thus any subsequent changes in
backing store chain are not affected.

Kevin, thank you for a cool suggestion.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
7b1deac84e block: Kill unused sector-based blk_* functions
Now that there are no remaining clients, we can drop the
sector-based blk_read(), blk_write(), blk_aio_readv(), and
blk_aio_writev().  Sadly, there are still remaining
sector-based interfaces, such as blk_*discard(), or
blk_write_compressed(); those will have to wait for another
day.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
7b3f9712e1 qemu-io: Switch to byte-based block access
qemu-io is the last user of several sector-based interfaces.
This patch upgrades to the new interfaces under the hood,
then deletes the resulting dead code.  Note that for maximum
back-compat, while the -p option is no longer required to get
blk_pread(), it is still needed to allow for unaligned access;
this is because qemu-iotest 23 relies on qemu-io rejecting
unaligned accesses without -p.  A later patch may clean up the
interface to be more user-friendly, but it's better to separate
what's done under the hood from what the user sees.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
9166920a0b qemu-img: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
bd31c214c3 nbd: Switch to byte-based block access
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Add a constant for our magic number 512, to make it obvious
that this size will NOT change even if BDRV_SECTOR_SIZE does,
even though the two happen to be the same for now.  Split
assignments from conditionals to keep checkpatch.pl happy.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
26a122d3d4 atapi: Switch to byte-based block access
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Add new defines ATAPI_SECTOR_BITS and ATAPI_SECTOR_SIZE to
use anywhere we were previously scaling BDRV_SECTOR_* by 4,
for better legibility.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
243e6f69c1 m25p80: Switch to byte-based block access
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Likewise for blk_aio_readv() and blk_aio_writev().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
12c125cba9 sd: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)
In fact, we no longer need to include 'buf' in the migration
stream (although we do have to ensure that the stream remains
compatible).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
098e732dbe pflash: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
441692ddd8 onenand: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This particular device picks its size during onenand_initfn(),
and can be at most 0x80000000 bytes; therefore, shifting an
'int sec' request to get back to a byte offset should never
overflow 32 bits.  But adding assertions to document that point
should not hurt.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
9fc0d361cc nand: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations.  Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
a7a5b7c0fc fdc: Switch to byte-based block access
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
d00000f901 xen_disk: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
b5772fdde4 virtio: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

The trace is modified at the same time, and nb_sectors is now
unused.  Fix a comment typo while in the vicinity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
03c90063cc scsi-disk: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

As part of the cleanup, scsi_init_iovec() no longer needs to return
a value, and reword a comment.

[ kwolf: Fix read accounting change ]

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:09 +02:00
Eric Blake
d4f510eb3f ide: Switch to byte-based aio block access
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

The patch had to touch multiple files at once, because dma_blk_io()
takes pointers to the functions, and ide_issue_trim() piggybacks on
the same interface (while ignoring offset under the hood).

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake
60cb2fa7eb block: Introduce byte-based aio read/write
blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).

Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake
983a160050 block: Switch blk_*write_zeroes() to byte interface
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead.  Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake
b7d17f9fa4 block: Switch blk_read_unthrottled() to byte interface
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Eric Blake
8341f00dc2 block: Allow BDRV_REQ_FUA through blk_pwrite()
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
0e01b76e7c qemu-io: Fix memory leak in 'aio_write -z'
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-05-12 15:22:08 +02:00
Janne Karhunen
f249924e96 Allow users to specify the vmdk virtual hardware version.
Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.

[ kwolf: Adjust qemu-iotests common.filter ]

Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Zhou Jie
ed79f37d9b block: always compile-check debug prints
Files with conditional debug statements should ensure that the printf is
always compiled. This prevents bitrot of the format string of the debug
statement. And switch debug output to stderr.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Wei Jiangang
547cb1574e block: Fix typo in comment
s/imlement/implement/

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
e3ddef25e9 block: Remove BlockDriver.bdrv_read/write
There are no block drivers left that implement the old .bdrv_read/write
interface, so it can be removed now. This gets us rid of the
corresponding emulation functions, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
4575eb496d vvfat: Implement .bdrv_co_preadv/pwritev interfaces
This doesn't really convert any of the actual vvfat logic to use
vectored I/O (and it's doubtful whether that would make sense), but
instead just adapts the wrappers to the modern interface.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
513b0f026b vpc: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
d46b7cc680 vpc: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
37b1d7d8c9 vmdk: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
f10cc24359 vmdk: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
a844a2b0d4 vmdk: Add vmdk_find_offset_in_cluster()
This is a byte granularity version of vmdk_find_index_in_cluster().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
fde9d56f5b vdi: Implement .bdrv_co_pwritev() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
0865bb6f04 vdi: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
3edf1e73d5 dmg: Implement .bdrv_co_preadv() interface
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
5cd230819e cloop: Implement .bdrv_co_preadv() interface
This implements .bdrv_co_preadv() for the cloop block driver. While
updating the error paths, change -1 to a valid -errno code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
3b8fd33011 bochs: Implement .bdrv_co_preadv() interface
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
3fb06697ae block: Introduce .bdrv_co_preadv/pwritev BlockDriver function
Many parts of the block layer are already byte granularity. The block
driver interface, however, was still missing an interface that allows
making use of this. This patch introduces a new BlockDriver interface,
which is based on coroutines, vectored, has flags and uses a byte
granularity. This is now the preferred interface for new drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
cab3a3563c block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev
It used to be an internal helper function just for implementing
bdrv_co_do_readv/writev(), but now that it's a public interface, it
deserves a name without "do" in it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:08 +02:00
Kevin Wolf
0884447382 block: Support AIO drivers in bdrv_driver_preadv/pwritev()
Instead of registering emulation functions as .bdrv_co_writev, just
directly check whether the function is there or not, and use the AIO
interface if it isn't. This makes the read/write functions more
consistent with how things are done in other places (flush, discard,
etc.)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf
78a07294d5 block: Introduce bdrv_driver_pwritev()
This is a function that simply calls into the block driver for doing a
write, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

This one is a bit more interesting than the version for reads: It adds
support for .bdrv_co_writev_flags() everywhere, so that drivers
implementing this function can drop .bdrv_co_writev() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf
166fe96051 block: Introduce bdrv_driver_preadv()
This is a function that simply calls into the block driver for doing a
read, providing the byte granularity interface we want to eventually
have everywhere, and using whatever interface that driver supports.

For now, this is just a wrapper for calling bs->drv->bdrv_co_readv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini
dd7f7ed104 linux-aio: make it more type safe
Replace void* with an opaque LinuxAioState type.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini
6b98bd6495 block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
Extract the handling of io_plug "depth" from linux-aio.c and let the
main bdrv_drain loop do nothing but wait on I/O.

Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug
now operate on all children.  The visit order is now symmetrical between
plug and unplug, making it possible for formats to implement plug/unplug.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini
ce0f141259 block: introduce bdrv_no_throttling_begin/end
Extract the handling of throttling from bdrv_flush_io_queue.  These
new functions will soon become BdrvChildRole callbacks, as they can
be generalized to "beginning of drain" and "end of drain".

Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini
b6e84c97ed block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain
Do not call bdrv_drain_recurse twice in bdrv_co_drain.  A small
tweak to the logic in Fam's patch, which is harmless since no
one implements bdrv_drain anyway.  But better get it right.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini
a72f641407 block: move restarting of throttled reqs to block/throttle-groups.c
We want to remove throttled_reqs from block/io.c.  This is the easy
part---hide the handling of throttled_reqs during disable/enable of
throttling within throttle-groups.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Paolo Bonzini
733bbc8cea block: make bdrv_start_throttled_reqs return void
The return value is unused and I am not sure why it would be useful.

Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
Kevin Wolf
90c78624f1 block: Don't disable I/O throttling on sync requests
We had to disable I/O throttling with synchronous requests because we
didn't use to run timers in nested event loops when the code was
introduced. This isn't true any more, and throttling works just fine
even when using the synchronous API.

The removed code is in fact dead code since commit a8823a3b ('block: Use
blk_co_pwritev() for blk_write()') because I/O throttling can only be
set on the top layer, but BlockBackend always uses the coroutine
interface now instead of using the sync API emulation in block.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-05-12 15:22:07 +02:00
xiaoqiang.zhao
0bc91ab3bb hw/arm: QOM'ify versatilepb.c
Drop the use of old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:12 +01:00
xiaoqiang.zhao
5a67508c7a hw/arm: QOM'ify strongarm.c
Drop the use of old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:11 +01:00
xiaoqiang.zhao
15c4fff5d8 hw/arm: QOM'ify stellaris.c
* Drop the use of old SysBus init function and use instance_init
* Use DeviceClass::vmsd instead of 'vmstate_register' function

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:10 +01:00
xiaoqiang zhao
f68575c956 hw/arm: QOM'ify spitz.c
Drop the use of old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:09 +01:00
xiaoqiang.zhao
08ba3fde1d hw/arm: QOM'ify pxa2xx_pic.c
Remove the empty 'pxa2xx_pic_initfn' and it's
setup code in the 'pxa2xx_pic_class_init'

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:08 +01:00
xiaoqiang.zhao
16fb31a382 hw/arm: QOM'ify pxa2xx.c
Drop the use of old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:07 +01:00
xiaoqiang.zhao
a1f42e0c9a hw/arm: QOM'ify integratorcp.c
* Drop the use of old SysBus init function and use instance_init
* Remove the empty 'icp_pic_class_init' from Typeinfo

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:06 +01:00
xiaoqiang.zhao
ff7a27c15a hw/arm: QOM'ify highbank.c
Drop the use of old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:42:06 +01:00
xiaoqiang.zhao
3f5ab25490 hw/arm: QOM'ify armv7m.c
Drop the use of old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:40:48 +01:00
Peter Maydell
6459b94c26 target-arm: Avoid unnecessary TLB flush on TCR_EL2, TCR_EL3 writes
The TCR_EL2 and TCR_EL3 regdefs were incorrectly using the
vmsa_tcr_el1_write function for writes. Since these registers don't
have the A1 bit that TCR_EL1 does, we don't need to do a tlb_flush()
when they are written. Remove the unnecessary .writefn and also the
harmless but unneeded .raw_writefn and .resetfn definitions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
2016-05-12 13:22:30 +01:00
Peter Maydell
4274d821ff hw/display/blizzard: Remove blizzard_template.h
We no longer need to do the "multiply include this header" trick with
blizzard_template.h, and it is only used in a single .c file, so just
put its contents inline in blizzard.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1462371352-21498-3-git-send-email-peter.maydell@linaro.org
2016-05-12 13:22:30 +01:00
Peter Maydell
5c8759087d hw/display/blizzard: Expand out macros
Now that we can assume that only depth 32 is possible, there's no need
for the COPY_PIXEL1 and PIXEL_TYPE macros, and the SKIP_PIXEL, COPY_PIXEL
and SWAP_WORDS macros aren't used at all. Expand out COPY_PIXEL1 and
PIXEL_TYPE where they are used, delete the unused macro definitions, and
expand out the uses of glue(name_prefix, DEPTH).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1462371352-21498-2-git-send-email-peter.maydell@linaro.org
2016-05-12 13:22:29 +01:00
Jean-Christophe DUBOIS
3a0f31bcb8 i.MX: Add sabrelite i.MX6 emulation.
The sabrelite supports one SPI FLASH memory on SPI1

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:29 +01:00
Jean-Christophe DUBOIS
ec46eaa83a i.MX: Add i.MX6 SOC implementation.
For now we only support the following devices:
* up to 4 Cortex A9 cores
* A9 MPCORE (SCU, GIC, TWD)
* 5 i.MX UARTs
* 2 EPIT timers
* 1 GPT timer
* 3 I2C controllers
* 7 GPIO controllers
* 6 SDHC controllers
* 5 SPI controllers
* 1 CCM device
* 1 SRC device
* various ROM/RAM areas.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:29 +01:00
Jean-Christophe DUBOIS
c906a3a015 i.MX: Add the Freescale SPI Controller
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:29 +01:00
Jean-Christophe DUBOIS
53374b16a2 FIFO: Add a FIFO32 implementation
This one is build on top of the existing FIFO8

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:29 +01:00
Jean-Christophe DUBOIS
1983057470 i.MX: Add i.MX6 System Reset Controller device.
This controller is also present in i.MX5X devices but they are not
yet emulated by QEMU.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:28 +01:00
Jean-Christophe DUBOIS
825482adde ARM: Factor out ARM on/off PSCI control functions
Split ARM on/off function from PSCI support code.

This will allow to reuse these functions in other code.

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:28 +01:00
Shannon Zhao
2b302e1e3c ACPI: Virt: Generate SRAT table
To support NUMA, it needs to generate SRAT ACPI table.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1461667229-9216-6-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:28 +01:00
Shannon Zhao
64b831367b ACPI: move acpi_build_srat_memory to common place
Move acpi_build_srat_memory to common place so that it could be reused
by ARM. Rename it to build_srat_memory.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1461667229-9216-5-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:28 +01:00
Shannon Zhao
ea9fcbd7d0 ACPI: Fix the definition of proximity in AcpiSratMemoryAffinity
ACPI spec says that Proximity Domain is an "Integer that represents
the proximity domain to which the processor belongs". So define it as a
uint32_t.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1461667229-9216-4-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:28 +01:00
Shannon Zhao
e6e400d54f ACPI: Add GICC Affinity Structure
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1461667229-9216-3-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:27 +01:00
Shannon Zhao
9695200ad8 ARM: Virt: Set numa-node-id for cpu and memory nodes
Generate memory nodes according to NUMA topology. Set numa-node-id
property for cpu and memory nodes.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 1461667229-9216-2-git-send-email-zhaoshenglong@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:27 +01:00
xiaoqiang zhao
3c09d6caad hw/display: QOM'ify exynos4210_fimd.c
* Drop the old SysBus init function and use instance_init
* Move graphic_console_init into realize stage

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-id: 1462417489-28603-2-git-send-email-zxq_yx_007@163.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:27 +01:00
Edgar E. Iglesias
cd694521ca target-arm/translate-a64.c: Unify some of the ldst_reg decoding
The various load/store variants under disas_ldst_reg can all reuse the
same decoding for opc, size, rt and is_vector.

This patch unifies the decoding in preparation for generating
instruction syndromes for data aborts.
This will allow us to reduce the number of places to hook in updates
to the load/store state needed to generate the insn syndromes.

No functional change.

Reviewed-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1461931684-1867-7-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:27 +01:00
Edgar E. Iglesias
026a19c312 target-arm/translate-a64.c: Use extract32 in disas_ldst_reg_imm9
Use extract32 instead of open coding the bit masking when decoding
is_signed and is_extended. This streamlines the decoding with some
of the other ldst variants.

No functional change.

Reviewed-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1461931684-1867-6-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:27 +01:00
Peter Maydell
094d028a79 target-arm: Split data abort syndrome generator
Split the data abort syndrome generator into two versions:
One with a valid Instruction Specific Syndrome (ISS) and another without.

The following new flags are supported by the syndrome generator
with ISS:
* isv - Instruction syndrome valid
* sas - Syndrome access size
* sse - Syndrome sign extend
* srt - Syndrome register transfer
* sf  - Sixty-Four bit register width
* ar  - Acquire/Release

These flags are not yet used, so this patch has no functional change
except that we will now correctly set the IL bit in data abort
syndromes without ISS information.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1461931684-1867-5-git-send-email-edgar.iglesias@gmail.com>
[PMM: squashed in with patch which was just adding the IL bit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:26 +01:00
Edgar E. Iglesias
25caa94c4a gen-icount: Use tcg_set_insn_param
Use tcg_set_insn_param() instead of directly accessing internal
tcg data structures to update an insn param.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1461931684-1867-3-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:26 +01:00
Edgar E. Iglesias
1d41478fd4 tcg: Add tcg_set_insn_param
Add tcg_set_insn_param as a mechanism to modify an insn
parameter after emiting the insn. This is useful for icount
and also for embedding fault information for a specific insn.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1461931684-1867-2-git-send-email-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:26 +01:00
Sergey Sorokin
dddb522341 target-arm: Fix descriptor address masking in ARM address translation
There is a bug in ARM address translation regime with a long-descriptor
format. On the descriptor reading its address is formed from an index
which is a part of the input address. And on the first iteration this index
is incorrectly masked with 'grainsize' mask. But it can be wider according
to pseudo-code.
On the other hand on the iterations other than first the descriptor address
is formed from the previous level descriptor by masking with 'descaddrmask'
value. It always clears just 12 lower bits, but it must clear 'grainsize'
lower bits instead according to pseudo-code.
The patch fixes both cases.

Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-id: 1460996853-22117-1-git-send-email-afarallax@yandex.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:26 +01:00
Sergey Sorokin
dfda68377e target-arm: Stage 2 permission fault was fixed in AArch32 state
As described in AArch32.CheckS2Permission an instruction fetch fails if
XN bit is set or there is no read permission for the address.

Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-id: 1461002400-3187-1-git-send-email-afarallax@yandex.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:26 +01:00
Zhou Jie
0b062eb090 hw/arm/nseries: Allocating Large sized arrays to heap
n8x0_init has a huge stack usage of 65536 bytes approx.
Moving large arrays to heap to reduce stack usage.

Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Message-id: 1461651308-894-1-git-send-email-zhoujie2011@cn.fujitsu.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:25 +01:00
Sylvain Garrigues
27a5dc7be6 bcm2835_property: use cached values when querying framebuffer
As the framebuffer settings are copied into the result message before it is
reconfigured, inconsistent behavior can happen when, for instance, you set with
a single message the width, height, and depth, and ask at the same time to
allocate the buffer and get the pitch and the size.

In this case, the reported pitch and size would be incorrect as they were
computed with the initial values of width, height and depth, not the ones the
client requested.

Signed-off-by: Sylvain Garrigues <sylvain@sylvaingarrigues.com>
Reviewed-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1461325343-24995-1-git-send-email-sylvain@sylvaingarrigues.com
[PMM: folded a couple of long lines]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:25 +01:00
xiaoqiang zhao
0a750e2a78 hw/intc: QOM'ify omap_intc.c
* Split the old SysBus init into an instance_init and a
  DeviceClass::realize function
* Drop the old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:25 +01:00
xiaoqiang.zhao
22c70d8a6a hw/intc: QOM'ify grlib_irqmp.c
* Split the old SysBus init into an instance_init and a
  DeviceClass::realize function
* Drop the old SysBus init function

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: corrected "can not" to "cannot" in error message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:25 +01:00
xiaoqiang.zhao
c09008d2d3 hw/intc: QOM'ify slavio_intctl.c
Drop the old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:25 +01:00
xiaoqiang.zhao
e3be8b4f4f hw/intc: QOM'ify pl190.c
Drop the old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:25 +01:00
xiaoqiang.zhao
f777bda60f hw/intc: QOM'ify imx_avic.c
Drop the old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:24 +01:00
xiaoqiang.zhao
68d71616c0 hw/intc: QOM'ify exynos4210_gic.c
* Drop the old SysBus init function and use instance_init
* Split the exynos4210_irq_gate_init into an instance_init
  and a DeviceClass::realize function

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:24 +01:00
xiaoqiang.zhao
d3d5a6febd hw/intc: QOM'ify exynos4210_combiner.c
Drop the old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:24 +01:00
xiaoqiang.zhao
b46818e9e7 hw/intc: QOM'ify etraxfs_pic.c
Drop the old SysBus init function and use instance_init

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:24 +01:00
Pooja Dhannawat
ea644cf343 omap_lcdc: Remove support for DEPTH != 32
surface_bits_per_pixel() always returns 32
so, removing other dead code which is
based on DEPTH !== 32

Signed-off-by: Pooja Dhannawat <dhannawatpooja1@gmail.com>
Message-id: 1459260142-9144-1-git-send-email-dhannawatpooja1@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:24 +01:00
Pooja Dhannawat
5c87c4089a blizzard: Remove support for DEPTH != 32
Removing support for DEPTH != 32 from blizzard template header
and file that includes it, as macro DEPTH == 32 only used.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Pooja Dhannawat <dhannawatpooja1@gmail.com>
Message-id: 1458971873-2768-1-git-send-email-dhannawatpooja1@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 13:22:24 +01:00
Peter Maydell
26617924e9 Open 2.7 development tree
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-12 12:35:25 +01:00
Aurelien Jarno
9d989c732b target-mips: fix call to memset in soft reset code
Recent versions of GCC report the following error when compiling
target-mips/helper.c:

  qemu/target-mips/helper.c:542:9: warning: ‘memset’ used with length
  equal to number of elements without multiplication by element size
  [-Wmemset-elt-size]

This is indeed correct and due to a wrong usage of sizeof(). Fix that.

Cc: Stefan Weil <sw@weilnetz.de>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: qemu-stable@nongnu.org
LP: https://bugs.launchpad.net/qemu/+bug/1577841
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-05-12 11:01:05 +01:00
Eric Blake
68ab47e4b4 qapi: Change visit_type_FOO() to no longer return partial objects
Returning a partial object on error is an invitation for a careless
caller to leak memory.  We already fixed things in an earlier
patch to guarantee NULL if visit_start fails ("qapi: Guarantee
NULL obj on input visitor callback error"), but that does not
help the case where visit_start succeeds but some other failure
happens before visit_end, such that we leak a partially constructed
object outside visit_type_FOO(). As no one outside the testsuite
was actually relying on these semantics, it is cleaner to just
document and guarantee that ALL pointer-based visit_type_FOO()
functions always leave a safe value in *obj during an input visitor
(either the new object on success, or NULL if an error is
encountered), so callers can now unconditionally use
qapi_free_FOO() to clean up regardless of whether an error occurred.

The decision is done by adding visit_is_input(), then updating the
generated code to check if additional cleanup is needed based on
the type of visitor in use.

Note that we still leave *obj unchanged after a scalar-based
visit_type_FOO(); I did not feel like auditing all uses of
visit_type_Enum() to see if the callers would tolerate a specific
sentinel value (not to mention having to decide whether it would
be better to use 0 or ENUM__MAX as that sentinel).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-25-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Eric Blake
d9f62dde13 qapi: Simplify semantics of visit_next_list()
The semantics of the list visit are somewhat baroque, with the
following pseudocode when FooList is used:

start()
for (prev = head; cur = next(prev); prev = &cur) {
    visit(&cur->value)
}

Note that these semantics (advance before visit) requires that
the first call to next() return the list head, while all other
calls return the next element of the list; that is, every visitor
implementation is required to track extra state to decide whether
to return the input as-is, or to advance.  It also requires an
argument of 'GenericList **' to next(), solely because the first
iteration might need to modify the caller's GenericList head, so
that all other calls have to do a layer of dereferencing.

Thankfully, we only have two uses of list visits in the entire
code base: one in spapr_drc (which completely avoids
visit_next_list(), feeding in integers from a different source
than uint8List), and one in qapi-visit.py.  That is, all other
list visitors are generated in qapi-visit.c, and share the same
paradigm based on a qapi FooList type, so we can refactor how
lists are laid out with minimal churn among clients.

We can greatly simplify things by hoisting the special case
into the start() routine, and flipping the order in the loop
to visit before advance:

start(head)
for (tail = *head; tail; tail = next(tail)) {
    visit(&tail->value)
}

With the simpler semantics, visitors have less state to track,
the argument to next() is reduced to 'GenericList *', and it
also becomes obvious whether an input visitor is allocating a
FooList during visit_start_list() (rather than the old way of
not knowing if an allocation happened until the first
visit_next_list()).  As a minor drawback, we now allocate in
two functions instead of one, and have to pass the size to
both functions (unless we were to tweak the input visitors to
cache the size to start_list for reuse during next_list, but
that defeats the goal of less visitor state).

The signature of visit_start_list() is chosen to match
visit_start_struct(), with the new parameters after 'name'.

The spapr_drc case is a virtual visit, done by passing NULL for
list, similarly to how NULL is passed to visit_start_struct()
when a qapi type is not used in those visits.  It was easy to
provide these semantics for qmp-output and dealloc visitors,
and a bit harder for qmp-input (several prerequisite patches
refactored things to make this patch straightforward).  But it
turned out that the string and opts visitors munge enough other
state during visit_next_list() to make it easier to just
document and require a GenericList visit for now; an assertion
will remind us to adjust things if we need the semantics in the
future.

Several pre-requisite cleanup patches made the reshuffling of
the various visitors easier; particularly the qmp input visitor.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-24-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Eric Blake
74f24cb630 qapi: Fix string input visitor handling of invalid list
As shown in the previous commit, the string input visitor was
treating bogus input as an empty list rather than an error.
Fix parse_str() to set errp, then the callers to exit early if
an error was reported.

Meanwhile, fix the testsuite to use the generated
qapi_free_int16List() instead of rolling our own, and to
validate the fixed behavior, while at the same time documenting
one more change that we'd like to make in a later patch (a
failed visit_start_list should guarantee a NULL pointer,
regardless of what things were on input).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-23-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Markus Armbruster
7337468385 tests/string-input-visitor: Add negative integer tests
Add two negative tests, one for int and one for int16List.  The latter
exposes a bug: nonsensical input results in an empty list instead of
an error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461325048-14122-1-git-send-email-armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-22-git-send-email-eblake@redhat.com>
2016-05-12 09:47:55 +02:00
Eric Blake
15c2f669e3 qapi: Split visit_end_struct() into pieces
As mentioned in previous patches, we want to call visit_end_struct()
functions unconditionally, so that visitors can release resources
tied up since the matching visit_start_struct() without also having
to worry about error priority if more than one error occurs.

Even though error_propagate() can be safely used to ignore a second
error during cleanup caused by a first error, it is simpler if the
cleanup cannot set an error.  So, split out the error checking
portion (basically, input visitors checking for unvisited keys) into
a new function visit_check_struct(), which can be safely skipped if
any earlier errors are encountered, and leave the cleanup portion
(which never fails, but must be called unconditionally if
visit_start_struct() succeeded) in visit_end_struct().

Generated code in qapi-visit.c has diffs resembling:

|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v,
|         goto out_obj;
|     }
|     visit_type_ACPIOSTInfo_members(v, obj, &err);
|-    error_propagate(errp, err);
|-    err = NULL;
|+    if (err) {
|+        goto out_obj;
|+    }
|+    visit_check_struct(v, &err);
| out_obj:
|-    visit_end_struct(v, &err);
|+    visit_end_struct(v);
| out:

and in qapi-event.c:

@@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP
|         goto out;
|     }
|     visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, &param, &err);
|-    visit_end_struct(v, err ? NULL : &err);
|+    if (!err) {
|+        visit_check_struct(v, &err);
|+    }
|+    visit_end_struct(v);
|     if (err) {
|         goto out;

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com>
[Conflict with a doc fixup resolved]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Eric Blake
56a6f02b8c qmp: Tighten output visitor rules
Tighten assertions in the QMP output visitor, so that:

- qmp_output_get_qobject() can only be called after pairing a
visit_end_* for every visit_start_* (rather than allowing it on
a partially built object)

- qmp_output_get_qobject() cannot be called unless at least one
visit_type_* or visit_start/visit_end pair has occurred since
creation/reset (the accidental return of NULL fixed by commit
ab8bf1d7 would have been much easier to diagnose)

- ensure that we are encountering the expected object or list
type, to provide protection against mismatched push(struct)/
pop(list) or push(list)/pop(struct), similar to the qmp-input
protection added in commit bdd8e6b5.

- ensure that except for the root, 'name' is non-null inside a
dict, and NULL inside a list (this may need changing later if
we add "name.0" support for better error messages for a list,
but for now it makes sure all users are at least consistent)

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-19-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Eric Blake
f2ff429bfa qmp: Don't reuse qmp visitor after grabbing output
The testsuite was the only client that attempted to reuse a
QmpOutputVisitor for a second visit after encountering an
error and/or calling qmp_output_get_qobject() on a first
visit.  The next patch is about to tighten the semantics to
be one-shot usage of the visitor, like all other visitors
(which will enable further simplifications down the road).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1462854006-24658-1-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:55 +02:00
Eric Blake
a543a554cf spapr_drc: Expose 'null' in qom-get when there is no fdt
Now that the QMP output visitor supports an explicit null
output, we should utilize it to make it easier to diagnose
the difference between a missing fdt ('null') vs. a
present-but-empty one ('{}').

(Note that this reverts the behavior of commit ab8bf1d, taking
us back to the behavior of commit 6c2f9a1 [which in turn
stemmed from a crash fix in 1d10b44]; but that this time,
the change is intentional and not an accidental side-effect.)

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <1461879932-9020-17-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
3df016f185 qmp: Support explicit null during visits
Implement the new type_null() callback for the qmp input and
output visitors. While we don't yet have a use for this in QAPI
input (the generator will need some tweaks first), some
potential usages have already been discussed on the list.
Meanwhile, the output visitor could already output explicit null
via type_any, but this gives us finer control.

At any rate, it's easy to test that we can round-trip an explicit
null through manual use of visit_type_null() wrapped by a virtual
visit_start_struct() walk, even if we can't do the visit in a
QAPI type.  Repurpose the test_visitor_out_empty test,
particularly since a future patch will tighten semantics to
forbid use of qmp_output_get_qobject() without at least one
intervening visit_type_*.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-16-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
3bc97fd592 qapi: Add visit_type_null() visitor
Right now, qmp-output-visitor happens to produce a QNull result
if nothing is actually visited between the creation of the visitor
and the request for the resulting QObject.  A stronger protocol
would require that a QMP output visit MUST visit something.  But
to still be able to produce a JSON 'null' output, we need a new
visitor function that states our intentions.  Yes, we could say
that such a visit must go through visit_type_any(), but that
feels clunky.

So this patch introduces the new visit_type_null() interface and
its no-op interface in the dealloc visitor, and stubs in the
qmp visitors (the next patch will finish the implementation).
For the visitors that will not implement the callback, document
the situation. The code in qapi-visit-core unconditionally
dereferences the callback pointer, so that a segfault will inform
a developer if they need to implement the callback for their
choice of visitor.

Note that JSON has a primitive null type, with the single value
null; likewise with the QNull type for QObject; but for QAPI,
we just have the 'null' value without a null type.  We may
eventually want to add more support in QAPI for null (most likely,
we'd use it via an alternate type that permits 'null' or an
object); but we'll create that usage when we need it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-15-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
7d7a337ec3 tests: Add check-qnull
Add a new test, for checking reference counting of qnull(). As
part of the new file, move a previous reference counting change
added in commit a861564 to a more logical place.

Note that while most of the check-q*.c leave visitor stuff to
the test-qmp-*-visitor.c, in this case we actually want the
visitor tests in our new file because we are validating the
reference count of qnull_, which is an internal detail that
test-qmp-*-visitor should not be peeking into (or put another
way, qnull() is the only special case where we don't have
independent allocation of a QObject, so none of the other
visitor tests require the layering violation present in this
test).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-14-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
adfb264c9e qapi: Document visitor interfaces, add assertions
The visitor interface for mapping between QObject/QemuOpts/string
and QAPI is scandalously under-documented, making changes to visitor
core, individual visitors, and users of visitors difficult to
coordinate.  Among other questions: when is it safe to pass NULL,
vs. when a string must be provided; which visitors implement which
callbacks; the difference between concrete and virtual visits.

Correct this by retrofitting proper contracts, and document where some
of the interface warts remain (for example, we may want to modify
visit_end_* to require the same 'obj' as the visit_start counterpart,
so the dealloc visitor can be simplified).  Later patches in this
series will tackle some, but not all, of these warts.

Add assertions to (partially) enforce the contract.  Some of these
were only made possible by recent cleanup commits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-13-git-send-email-eblake@redhat.com>
[Doc fix from Eric squashed in]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
fcf3cb2178 qmp-input: Refactor when list is advanced
In the QMP input visitor, visiting a list traverses two objects:
the QAPI GenericList of the caller (which gets advanced in
visit_next_list() regardless of this patch), and the QList input
that we are converting to QAPI.  For consistency with QDict
visits, we want to consume elements from the input QList during
the visit_type_FOO() for the list element; that is, we want ALL
the code for consuming an input to live in qmp_input_get_object(),
rather than having it split according to whether we are visiting
a dict or a list.  Making qmp_input_get_object() the common point
of consumption will make it easier for a later patch to refactor
visit_start_list() to cover the GenericList * head of a QAPI list,
and in turn will get rid of the 'first' flag (which lived in
qmp_input_next_list() pre-patch, and is hoisted to StackObject
by this patch).

This patch is therefore altering the post-condition use of 'entry',
while keeping what gets visited unchanged, from:

        start_list next_list type_ELT ... next_list type_ELT next_list end_list
 visits                      1st elt                last elt
 entry  NULL       1st elt   1st elt      last elt  last elt NULL      gone

where type_ELT() returns (entry ? entry : 1st elt) and next_list() steps
entry

to this usage:

        start_list next_list type_ELT ... next_list type_ELT next_list end_list
 visits                      1st elt                last elt
 entry  1st elt    1nd elt   2nd elt      last elt  NULL     NULL      gone

where type_ELT() steps entry and returns the old entry, and next_list()
leaves entry alone.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-12-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
ce140b1769 qmp-input: Require struct push to visit members of top dict
Don't embed the root of the visit into the stack of current
containers being visited.  That way, we no longer get confused
on whether the first visit of a dictionary is to the dictionary
itself or to one of the members of the dictionary, based on
whether the caller passed name=NULL; and makes the QMP Input
visitor like other visitors where the value of 'name' is now
ignored on the root visit.  (We may someday want to revisit
the rules on what 'name' should be on a top-level visit,
rather than just ignoring it; but that would be the topic of
another patch).

An audit of all qmp_input_visitor_new() call sites shows that
there were only two places where callers had previously been
visiting to a QDict with a non-NULL name to bypass a call to
visit_start_struct(), and those were fixed in prior patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-11-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
ad739706bb qom: Wrap prop visit in visit_start_struct
The qmp-input visitor was allowing callers to play rather fast
and loose: when visiting a QDict, you could grab members of the
root dictionary without first pushing into the dict; the final
such culprit was the QOM code for converting to and from object
properties.  But we are about to tighten the input visitor, at
which point user_creatable_add_type() as called with a QMP input
visitor via qmp_object_add() MUST follow the same paradigms as
everyone else, of pushing into the struct before grabbing its
keys.

The use of 'err ? NULL : &err' is temporary; a later patch will
clean that up when it splits visit_end_struct().

Furthermore, note that both callers always pass qdict, so we can
convert the conditional into an assert and reduce indentation.

The change has no impact to the testsuite now, but is required to
avoid a failure in tests/test-netfilter once qmp-input is made
stricter to detect inconsistent 'name' arguments on the root visit.

Since user_creatable_add_type() is also called with OptsVisitor
through user_creatable_add_opts(), we must also check that there
is no negative impact there; both pre- and post-patch, we see:

$ ./x86_64-softmmu/qemu-system-x86_64 -nographic -nodefaults -qmp stdio -object secret,id=sec0,data=letmein,format=raw,foo=bar
qemu-system-x86_64: -object secret,id=sec0,data=letmein,format=raw,foo=bar: Property '.foo' not found

That is, the only new checking that the new visit_end_struct() can
perform is for excess input, but we already catch excess input
earlier in object_property_set().

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-10-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
ed84153519 qapi-commands: Wrap argument visit in visit_start_struct
The qmp-input visitor was allowing callers to play rather fast
and loose: when visiting a QDict, you could grab members of the
root dictionary without first pushing into the dict; among the
culprit callers was the generated marshal code on the 'arguments'
dictionary of a QMP command.  But we are about to tighten the
input visitor, at which point the generated marshal code MUST
follow the same paradigms as everyone else, of pushing into the
struct before grabbing its keys.

Generated code grows as follows:

|@@ -515,7 +641,12 @@ void qmp_marshal_blockdev_backup(QDict *
|     BlockdevBackup arg = {0};
|
|     v = qmp_input_get_visitor(qiv);
|+    visit_start_struct(v, NULL, NULL, 0, &err);
|+    if (err) {
|+        goto out;
|+    }
|     visit_type_BlockdevBackup_members(v, &arg, &err);
|+    visit_end_struct(v, err ? NULL : &err);
|     if (err) {
|         goto out;
|     }
|@@ -527,7 +715,9 @@ out:
|     qmp_input_visitor_cleanup(qiv);
|     qdv = qapi_dealloc_visitor_new();
|     v = qapi_dealloc_get_visitor(qdv);
|+    visit_start_struct(v, NULL, NULL, 0, NULL);
|     visit_type_BlockdevBackup_members(v, &arg, NULL);
|+    visit_end_struct(v, NULL);
|     qapi_dealloc_visitor_cleanup(qdv);
| }

The use of 'err ? NULL : &err' is temporary; a later patch will
clean that up when it splits visit_end_struct().

Prior to this patch, the fact that there was no final
visit_end_struct() meant that even though we are using a strict
input visit, the marshalling code was not detecting excess input
at the top level (only in nested levels).  Fortunately, we have
code in monitor.c:qmp_check_client_args() that also checks for
no excess arguments at the top level.  But as the generated code
is more compact than the manual check, a later patch will clean
up monitor.c to drop the redundancy added here.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-9-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
e5826a2fd7 qmp-input: Don't consume input when checking has_member
Commit e8316d7 mistakenly passed consume=true within
qmp_input_optional() when checking if an optional member was
present, but the mistake was silently ignored since the code
happily let us extract a member more than once.  Fix
qmp_input_optional() to not consume anything, then tighten up
the input visitor to ensure that a member is consumed exactly
once (all generated code follows this pattern; and the new
assert will catch any hand-written code that tries to visit
the same key more than once).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-8-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
240f64b6dc qapi: Use strict QMP input visitor in more places
The following uses of a QMP input visitor should be strict
(that is, excess keys in QDict input should be flagged if not
converted to QAPI):

- Testsuite code unrelated to explicitly testing non-strict
mode (test-qmp-commands, test-visitor-serialization); since
we want more code to be strict by default, having more tests
of strict mode doesn't hurt

- Code used for cloning QAPI objects (replay-input.c,
qemu-sockets.c); we are reparsing a QObject just barely
produced by the qmp output visitor and which therefore should
not have any garbage, so while it is extra work to be strict,
it validates that our clone is correct [note that a later patch
series will simplify these two uses by creating an actual
clone visitor that is much more efficient than a
generate/reparse cycle]

- qmp_object_add(), which calls into user_creatable_add_type().
Since command line parsing for '-object' uses the same
user_creatable_add_type() through the OptsVisitor, and that is
always strict, we want to ensure that any nested dictionaries
would be treated the same in QMP and from the command line (I
don't actually know if such nested dictionaries exist).  Note
that on this code change, strictness only matters for nested
dictionaries (if even possible), since we already flag excess
input at the top level during an earlier object_property_set()
on an unknown key, whether from QemuOpts:

$ ./x86_64-softmmu/qemu-system-x86_64 -nographic -nodefaults -qmp stdio -object secret,id=sec0,data=letmein,format=raw,foo=bar
qemu-system-x86_64: -object secret,id=sec0,data=letmein,format=raw,foo=bar: Property '.foo' not found

or from QMP:

$ ./x86_64-softmmu/qemu-system-x86_64 -nographic -nodefaults -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 93, "minor": 5, "major": 2}, "package": ""}, "capabilities": []}}
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"object-add","arguments":{"qom-type":"secret","id":"sec0","props":{"format":"raw","data":"letmein","foo":"bar"}}}
{"error": {"class": "GenericError", "desc": "Property '.foo' not found"}}

The only remaining uses of non-strict input visits are:

- QMP 'qom-set' (which eventually executes
object_property_set_qobject()) - mark it as something to revisit
in the future (I didn't want to spend any more time on this patch
auditing if we have any QOM dictionary properties that might be
impacted, and couldn't easily prove whether this code path is
shared with anything else).

- test-qmp-input-visitor: explicit tests of non-strict mode. If
we later get rid of users that don't need strictness, then this
test should be merged with test-qmp-input-strict

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
fc471c18d5 qapi: Consolidate QMP input visitor creation
Rather than having two separate ways to create a QMP input
visitor, where the safer approach has the more verbose name,
it is better to consolidate things into a single function
where the caller must explicitly choose whether to be strict
or to ignore excess input.  This patch is the strictly
mechanical conversion; the next patch will then audit which
uses can be made stricter.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-6-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
b471d012e5 qmp-input: Clean up stack handling
Management of the top of stack was a bit verbose; creating a
temporary variable and adding some comments makes the existing
code more legible before the next few patches improve things.
No semantic changes other than asserting that we are always
visiting a QObject, and not a NULL value.  In particular, the
check for 'name && qobject_type(qobj) == QTYPE_QDICT)' is a
bit overkill (a dict visit should always have a name); a later
patch revisits that, while this patch is only changing one
layer of indentation due to dropping 'if (qobj)'.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-5-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
42a502a7a6 qmp: Drop dead command->type
Ever since QMP was first added back in commit 43c20a43, we have
never had any QmpCommandType other than QCT_NORMAL.  It's
pointless to carry around the cruft.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-4-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
e58d695e6c qapi: Guarantee NULL obj on input visitor callback error
Our existing input visitors were not very consistent on errors in a
function taking 'TYPE **obj'.  These are start_struct(),
start_alternate(), type_str(), and type_any().  next_list() is
similar, but can't fail (see commit 08f9541).  While all of them set
'*obj' to allocated storage on success, it was not obvious whether
'*obj' was guaranteed safe on failure, or whether it was left
uninitialized.  But a future patch wants to guarantee that
visit_type_FOO() does not leak a partially-constructed obj back to
the caller; it is easier to implement this if we can reliably state
that input visitors assign '*obj' regardless of success or failure,
and that on failure *obj is NULL.  Add assertions to enforce
consistency in the final setting of err vs. *obj.

The opts-visitor start_struct() doesn't set an error, but it
also was doing a weird check for 0 size; all callers pass in
non-zero size if obj is non-NULL.

The testsuite has at least one spot where we no longer need
to pre-initialize a variable prior to a visit; valgrind confirms
that the test is still fine with the cleanup.

A later patch will document the design constraint implemented
here.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-3-git-send-email-eblake@redhat.com>
[visit_start_alternate()'s assertion tightened, commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Eric Blake
983f52d4b3 qapi-visit: Add visitor.type classification
We have three classes of QAPI visitors: input, output, and dealloc.
Currently, all implementations of these visitors have one thing in
common based on their visitor type: the implementation used for the
visit_type_enum() callback.  But since we plan to add more such
common behavior, in relation to documenting and further refining
the semantics, it makes more sense to have the visitor
implementations advertise which class they belong to, so the common
qapi-visit-core code can use that information in multiple places.

A later patch will better document the types of visitors directly
in visitor.h.

For this patch, knowing the class of a visitor implementation lets
us make input_type_enum() and output_type_enum() become static
functions, by replacing the callback function Visitor.type_enum()
with the simpler enum member Visitor.type.  Share a common
assertion in qapi-visit-core as part of the refactoring.

Move comments in opts-visitor.c to match the refactored layout.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-05-12 09:47:54 +02:00
Peter Maydell
bfc766d38e Update version for v2.6.0 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-11 16:44:26 +01:00
Stefan Weil
a277c3e094 usb: Support compilation without poll.h
This is a hack to support compilation with Mingw-w64 which provides
a libusb-1.0 package, but no poll.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1458630800-10088-1-git-send-email-sw@weilnetz.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 10:37:39 +02:00
Isaac Lozano
1f66fe5778 usb-mtp: fix usb_mtp_get_device_info so that libmtp on the guest doesn't complain
If an application uses libmtp on the guest system,
it will complain with the warning message:
LIBMTP WARNING: VendorExtensionID: ffffffff
LIBMTP WARNING: VendorExtensionDesc: (null)
LIBMTP WARNING: this typically means the device is PTP (i.e. a camera) but
not a MTP device at all. Trying to continue anyway.

This is because libmtp expects a MTP Vendor Extension ID of 0x00000006 and a
MTP Version of 0x0064. These numbers are taken from Microsoft's MTP Vendor
Extension Identification Message page and are what most physical devices
show.

Signed-off-by: Isaac Lozano <109lozanoi@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1460892593-5908-1-git-send-email-109lozanoi@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 10:33:49 +02:00
Roman Kagan
491d68d938 usb:xhci: no DMA on HC reset
This patch is a rough fix to a memory corruption we are observing when
running VMs with xhci USB controller and OVMF firmware.

Specifically, on the following call chain

xhci_reset
  xhci_disable_slot
    xhci_disable_ep
      xhci_set_ep_state

QEMU overwrites guest memory using stale guest addresses.

This doesn't happen when the guest (firmware) driver sets up xhci for
the first time as there are no slots configured yet.  However when the
firmware hands over the control to the OS some slots and endpoints are
already set up with their context in the guest RAM.  Now the OS' driver
resets the controller again and xhci_set_ep_state then reads and writes
that memory which is now owned by the OS.

As a quick fix, skip calling xhci_set_ep_state in xhci_disable_ep if the
device context base address array pointer is zero (indicating we're in
the HC reset and no DMA is possible).

Cc: qemu-stable@nongnu.org
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-id: 1462384435-1034-1-git-send-email-rkagan@virtuozzo.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 10:29:28 +02:00
Cole Robinson
bb732ee78c ui: gtk: Fix some deprecation warnings
All device manager APIs are deprecated now. Much of our usage is
just to get the current pointer, so centralize that logic and use
the new seat APIs

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: d6dec24220a4e1449a0172119c10c48e145c0f6f.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:41 +02:00
Cole Robinson
84e2dc4bf3 ui: gtk: Fix a runtime warning on vte >= 0.37
inner-border was dropped in vte API 2.91, in favor of the standard
padding style

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 60a6cdc337d611d902f53907e66a8f37ea374d65.1462557436.git.crobinso@redhat.com

[ kraxel: Fix warning with old vte version. ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:41 +02:00
Cole Robinson
c6feff9e09 configure: support vte-2.91
vte >= 0.37 expores API version 2.91, which is where all the active
development is. qemu builds and runs fine with that version, so use it
if it's available.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: b4f0375647f7b368d3dbd3834aee58cb0253566a.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
d6a6dba359 configure: report SDL version
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 98e4a3b98dc824bfaff96db43b172272c780c15f.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
f2a4e54828 configure: report GTK version
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 4c464e20d69fdcf21927ceed31a8d749b4af0c49.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
02d34f62fd configure: add echo_version helper
Simplifies printing library versions, dependent on if the library
was even found

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 3c9ab16123e06bb4109771ef6ee8acd82d449ba0.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
e07047cfd7 configure: error on unknown --with-sdlabi value
I accidentally tried --with-sdlabi="1.0", and it failed much later in
a weird way. Instead, throw an error if the value isn't in our
whitelist.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 60e4822e17697d257a914df03bdb9fff4b4c0490.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
ee8466d0ea configure: build SDL if only SDL2 available
Right now if SDL2 is installed but not SDL1, default configure will
entirely disable SDL. Check upfront for SDL2 using pkg-config, but
still prefer SDL1 if both versions are installed.

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: c9e570b5964d128a3595efe3170129a3da459776.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
56f289f383 ui: sdl2: Release grab before opening console window
sdl 2.0.4 currently has a bug which causes our UI shortcuts to fire
rapidly in succession:

  https://bugzilla.libsdl.org/show_bug.cgi?id=3287

It's a toss up whether ctrl+alt+f or ctrl+alt+2 will fire an
odd or even number of times, thus determining whether the action
succeeds or fails.

Opening monitor/serial windows is doubly broken, since it will often
lock the UI trying to grab the pointer:

  0x00007fffef3720a5 in SDL_Delay_REAL () at /lib64/libSDL2-2.0.so.0
  0x00007fffef3688ba in X11_SetWindowGrab () at /lib64/libSDL2-2.0.so.0
  0x00007fffef2f2da7 in SDL_SendWindowEvent () at /lib64/libSDL2-2.0.so.0
  0x00007fffef2f080b in SDL_SetKeyboardFocus () at /lib64/libSDL2-2.0.so.0
  0x00007fffef35d784 in X11_DispatchFocusIn.isra.8 () at /lib64/libSDL2-2.0.so.0
  0x00007fffef35dbce in X11_DispatchEvent () at /lib64/libSDL2-2.0.so.0
  0x00007fffef35ee4a in X11_PumpEvents () at /lib64/libSDL2-2.0.so.0
  0x00007fffef2eea6a in SDL_PumpEvents_REAL () at /lib64/libSDL2-2.0.so.0
  0x00007fffef2eeab5 in SDL_WaitEventTimeout_REAL () at /lib64/libSDL2-2.0.so.0
  0x000055555597eed0 in sdl2_poll_events (scon=0x55555876f928) at ui/sdl2.c:593

We can work around that hang by ungrabbing the pointer before launching
a new window. This roughly matches what our sdl1 code does

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 31c9ab6540b031f7a614c59edcecea9877685612.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Cole Robinson
4fd811a6bd ui: gtk: fix crash when terminal inner-border is NULL
VTE terminal inner-border can be NULL. The vte-0.36 (API 2.90)
code checks for the condition too so I assume it's not just a bug

Fixes a crash on Fedora 24 with gtk 3.20

Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 2b2e85d403e8760ea53afd735a170500d5c17716.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-11 08:02:40 +02:00
Peter Maydell
860a3b3485 Update version for v2.6.0-rc5 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-09 14:08:12 +01:00
Peter Maydell
53db932604 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160509-1' into staging
vga security fixes (CVE-2016-3710, CVE-2016-3712)

# gpg: Signature made Mon 09 May 2016 13:39:30 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vga-20160509-1:
  vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
  vga: update vga register setup on vbe changes
  vga: factor out vga register setup
  vga: add vbe_enabled() helper
  vga: fix banked access bounds checking (CVE-2016-3710)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-09 13:42:25 +01:00
Peter Maydell
975eb6a547 Update version for v2.6.0-rc4 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-02 17:27:01 +01:00
Gerd Hoffmann
1beb99f787 Revert "acpi: mark PMTIMER as unlocked"
This reverts commit 7070e085d4.

Commit message claims locking is not needed, but that appears
to not be true, seabios ehci driver runs into timekeeping problems
with this, see
	https://bugzilla.redhat.com/show_bug.cgi?id=1322713

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1460702609-25971-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-02 17:19:13 +01:00
Gerd Hoffmann
fd3c136b3e vga: make sure vga register setup for vbe stays intact (CVE-2016-3712).
Call vbe_update_vgaregs() when the guest touches GFX, SEQ or CRT
registers, to make sure the vga registers will always have the
values needed by vbe mode.  This makes sure the sanity checks
applied by vbe_fixup_regs() are effective.

Without this guests can muck with shift_control, can turn on planar
vga modes or text mode emulation while VBE is active, making qemu
take code paths meant for CGA compatibility, but with the very
large display widths and heigts settable using VBE registers.

Which is good for one or another buffer overflow.  Not that
critical as they typically read overflows happening somewhere
in the display code.  So guests can DoS by crashing qemu with a
segfault, but it is probably not possible to break out of the VM.

Fixes: CVE-2016-3712
Reported-by: Zuozhi Fzz <zuozhi.fzz@alibaba-inc.com>
Reported-by: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-02 16:02:59 +02:00
Gerd Hoffmann
2068192dcc vga: update vga register setup on vbe changes
Call the new vbe_update_vgaregs() function on vbe configuration
changes, to make sure vga registers are up-to-date.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-02 16:02:59 +02:00
Gerd Hoffmann
7fa5c2c5dc vga: factor out vga register setup
When enabling vbe mode qemu will setup a bunch of vga registers to make
sure the vga emulation operates in correct mode for a linear
framebuffer.  Move that code to a separate function so we can call it
from other places too.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-02 16:02:59 +02:00
Gerd Hoffmann
bfa0f151a5 vga: add vbe_enabled() helper
Makes code a bit easier to read.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-02 16:02:59 +02:00
Gerd Hoffmann
3bf1817079 vga: fix banked access bounds checking (CVE-2016-3710)
vga allows banked access to video memory using the window at 0xa00000
and it supports a different access modes with different address
calculations.

The VBE bochs extentions support banked access too, using the
VBE_DISPI_INDEX_BANK register.  The code tries to take the different
address calculations into account and applies different limits to
VBE_DISPI_INDEX_BANK depending on the current access mode.

Which is probably effective in stopping misprogramming by accident.
But from a security point of view completely useless as an attacker
can easily change access modes after setting the bank register.

Drop the bogus check, add range checks to vga_mem_{readb,writeb}
instead.

Fixes: CVE-2016-3710
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-05-02 16:02:59 +02:00
Jan Vesely
277abf15a6 configure: Check if struct fsxattr is available from linux header
Fixes build failure with --enable-xfsctl and
new linux headers (>=4.5) and older xfsprogs(<4.5):
In file included from /usr/include/xfs/xfs.h:38:0,
                 from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:97:
/usr/include/xfs/xfs_fs.h:42:8: error: redefinition of ‘struct fsxattr’
 struct fsxattr {
        ^
In file included from /var/tmp/portage/app-emulation/qemu-2.5.0-r1/work/qemu-2.5.0/block/raw-posix.c:60:0:
/usr/include/linux/fs.h:155:8: note: originally defined here
 struct fsxattr {

This is really a bug in the system headers, but we can work around it
by defining HAVE_FSXATTR in the QEMU headers if linux/fs.h provides
the struct, so that xfs_fs.h doesn't try to define it as well.

CC: qemu-trivial@nongnu.org
CC: Markus Armbruster <armbru@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Stefan Weil <sw@weilnetz.de>
Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jan Vesely <jano.vesely@gmail.com>
[PMM: adjusted commit message, comments]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-02 13:04:26 +01:00
Peter Maydell
20b0f5fef6 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi: last minute fix for 2.6

Minor, obvious fix only affecting BE hosts.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Sun 01 May 2016 13:43:28 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-05-01 22:52:47 +01:00
Igor Mammedov
1dbfd7892b acpi: fix bios linker loadder COMMAND_ALLOCATE on bigendian host
'make check' fails with:

ERROR:tests/bios-tables-test.c:493:load_expected_aml:
   assertion failed: (g_file_test(aml_file, G_FILE_TEST_EXISTS))

since commit:
caf50c7166
tests: pc: acpi: drop not needed 'expected SSDT' blobs

Assert happens because qemu-system-x86_64 generates
SSDT table and test looks for a corresponding expected
table to compare with.

However there is no expected SSDT blob anymore, since
QEMU souldn't generate one. As it happens BIOS is not
able to read ACPI tables from QEMU and fallbacks to
embeded legacy ACPI codepath, which generates SSDT.
That happens due to wrongly sized endiannes conversion
which makes
 uint8_t BiosLinkerLoaderEntry.alloc.zone
end up with 0 due to truncation of 32 bit integer
which on host is 1 or 2.

Fix it by dropping invalid cpu_to_le32() as uint8_t
doesn't require any conversion.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1330174

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-05-01 15:42:13 +03:00
Peter Maydell
47dac82d8b Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
vvfat fixes for 2.6.0-rc4

# gpg: Signature made Fri 29 Apr 2016 10:52:13 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  vvfat: Fix default volume label
  vvfat: Fix volume name assertion

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-29 12:12:33 +01:00
Peter Maydell
849880978e Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-04-29' into staging
QAPI patches for 2016-04-29

# gpg: Signature made Fri 29 Apr 2016 10:13:08 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2016-04-29:
  qapi: Don't pass NULL to printf in string input visitor

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-29 11:26:10 +01:00
Kevin Wolf
d208c50d9d vvfat: Fix default volume label
Commit d5941dd documented that it leaves the default volume name as it
was ("QEMU VVFAT"), but it doesn't actually implement this. You get an
empty name (eleven space characters) instead.

This fixes the implementation to apply the advertised default.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-29 11:14:13 +02:00
Kevin Wolf
ebb72c9f06 vvfat: Fix volume name assertion
Commit d5941dd made the volume name configurable, but it didn't consider
that the rw code compares the volume name string to assert that the
first directory entry is the volume name. This made vvfat crash in rw
mode.

This fixes the assertion to compare with the configured volume name
instead of a literal string.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-29 11:14:08 +02:00
Eric Blake
0a40bdab0d qapi: Don't pass NULL to printf in string input visitor
Make sure the error message for visit_type_uint64() gracefully
handles a NULL 'name' when called from the top level or a list
context, as not all the world behaves like glibc in allowing
NULL through a printf-family %s.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1461879932-9020-21-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-04-29 11:11:36 +02:00
Samuel Thibault
0d48dfedc5 slirp: fix guest network access with darwin host
On Darwin, connect, sendto and friends want the exact size of the sockaddr,
not more (and in particular, not sizeof(struct sockaddr_storaget))

This commit adds the sockaddr_size helper to be used when passing a sockaddr
size to such function, and makes use of it int sendto and connect calls.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-28 18:12:08 +01:00
Peter Maydell
8c4bf97580 Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160428' into staging
MIPS patches 2016-04-28

Changes:
* fixed RDHWR exception host PC

# gpg: Signature made Thu 28 Apr 2016 10:11:18 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20160428:
  target-mips: Fix RDHWR exception host PC

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-28 11:48:12 +01:00
Peter Maydell
736f85d5db Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-04-28' into staging
Fix dangling pointers and error message regressions

# gpg: Signature made Thu 28 Apr 2016 07:25:51 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-error-2016-04-28:
  qom: -object error messages lost location, restore it
  replay: Fix dangling location bug in replay_configure()
  QemuOpts: Fix qemu_opts_foreach() dangling location regression

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-28 11:05:37 +01:00
Peter Maydell
61861eff69 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160426' into staging
ppc patch queue for 2016-04-26 (last minute qemu-2.6 fix)

This just has one, last-minute, fix for a serious regression of memory
hotplug.

Patch author's comment:
    Really sorry for the way last-minute fix, but without this memory
    hotplug is totally broken :( Hoping to get this in for Wednesday's
    RC4, which I think will be the final before release.

# gpg: Signature made Tue 26 Apr 2016 03:52:20 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160426:
  spapr_drc: fix aborts during DRC-count based hotplug

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-28 10:25:26 +01:00
James Hogan
d96391c1ff target-mips: Fix RDHWR exception host PC
Commit b00c72180c ("target-mips: add PC, XNP reg numbers to RDHWR")
changed the rdhwr helpers to use check_hwrena() to check the register
being accessed is enabled in CP0_HWREna when used from user mode. If
that check fails an EXCP_RI exception is raised at the host PC
calculated with GETPC().

However check_hwrena() may not be fully inlined as the
do_raise_exception() part of it is common regardless of the arguments.
This causes GETPC() to calculate the address in the call in the helper
instead of the generated code calling the helper. No TB will be found
and the EPC reported with the resulting guest RI exception points to the
beginning of the TB instead of the RDHWR instruction.

We can't reliably force check_hwrena() to be inlined, and converting it
to a macro would be ugly, so instead pass the host PC in as an argument,
with each rdhwr helper passing GETPC(). This should avoid any dependence
on compiler behaviour, and in practice seems to ensure the full inlining
of check_hwrena() on x86_64.

This issue causes failures when running a MIPS KVM (trap & emulate)
guest in a MIPS QEMU TCG guest, as the inner guest kernel will do a
RDHWR of counter, which is disabled in the outer guest's CP0_HWREna by
KVM so it can emulate the inner guest's counter. The emulation fails and
the RI exception is passed to the inner guest.

Fixes: b00c72180c ("target-mips: add PC, XNP reg numbers to RDHWR")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Yongbok Kim <yongbok.kim@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-04-28 10:03:24 +01:00
Markus Armbruster
51b9b478cc qom: -object error messages lost location, restore it
qemu_opts_foreach() runs its callback with the error location set to
the option's location.  Any errors the callback reports use the
option's location automatically.

Commit 90998d5 moved the actual error reporting from "inside"
qemu_opts_foreach() to after it.  Here's a typical hunk:

	 if (qemu_opts_foreach(qemu_find_opts("object"),
    -                          object_create,
    -                          object_create_initial, NULL)) {
    +                          user_creatable_add_opts_foreach,
    +                          object_create_initial, &err)) {
    +        error_report_err(err);
	     exit(1);
	 }

Before, object_create() reports from within qemu_opts_foreach(), using
the option's location.  Afterwards, we do it after
qemu_opts_foreach(), using whatever location happens to be current
there.  Commonly a "none" location.

This is because Error objects don't have location information.
Problematic.

Reproducer:

    $ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar
    qemu-system-x86_64: Property '.foo' not found

Note no location.  This commit restores it:

    qemu-system-x86_64: -object secret,id=foo,foo=bar: Property '.foo' not found

Note that the qemu_opts_foreach() bug just fixed could mask the bug
here: if the location it leaves dangling hasn't been clobbered, yet,
it's the correct one.

Reported-by: Eric Blake <eblake@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-4-git-send-email-armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Paragraph on Error added to commit message]
2016-04-28 08:19:36 +02:00
Markus Armbruster
d9d3aaea0b replay: Fix dangling location bug in replay_configure()
replay_configure() pushes and pops a Location with automatic storage
duration.  Except it fails to pop when -icount parameter "rr" isn't
given.  cur_loc then points to unused stack space, and will most
likely get clobbered in short order.

Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.

Broken in commit 890ad55.

I didn't take the time to find a reproducer.

Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-3-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2016-04-28 08:19:20 +02:00
Markus Armbruster
37f32349ea QemuOpts: Fix qemu_opts_foreach() dangling location regression
qemu_opts_foreach() pushes and pops a Location with automatic storage
duration.  Except it fails to pop when @func() returns non-zero.
cur_loc then points to unused stack space, and will most likely get
clobbered in short order.

Clobbered cur_loc can make loc_pop() and error_print_loc() crash or
report bogus locations.

Affects several qemu command line options as well as qemu-img,
qemu-io, qemu-nbd -object, and blkdebug's configuration file.

Broken in commit a4c7367, v2.4.0.

Reproducer:
    $ qemu-system-x86_64 -nodefaults -display none -object secret,id=foo,foo=bar

main() reports "Property '.foo' not found" like this:

    if (qemu_opts_foreach(qemu_find_opts("object"),
                          user_creatable_add_opts_foreach,
                          object_create_delayed, &err)) {
        error_report_err(err);
        exit(1);
    }

cur_loc then points to where qemu_opts_foreach()'s Location used to
be, i.e. unused stack space.  With optimization, this Location doesn't
get clobbered for me, and also happens to be the correct location.
Without optimization, it does get clobbered in a way that makes
error_report_err() report no location.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1461767349-15329-2-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-04-28 08:18:56 +02:00
Michael Roth
df18b2db69 spapr_drc: fix aborts during DRC-count based hotplug
CPU/memory resources can be signalled en-masse via
spapr_hotplug_req_add_by_count(), and when doing so, actually change
the meaning of the 'drc' parameter passed to
spapr_hotplug_req_event() to be a count rather than an index.

f40eb92 added a hook in spapr_hotplug_req_event() to record when a
device had been 'signalled' to the guest, but that code assumes that
drc is always an index. In cases where it's a count, such as memory
hotplug, the DRC lookup will fail, leading to an assert.

Fix this by only explicitly setting the signalled state for cases where
we are doing PCI hotplug.

For other resources types, since we cannot selectively track whether a
resource has been signalled in cases where we signal attach as a count,
set the 'signalled' state to true immediately upon making the
resource available via drck->attach().

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: david@gibson.dropbear.id.au
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-26 11:16:08 +10:00
Gerd Hoffmann
f419a626c7 usb/uhci: move pid check
commit "5f77e06 usb: add pid check at the first of uhci_handle_td()"
moved the pid verification to the start of the uhci_handle_td function,
to simplify the error handling (we don't have to free stuff which we
didn't allocate in the first place ...).

Problem is now the check fires too often, it raises error IRQs even for
TDs which we are not going to process because they are not set active.

So, lets move down the check a bit, so it is done only for active TDs,
but still before we are going to allocate stuff to process the requested
transfer.

Reported-by: Joe Clifford <joe@thunderbug.co.uk>
Tested-by: Joe Clifford <joe@thunderbug.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1461321893-15811-1-git-send-email-kraxel@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-25 12:05:05 +01:00
Peter Maydell
3123bd8ebf Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160423' into staging
ppc patch queue for 2016-03-23

A single fix for a bug in parameter handling for the spapr PCI host
bridge.

# gpg: Signature made Sat 23 Apr 2016 07:55:29 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160423:
  hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-25 11:15:53 +01:00
Thomas Huth
da34fed707 hw/ppc/spapr: Fix crash when specifying bad parameters to spapr-pci-host-bridge
QEMU currently crashes when using bad parameters for the
spapr-pci-host-bridge device:

$ qemu-system-ppc64 -device spapr-pci-host-bridge,buid=0x123,liobn=0x321,mem_win_addr=0x1,io_win_addr=0x10
Segmentation fault

The problem is that spapr_tce_find_by_liobn() might return NULL, but
the code in spapr_populate_pci_dt() does not check for this condition
and then tries to dereference this NULL pointer.
Apart from that, the return value of spapr_populate_pci_dt() also
has to be checked for all PCI buses, not only for the last one, to
make sure we catch all errors.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-23 16:52:20 +10:00
Peter Maydell
53343338a6 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Mirror block job fixes for 2.6.0-rc4

# gpg: Signature made Fri 22 Apr 2016 15:46:41 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  mirror: Workaround for unexpected iohandler events during completion
  aio-posix: Skip external nodes in aio_dispatch
  virtio: Mark host notifiers as external
  event-notifier: Add "is_external" parameter
  iohandler: Introduce iohandler_get_aio_context

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-22 16:17:12 +01:00
Fam Zheng
ab27c3b5e7 mirror: Workaround for unexpected iohandler events during completion
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.

Commit f3926945c8 made iohandler to use aio API.  The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).

Until then, the request loss has been silent. Later, 3f09bfbc7b added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.

2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.

As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:44:09 +02:00
Fam Zheng
37989ced44 aio-posix: Skip external nodes in aio_dispatch
aio_poll doesn't poll the external nodes so this should never be true,
but aio_ctx_dispatch may get notified by the events from GSource. To
make bdrv_drained_begin effective in main loop, we should check the
is_external flag here too.

Also do the check in aio_pending so aio_dispatch is not called
superfluously, when there is no events other than external ones.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:43:59 +02:00
Fam Zheng
14560d69e7 virtio: Mark host notifiers as external
The effect of this change is the block layer drained section can work,
for example when mirror job is being completed.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:43:58 +02:00
Fam Zheng
54e18d35e4 event-notifier: Add "is_external" parameter
All callers pass "false" keeping the old semantics. The windows
implementation doesn't distinguish the flag yet. On posix, it is passed
down to the underlying aio context.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:43:56 +02:00
Fam Zheng
bcd82a968f iohandler: Introduce iohandler_get_aio_context
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:43:42 +02:00
Christoffer Dall
ee1e0f8e5d util: align memory allocations to 2M on AArch64
For KVM to use Transparent Huge Pages (THP) we have to ensure that the
alignment of the userspace address of the KVM memory slot and the IPA
that the guest sees for a memory region have the same offset from the 2M
huge page size boundary.

One way to achieve this is to always align the IPA region at a 2M
boundary and ensure that the mmap alignment is also at 2M.

Unfortunately, we were only doing this for __arm__, not for __aarch64__,
so add this simple condition.

This fixes a performance regression using KVM/ARM on AArch64 platforms
that showed a performance penalty of more than 50%, introduced by the
following commit:

9fac18f (oslib: allocate PROT_NONE pages on top of RAM, 2015-09-10)

We were only lucky before the above commit, because we were allocating
large regions and naturally getting a 2M alignment on those allocations
then.

Cc: qemu-stable@nongnu.org
Reported-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: wrapped long line]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-22 12:26:01 +01:00
Eric Blake
df7b97ff89 nbd: Don't mishandle unaligned client requests
The NBD protocol does not (yet) force any alignment constraints
on clients.  Even though qemu NBD clients always send requests
that are aligned to 512 bytes, we must be prepared for non-qemu
clients that don't care about alignment (even if it means they
are less efficient).  Our use of blk_read() and blk_write() was
silently operating on the wrong file offsets when the client
made an unaligned request, corrupting the client's data (but
as the client already has control over the file we are serving,
I don't think it is a security hole, per se, just a data
corruption bug).

Note that in the case of NBD_CMD_READ, an unaligned length could
cause us to return up to 511 bytes of uninitialized trailing
garbage from blk_try_blockalign() - hopefully nothing sensitive
from the heap's prior usage is ever leaked in that manner.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1461249750-31928-1-git-send-email-eblake@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-22 11:55:35 +01:00
Peter Maydell
8d0d9b9f67 Update version for v2.6.0-rc3 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 17:46:50 +01:00
Aurelien Jarno
8d8fdbae01 tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG
Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-2-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 15:43:20 +01:00
Aurelien Jarno
eabb7b91b3 tcg: use tcg_debug_assert instead of assert (fix performance regression)
The TCG code is quite performance sensitive, but at the same time can
also be quite tricky. That is why asserts that can be enabled with the
--enable-debug-tcg configure option.

This used to work the following way:

| #include "config.h"
|
| ...
|
| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG)
| /* define it to suppress various consistency checks (faster) */
| #define NDEBUG
| #endif
|
| ...
|
| #include <assert.h>

Since commit 757e725b (tcg: Clean up includes) "config.h" as been
replaced by "qemu/osdep.h" which itself includes <assert.h>. As a
consequence the assertions are always enabled, even when using
--disable-debug-tcg, causing a performance regression, especially on
targets with many registers. For instance on qemu-system-ppc the
speed difference is about 15%.

tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already
uses in some places. This patch replaces all the calls to assert into
calss to tcg_debug_assert.

Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 15:41:47 +01:00
Sylvain Garrigues
b4850e5ae9 hw/arm/boot: always clear r0 when booting kernels
The 32-bit ARM Linux kernel booting ABI requires that r0 is 0
when calling the kernel image. A bug in commit 10b8ec73e6
meant that for boards which use the write_board_setup hook (which
means "highbank", "midway", "raspi2" and "xilinx-zynq-a9") we
were incorrectly skipping the "clear r0" instruction in the
mini-bootloader. Use the right offset in the "add lr, pc, #n"
instruction so that we return from the board-setup code to the
correct place.

Signed-off-by: Sylvain Garrigues <sylvain@sylvaingarrigues.com>
[PMM: Expanded commit message]
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 12:10:17 +01:00
Eduardo Habkost
81d9d1867f MAINTAINERS: Avoid using K: for NUMA section
When using K: in MAINTAINERS, false positives makes
get_maintainer.pl not use git history to find contributors. As
those patterns cause lots of false positives they are causing
more harm than good, so remove them.

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-id: 1461164130-3847-1-git-send-email-ehabkost@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-21 11:40:57 +01:00
Peter Maydell
befbaf51ce Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Mirror block job fixes for 2.6.0-rc3

# gpg: Signature made Wed 20 Apr 2016 15:56:43 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  iotests: Test case for drive-mirror with unaligned image size
  iotests: Add iotests.image_size
  mirror: Don't extend the last sub-chunk
  block/mirror: Refresh stale bitmap iterator cache
  block/mirror: Revive dead yielding code

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-20 16:43:53 +01:00
Peter Maydell
fa59dd9582 Merge remote-tracking branch 'remotes/sstabellini/tags/xen-2016-04-20' into staging
Xen 2016/04/20

# gpg: Signature made Wed 20 Apr 2016 12:08:56 BST using RSA key ID 70E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"

* remotes/sstabellini/tags/xen-2016-04-20:
  xenfb: use the correct condition to avoid excessive looping

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-20 16:16:55 +01:00
Fam Zheng
8ca92f3c06 iotests: Test case for drive-mirror with unaligned image size
This is the regression test for the virtual size mismatch issue between
target and source images.

[ kwolf: Added test_unaligned_with_update ]

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-04-20 16:52:55 +02:00
Fam Zheng
74f69050fe iotests: Add iotests.image_size
This retrieves the virtual size of the image out of qemu-img info.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Fam Zheng
4150ae60eb mirror: Don't extend the last sub-chunk
The last sub-chunk is rounded up to the copy granularity in the target
image, resulting in a larger size than the source.

Add a function to clip the copied sectors to the end.

This undoes the "wrong" changes to tests/qemu-iotests/109.out in
e5b43573e2. The remaining two offset changes are okay.

[ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ]

Reported-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-04-20 16:52:55 +02:00
Max Reitz
f27a274259 block/mirror: Refresh stale bitmap iterator cache
If the drive's dirty bitmap is dirtied while the mirror operation is
running, the cache of the iterator used by the mirror code may become
stale and not contain all dirty bits.

This only becomes an issue if we are looking for contiguously dirty
chunks on the drive. In that case, we can easily detect the discrepancy
and just refresh the iterator if one occurs.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Max Reitz
9c83625bdd block/mirror: Revive dead yielding code
mirror_iteration() is supposed to wait if the current chunk is subject
to a still in-flight mirroring operation. However, it mixed checking
this conflict situation with checking the dirty status of a chunk. A
simplification for the latter condition (the first chunk encountered is
always dirty) led to neglecting the former: We just skip the first chunk
and thus never test whether it conflicts with an in-flight operation.

To fix this, pull out the code which waits for in-flight operations on
the first chunk of the range to be mirrored to settle.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-20 16:52:55 +02:00
Peter Maydell
4113b0532d Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2016-04-19-tag' into staging
qemu-ga patch queue for 2.6

* fixes inadvertant change that unconditionally disables qemu-ga unit test
* fixes make check failures when building with --disable-guest-agent that
  were present visible before the unit test was inadvertantly disabled.

# gpg: Signature made Tue 19 Apr 2016 23:30:09 BST using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2016-04-19-tag:
  qemu-ga: do not run qga test when guest agent disabled

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-20 15:05:19 +01:00
Peter Maydell
fe98b18b6f Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Tue 19 Apr 2016 17:28:01 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  block/gluster: prevent data loss after i/o error
  block/gluster: code movement of qemu_gluster_close()
  block/gluster: return correct error value

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-20 14:42:09 +01:00
Yang Hongyang
fb91f30bb9 qemu-ga: do not run qga test when guest agent disabled
When configure with --disable-guest-agent, make check will fail with:
ERROR:tests/test-qga.c:74:fixture_setup: assertion failed (error == NULL):
 Failed to execute child process "/home/xx/qemu/qemu-ga" (No such file or
directory) (g-exec-error-quark, 8)
make: *** [check-tests/test-qga] Error 1

This check was commented out by bab47d9a75. I think that was by
mistake, because the commit message of that commit didn't mention
this change.

Signed-off-by: Yang Hongyang <hongyang.yang@easystack.cn>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
2016-04-19 16:51:15 -05:00
Peter Maydell
1f7685fafa Update language files for QEMU 2.6.0
Update translation files (change created via 'make -C po update').

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1461059023-14470-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Stefan Weil <sw@weilnetz.de>
2016-04-19 18:41:25 +01:00
Jeff Cody
d85fa9eb87 block/gluster: prevent data loss after i/o error
Upon receiving an I/O error after an fsync, by default gluster will
dump its cache.  However, QEMU will retry the fsync, which is especially
useful when encountering errors such as ENOSPC when using the werror=stop
option.  When using caching with gluster, however, the last written data
will be lost upon encountering ENOSPC.  Using the write-behind-cache
xlator option of 'resync-failed-syncs-after-fsync' should cause gluster
to retain the cached data after a failed fsync, so that ENOSPC and other
transient errors are recoverable.

Unfortunately, we have no way of knowing if the
'resync-failed-syncs-after-fsync' xlator option is supported, so for now
close the fd and set the BDS driver to NULL upon fsync error.

Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Jeff Cody
5d4343e6c2 block/gluster: code movement of qemu_gluster_close()
Move qemu_gluster_close() further up in the file, in preparation
for the next patch, to avoid a forward declaration.

Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Jeff Cody
a882745356 block/gluster: return correct error value
Upon error, gluster will call the aio callback function with a
ret value of -1, with errno set to the proper error value.  If
we set the acb->ret value to the return value in the callback,
that results in every error being EPERM (i.e. 1).  Instead, set
it to the proper error result.

Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-04-19 12:24:59 -04:00
Peter Maydell
d4dffa4a3f Merge remote-tracking branch 'remotes/armbru/tags/pull-fw_cfg-2016-04-19' into staging
fw_cfg: Adopt /opt/RFQDN convention

# gpg: Signature made Tue 19 Apr 2016 15:14:20 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-fw_cfg-2016-04-19:
  fw_cfg: Adopt /opt/RFQDN convention

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-19 15:25:20 +01:00
Markus Armbruster
63d3145aad fw_cfg: Adopt /opt/RFQDN convention
FW CFG's primary user is QEMU, which uses it to expose configuration
information (in the widest sense) to Firmware.  Thus the name FW CFG.

FW CFG can also be used by others for their own purposes.  QEMU is
merely acting as transport then.  Names starting with opt/ are
reserved for such uses.  There is no provision, however, to guide safe
sharing among different such users.

Fix that, loosely following QMP precedence: names should start with
opt/RFQDN/, where RFQDN is a reverse fully qualified domain name you
control.

Based on a more ambitious patch from Michael Tsirkin.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Gabriel L. Somlo <somlo@cmu.edu>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2016-04-19 16:09:50 +02:00
Peter Maydell
ef5d5641f5 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160419-1' into staging
ehci: fix (s)iTD looping issue (CVE-2015-8558) in a different way.

# gpg: Signature made Tue 19 Apr 2016 07:22:22 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20160419-1:
  Revert "ehci: make idt processing more robust"
  ehci: apply limit to iTD/sidt descriptors

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-19 12:10:30 +01:00
Peter Maydell
bb97bfd901 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160419' into staging
ppc patch queueu for 2016-04-19

A single fix for a regression since 2.5.  This should be the last ppc
pull request for 2.6.

# gpg: Signature made Tue 19 Apr 2016 02:48:30 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160419:
  cuda: fix off-by-one error in SET_TIME command

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-19 11:15:32 +01:00
Michael S. Tsirkin
5eb0b194e9 cadence_uart: bounds check write offset
cadence_uart_init() initializes an I/O memory region of size 0x1000
bytes.  However in uart_write(), the 'offset' parameter (offset within
region) is divided by 4 and then used to index the array 'r' of size
CADENCE_UART_R_MAX which is much smaller: (0x48/4).  If 'offset>>=2'
exceeds CADENCE_UART_R_MAX, this will cause an out-of-bounds memory
write where the offset and the value are controlled by guest.

This will corrupt QEMU memory, in most situations this causes the vm to
crash.

Fix by checking the offset against the array size.

Cc: qemu-stable@nongnu.org
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 20160418100735.GA517@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-19 11:13:59 +01:00
Peter Maydell
a087cc589d Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 fix for 2.6.0-rc3

# gpg: Signature made Mon 18 Apr 2016 20:02:15 BST using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  target-i386: Set AMD alias bits after filtering CPUID data

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-19 10:11:17 +01:00
Gerd Hoffmann
a49923d283 Revert "ehci: make idt processing more robust"
This reverts commit 156a2e4dbf.

Breaks FreeBSD.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-19 08:20:56 +02:00
Gerd Hoffmann
1ae3f2f178 ehci: apply limit to iTD/sidt descriptors
Commit "156a2e4 ehci: make idt processing more robust" tries to avoid a
DoS by the guest (create a circular iTD queue and let qemu ehci
emulation run in circles forever).  Unfortunately this has two problems:
First it misses the case of siTDs, and second it reportedly breaks
FreeBSD.

So lets go for a different approach: just count the number of iTDs and
siTDs we have seen per frame and apply a limit.  That should really
catch all cases now.

Reported-by: 杜少博 <dushaobo@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-19 08:18:27 +02:00
Aurelien Jarno
ed3d807b0a cuda: fix off-by-one error in SET_TIME command
With the new framework the cuda_cmd_set_time command directly receive
the data, without the command byte. Therefore the time is stored at
in_data[0], not at in_data[1].

This fixes the "hwclock --systohc" command in a guest.

Cc: Hervé Poussineau <hpoussin@reactos.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Hervé Poussineau <hpoussin@reactos.org>
[this fixes a regression introduced by e647317 "cuda: port SET_TIME
 command to new framework"]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-19 11:39:23 +10:00
Eduardo Habkost
9997cf7bda target-i386: Set AMD alias bits after filtering CPUID data
QEMU complains about -cpu host on an AMD machine:
  warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
For bits 0,1,3,4,5,6,7,8,9,12,13,14,15,16,17,23,24.

KVM_GET_SUPPORTED_CPUID and and x86_cpu_get_migratable_flags()
don't handle the AMD CPUID aliases bits, making
x86_cpu_filter_features() print warnings and clear those CPUID
bits incorrectly.

To avoid hacking x86_cpu_get_migratable_flags() to handle
CPUID_EXT2_AMD_ALIASES (just like the existing hack inside
kvm_arch_get_supported_cpuid()), simply move the
CPUID_EXT2_AMD_ALIASES code in x86_cpu_realizefn() after the
x86_cpu_filter_features() call.

This will probably make the CPUID_EXT2_AMD_ALIASES hack in
kvm_arch_get_supported_cpuid() unnecessary, too. The hack will be
removed in a follow-up patch after v2.6.0.

Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-04-18 15:49:17 -03:00
Peter Maydell
92b674b62a Merge remote-tracking branch 'remotes/afaerber/tags/qom-cpu-for-peter' into staging
QOM CPUState and X86CPU

* MAINTAINERS cleanup

# gpg: Signature made Mon 18 Apr 2016 17:23:16 BST using RSA key ID 3E7E013F
# gpg: Good signature from "Andreas Färber <afaerber@suse.de>"
# gpg:                 aka "Andreas Färber <afaerber@suse.com>"

* remotes/afaerber/tags/qom-cpu-for-peter:
  MAINTAINERS: Drop target-i386 from CPU subsystem

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-18 17:42:59 +01:00
Andreas Färber
2e4cad2833 MAINTAINERS: Drop target-i386 from CPU subsystem
X86CPU QOM type is in good hands and actively maintained these days, so
drop it from the generic QOM CPU subsystem.

Some refactorings and design questions will still intersect, but review
and discussions of individual series can still take place while opting out
of general X86CPU patch review.

Acked-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-04-18 18:14:52 +02:00
Peter Maydell
6a6fa68ae2 Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-signed' into staging
Update OpenBIOS images

# gpg: Signature made Mon 18 Apr 2016 09:39:31 BST using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"

* remotes/mcayland/tags/qemu-openbios-signed:
  Update OpenBIOS images

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-18 11:55:10 +01:00
Peter Maydell
ba3899507a Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160418' into staging
ppc patch queue for 2-16-04-18

Three bugfixe patches for 2.6 here.
* Two for bad implementation of some of the strong load/store
  instructions

* One for bad migration of the XER register.  This is a regression
  from 2.5, cause by a change in the way we represent at XER during
  runtime.

# gpg: Signature made Mon 18 Apr 2016 06:17:03 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160418:
  ppc: Fix migration of the XER register
  ppc: Fix the bad exception NIP value and the range check in LSWX
  ppc: Fix the range check in the LSWI instruction

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-18 11:11:45 +01:00
Peter Maydell
adde0204e4 Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20160416' into staging
seccomp branch queue

# gpg: Signature made Sat 16 Apr 2016 19:58:46 BST using RSA key ID 12F8BD2F
# gpg: Good signature from "Eduardo Otubo (Software Engineer @ ProfitBricks) <eduardo.otubo@profitbricks.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 1C96 46B6 E1D1 C38A F2EC  3FDE FD0C FF5B 12F8 BD2F

* remotes/otubo/tags/pull-seccomp-20160416:
  seccomp: adding sysinfo system call to whitelist
  seccomp: Whitelist cacheflush since 2.2.0 not 2.2.3
  configure: Enable seccomp sandbox for MIPS

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-18 10:22:44 +01:00
Peter Maydell
c6c598ca5f Merge remote-tracking branch 'remotes/weil/tags/pull-wxx-20160415' into staging
wxx patch queue

# gpg: Signature made Fri 15 Apr 2016 18:36:41 BST using RSA key ID 677450AD
# gpg: Good signature from "Stefan Weil <sw@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@bib.uni-mannheim.de>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 4923 6FEA 75C9 5D69 8EC2  B78A E08C 21D5 6774 50AD

* remotes/weil/tags/pull-wxx-20160415:
  wxx: Fix broken TCP networking (regression)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-18 09:55:16 +01:00
Mark Cave-Ayland
afc474863f Update OpenBIOS images
Update OpenBIOS images to SVN r1395 built from submodule.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-04-18 09:38:55 +01:00
Thomas Huth
aa378598fe ppc: Fix migration of the XER register
env->xer only holds the lower bits of the XER register nowadays, the
SO, OV and CA bits are stored in separate variables (see the function
cpu_write_xer() for details). Since the migration code currently only
reads the "xer" variable, the upper bits are lost during migration.
Fix it by using cpu_read_xer() instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-18 15:14:38 +10:00
Thomas Huth
537d3e8e6b ppc: Fix the bad exception NIP value and the range check in LSWX
The range checks in the LSWX instruction are completely insufficient:
They do not take the wrap-around case into account, and the check
"reg < rx" should be "reg <= rx" instead. Fix it by using the new
lsw_reg_in_range() helper function that is already used for LSWI, too.

Then there is a second problem: In case the INVAL exception is generated,
the NIP value is wrong, it currently points to the instruction before
the LSWX instruction. This is because gen_lswx() already decreases the
NIP value by 4 (to be prepared for page fault exceptions), and
powerpc_excp() later decreases it again by 4 while handling the program
exception. So to get this right, we've got to undo the "- 4" from
gen_lswx() here before calling helper_raise_exception_err().

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-18 15:14:38 +10:00
Thomas Huth
afbee7128c ppc: Fix the range check in the LSWI instruction
There are two issues: First, the number of registers that are used has
to be calculated with "(nb + 3) / 4" (i.e. round always up, not down).
Second, the "start <= ra && (start + nr - 32) > ra" condition for the
wrap-around case is wrong: It has to be tested with "||" instead of "&&".
Since we can reuse this check later for the LSWX instruction, let's
place the fixed code into a helper function, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-18 15:14:38 +10:00
Miroslav Rezanina
8e08f8a4a7 seccomp: adding sysinfo system call to whitelist
Newer version of nss-softokn libraries (> 3.16.2.3) use sysinfo call
so qemu using rbd image hang after start when run in sandbox mode.

To allow using rbd images in sandbox mode we have to whitelist it.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16 20:27:44 +02:00
James Hogan
81bed73b53 seccomp: Whitelist cacheflush since 2.2.0 not 2.2.3
The cacheflush system call (found on MIPS and ARM) has been included in
the libseccomp header since 2.2.0, so include it back to that version.
Previously it was only enabled since 2.2.3 since that is when it was
enabled properly for ARM.

This will allow seccomp support to be enabled for MIPS back to
libseccomp 2.2.0.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-By: Andrew Jones <drjones@redhat.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16 20:27:41 +02:00
James Hogan
5ce4397281 configure: Enable seccomp sandbox for MIPS
Enable seccomp on MIPS since libseccomp version 2.2.0 when MIPS support
was first added.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16 20:27:37 +02:00
Stefan Weil
3424c8a9c8 wxx: Fix broken TCP networking (regression)
It is broken since commit c619644067.

Reported-by: Michael Fritscher <michael@fritscher.net>
Tested-by: Michael Fritscher <michael@fritscher.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-04-15 19:35:17 +02:00
Peter Maydell
072035eba1 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches for 2.6.0-rc3

# gpg: Signature made Fri 15 Apr 2016 17:02:23 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  nbd: Don't kill server on client that doesn't request TLS
  nbd: fix assert() on qemu-nbd stop
  nbd: Don't fail handshake on NBD_OPT_LIST descriptions
  qemu-iotests: 041: More robust assertion on quorum node
  qemu-iotests: place valgrind log file in scratch dir
  qemu-iotests: tests: do not set unused tmp variable
  qemu-iotests: common.rc: drop unused _do()
  qemu-iotests: drop unused _within_tolerance() filter
  Fix pflash migration
  block: Don't ignore flags in blk_{,co,aio}_write_zeroes()
  block/vpc: update comments to be compliant w/coding guidelines
  block/vpc: set errp in vpc_open
  block/vpc: make checks on max table size a bit more lax
  block/vpc: Use the correct max sector count for VHD images
  block/vpc: use current_size field for XenConverter VHD images
  vpc: use current_size field for XenServer VHD images
  block/vpc: set errp in vpc_create
  block: Fix blk_aio_write_zeroes()
  qemu-io: Support 'aio_write -z'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-15 18:26:49 +01:00
Peter Maydell
c7b45f1282 Merge remote-tracking branch 'remotes/armbru/tags/pull-backends-2016-04-15' into staging
hostmem-file: plug a small leak

# gpg: Signature made Fri 15 Apr 2016 17:30:42 BST using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-backends-2016-04-15:
  hostmem-file: plug a small leak

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-15 17:43:34 +01:00
Kevin Wolf
cdc8845331 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-04-15' into queue-block
Block patches for 2.6.0-rc3.

# gpg: Signature made Fri Apr 15 17:57:30 2016 CEST using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-04-15:
  nbd: Don't kill server on client that doesn't request TLS
  nbd: fix assert() on qemu-nbd stop
  nbd: Don't fail handshake on NBD_OPT_LIST descriptions
  qemu-iotests: 041: More robust assertion on quorum node
  qemu-iotests: place valgrind log file in scratch dir
  qemu-iotests: tests: do not set unused tmp variable
  qemu-iotests: common.rc: drop unused _do()
  qemu-iotests: drop unused _within_tolerance() filter

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:59:42 +02:00
Eric Blake
d1129a8ad9 nbd: Don't kill server on client that doesn't request TLS
Upstream NBD documents (as of commit 4feebc95) that servers MAY
choose to operate in a conditional mode, where it is up to the
client whether to use TLS.  For qemu's case, we want to always be
in FORCEDTLS mode, because of the risk of man-in-the-middle
attacks, and since we never export more than one device; likewise,
the qemu client will ALWAYS send NBD_OPT_STARTTLS as its first
option.  But now that SELECTIVETLS servers exist, it is feasible
to encounter a (non-qemu) client that is programmed to talk to
such a server, and does not do NBD_OPT_STARTTLS first, but rather
wants to probe if it can use a non-encrypted export.

The NBD protocol documents that we should let such a client
continue trying, on the grounds that maybe the client will get the
hint to send NBD_OPT_STARTTLS, rather than immediately dropping
the connection.

Note that NBD_OPT_EXPORT_NAME is a special case: since it is the
only option request that can't have an error return, we have to
(continue to) drop the connection on that one; rather, what we are
fixing here is that all other replies prior to TLS initiation tell
the client NBD_REP_ERR_TLS_REQD, but keep the connection alive.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460671343-18485-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:56 +02:00
Pavel Butsykin
23994a5f52 nbd: fix assert() on qemu-nbd stop
From time to time qemu-nbd is crashing on the following assert:
    assert(state == TERMINATING);
    nbd_export_closed
    nbd_export_put
    main
and the state at the moment of the crash is evaluated to TERMINATE.

During shutdown process of the client the nbd_client_thread thread sends
SIGTERM signal and the main thread calls the nbd_client_closed callback.
If the SIGTERM callback will be executed after change the state to
TERMINATING, then the state will once again be TERMINATE.

To solve the issue, we must change the state to TERMINATE only if the state
is RUNNING. In the other case we are shutting down already.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1460629215-11567-1-git-send-email-den@openvz.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:56 +02:00
Eric Blake
200650d49f nbd: Don't fail handshake on NBD_OPT_LIST descriptions
The NBD Protocol states that NBD_REP_SERVER may set
'length > sizeof(namelen) + namelen'; in which case the rest
of the packet is a UTF-8 description of the export.  While we
don't know of any NBD servers that send this description yet,
we had better consume the data so we don't choke when we start
to talk to such a server.

Also, a (buggy/malicious) server that replies with length <
sizeof(namelen) would cause us to block waiting for bytes that
the server is not sending, and one that replies with super-huge
lengths could cause us to temporarily allocate up to 4G memory.
Sanity check things before blindly reading incorrectly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1460077777-31004-1-git-send-email-eblake@redhat.com
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:56 +02:00
Fam Zheng
e71fc0bae7 qemu-iotests: 041: More robust assertion on quorum node
Block nodes are now assigned names automatically, therefore the test
case is fragile in using fixed indices in result. Introduce a method in
iotests.py and do the matching more sensibly.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1460518995-1338-1-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:56 +02:00
Sascha Silbe
5f1525a685 qemu-iotests: place valgrind log file in scratch dir
Do not place the valgrind log file at a predictable path in a
world-writable location. Use the common scratch directory (${TEST_DIR})
instead.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1460472980-26319-5-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:56 +02:00
Sascha Silbe
339f06a3bc qemu-iotests: tests: do not set unused tmp variable
The previous commit removed the last usage of ${tmp} inside the tests
themselves; the only remaining users are sourced by check. So we can now
drop this variable from the tests.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1460472980-26319-4-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:56 +02:00
Sascha Silbe
6bb6f6cd9e qemu-iotests: common.rc: drop unused _do()
_do() was never used and possibly creates temporary files at
predictable, world-writable locations. Get rid of it.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1460472980-26319-3-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:55 +02:00
Sascha Silbe
242fbc19ef qemu-iotests: drop unused _within_tolerance() filter
_within_tolerance() isn't used anymore and possibly creates temporary
files at predictable, world-writable locations. Get rid of it.

If it's needed again in the future it can be revived easily and fixed up
to use TEST_DIR and / or safely created temporary files.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1460472980-26319-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:56:55 +02:00
Marc-André Lureau
bc78a01319 hostmem-file: plug a small leak
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1460566660-19241-1-git-send-email-marcandre.lureau@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-04-15 17:56:06 +02:00
Dr. David Alan Gilbert
90c647db8d Fix pflash migration
Pflash migration (e.g. q35 + EFI variable storage) fails
with the assert:

bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.

This avoids the problem by delaying the pflash update until after
the device loads complete.

Tested by:
  Migrating Q35/EFI vm.
  Changing efi variable content (with efiboot in the guest)
  md5sum'ing the variable file before migration and after.

This is a fix that Paolo posted in the message
  570244B3.4070105@redhat.com

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:27:34 +02:00
Kevin Wolf
16aaf975ee block: Don't ignore flags in blk_{,co,aio}_write_zeroes()
Commit 57d6a428 neglected to pass the given flags to blk_aio_prwv(),
which broke discard by WRITE SAME for scsi-disk (the UNMAP bit would be
ignored).

Commit fc1453cd introduced the same bug for blk_write_zeroes(). This is
used for 'qemu-img convert' without has_zero_init (e.g. on a block
device) and for preallocation=falloc in parallels.

Commit 8896e088 is the version for blk_co_write_zeroes(). This function
is only used in qemu-io.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody
9c057d0b68 block/vpc: update comments to be compliant w/coding guidelines
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody
32f6439cf7 block/vpc: set errp in vpc_open
Add more useful error information to failure paths in vpc_open

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody
66176fc6a7 block/vpc: make checks on max table size a bit more lax
The check on the max_table_size field not being larger than required is
valid, and in accordance with the VHD spec.  However, there have been
VHD images encountered in the wild that have an out-of-spec max table
size that is technically too large.

There is no issue in allowing this larger table size, as we also
later verify that the computed size (used for the pagetable) is
large enough to fit all sectors.  In addition, max_table_entries
is bounds checked against SIZE_MAX and INT_MAX.

Remove the strict check, so that we can accomodate these sorts of
images that are benignly out of spec.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Reported-by: Grant Wu <grantwwu@gmail.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody
c23fb11bbb block/vpc: Use the correct max sector count for VHD images
The old VHD_MAX_SECTORS value is incorrect, and is a throwback
to the CHS calculations.  The VHD specification allows images up to 2040
GiB, which (using 512 byte sectors) corresponds to a maximum number of
sectors of 0xff000000, rather than the old value of 0xfe0001ff.

Update VHD_MAX_SECTORS to reflect the correct value.

Also, update comment references to the actual size limit, and correct
one compare so that we can have sizes up to the limit.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody
bab246db1d block/vpc: use current_size field for XenConverter VHD images
XenConverter VHD images are another VHD image where current_size is
different from the CHS values in the the format header.  Use
current_size as the default, by looking at the creator_app signature
field.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Stefan Hajnoczi
9bdfb9e8ac vpc: use current_size field for XenServer VHD images
The vpc driver has two methods of determining virtual disk size.  The
correct one to use depends on the software that generated the image
file.  Add the XenServer creator_app signature so that image size is
correctly detected for those images.

Reported-by: Grant Wu <grantwwu@gmail.com>
Reported-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:12 +02:00
Jeff Cody
0211b9becc block/vpc: set errp in vpc_create
Add more useful error information to failure paths in vpc_create().

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-15 17:22:11 +02:00
Kevin Wolf
7fa84cd8d4 block: Fix blk_aio_write_zeroes()
Commit 57d6a428 broke blk_aio_write_zeroes() because in some write
functions in the call path don't have an explicit length argument but
reuse qiov->size instead. Which is great, except that write_zeroes
doesn't have a qiov, which this commit interprets as 0 bytes.
Consequently, blk_aio_write_zeroes() didn't effectively do anything.

This patch introduces an explicit acb->bytes in BlkAioEmAIOCB and uses
that instead of acb->rwco.size.

The synchronous version of the function is okay because it does pass a
qiov (with the right size and a NULL pointer as its base).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:22:11 +02:00
Kevin Wolf
5ceb77652e qemu-io: Support 'aio_write -z'
This allows testing blk_aio_write_zeroes().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-04-15 17:22:11 +02:00
Peter Maydell
538a467329 Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update

# gpg: Signature made Fri 15 Apr 2016 09:30:58 BST using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"

* remotes/mcayland/tags/qemu-sparc-signed:
  target-sparc: fix Trap Based Address Register behavior for sparc64
  target-sparc: fix Nucleus quad LDD 128 bit access for windowed registers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-15 10:49:04 +01:00
Artyom Tarasenko
de5f107744 target-sparc: fix Trap Based Address Register behavior for sparc64
Accoding the chapter 7.6 Trap Processing of the SPARC Architecture Manual v9,
the Trap Based Address Register is not modified as a trap is taken.

This fix allows booting FreeBSD-10.3-RELEASE-sparc64.

Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-04-15 09:30:40 +01:00
Artyom Tarasenko
01a780d51a target-sparc: fix Nucleus quad LDD 128 bit access for windowed registers
Fix register offset calculation when regwptr is used.

Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-04-15 09:30:39 +01:00
Peter Maydell
bc8995cafa Update version for v2.6.0-rc2 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-14 17:30:28 +01:00
Peter Maydell
3e7cac31d6 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
tpm, vhost, virtio: fixes for 2.6

Minor fixes all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 14 Apr 2016 14:45:55 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  hw/virtio/balloon: Replace TARGET_PAGE_SIZE with BALLOON_PAGE_SIZE
  tpm: Fix write to file descriptor function
  tpm: acpi: remove IRQ from TPM's CRS to make Windows not see conflict
  pc: acpi: tpm: add missing MMIO resource to PCI0._CRS
  specs/vhost-user: spelling fix
  specs/vhost-user: improve VHOST_SET_VRING_NUM documentation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-14 14:55:25 +01:00
Thomas Huth
01310e2aa7 hw/virtio/balloon: Replace TARGET_PAGE_SIZE with BALLOON_PAGE_SIZE
The balloon code currently calls madvise() with TARGET_PAGE_SIZE as
length parameter. Since the virtio-balloon protocol is always based
on 4k pages, no matter what the host and guest are using as page size,
this could cause problems: If TARGET_PAGE_SIZE is bigger than 4k, the
madvise call also destroys the 4k areas after the current one - which
might be wrong since the guest did not want free that area yet (in
case the guest used as smaller MMU page size than the hard-coded
TARGET_PAGE_SIZE). So to fix this issue, introduce a proper define
called BALLOON_PAGE_SIZE (which is 4096) to use this as the size
parameter for the madvise() call instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-14 16:44:42 +03:00
Peter Maydell
33e5702889 Merge remote-tracking branch 'remotes/kraxel/tags/pull-input-20160413-1' into staging
virtio-input; live migration support, various bugfixes.

# gpg: Signature made Wed 13 Apr 2016 16:41:27 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-input-20160413-1:
  virtio-input: support absolute axis config in pass-through
  input-linux: refine mouse detection
  virtio-input: fix emulated tablet axis ranges
  virtio-input: add live migration support
  virtio-input: implement pass-through evdev writes
  virtio-input: retrieve EV_LED host config bits
  virtio-input: add missing key mappings
  move const_le{16, 23} to qemu/bswap.h, add comment
  virtio-input: add parenthesis to const_le{16, 32}

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-13 20:35:23 +01:00
Peter Maydell
8b4aaba736 Merge remote-tracking branch 'remotes/elmarco/tags/ivshmem-fix-pull-request' into staging
# gpg: Signature made Wed 13 Apr 2016 11:04:51 BST using RSA key ID 75969CE5
# gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>"
# gpg:                 aka "Marc-André Lureau <marcandre.lureau@gmail.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 87A9 BD93 3F87 C606 D276  F62D DAE8 E109 7596 9CE5

* remotes/elmarco/tags/ivshmem-fix-pull-request:
  ivshmem: fix ivshmem-{plain,doorbell} crash without arg

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-13 18:48:28 +01:00
Stefan Berger
e7658fcc4c tpm: Fix write to file descriptor function
Fix a bug introduced in commit 46f296c while moving send_all to the
tpm_passthrough code. Fix the name of the variable used in the loop.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-13 19:52:34 +03:00
Igor Mammedov
52e38eb051 tpm: acpi: remove IRQ from TPM's CRS to make Windows not see conflict
IRQ 5 used by TPM conflicts with PNP0C0F IRQs,
as result Windows fails driver initialization with reason
  'device cannot find enough free resources'
But if TPM._CRS.IRQ entry is commented out, Windows
seems to initialize driver without errors as it doesn't
notice possible conflict and it seems to work
probably due to a link with IRQ 5 being unused/disabled.

So temporary comment out TPM._CRS.IRQ to 'fix'
regression in TPM, with intent to fix it correctly
later i.e.:
  1. pick unused IRQ as default one for TPM
  2. fetch IRQ value from device model so that user
     could override default one if it conflicts with
     some other device.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-13 19:52:34 +03:00
Igor Mammedov
2b1c2e8e5f pc: acpi: tpm: add missing MMIO resource to PCI0._CRS
Windows will fail initialize TMP driver with the reason:
  'device cannot find enough free resources'
That happens because parent BUS doesn't describe
MMIO resources used by TPM child device.
Fix it by describing it in top-most parent bus scope PCI0.

It was 'regressed' by commit
  5cb18b3d TPM2 ACPI table support
with following fixup
  9e472263 acpi: add missing ssdt
which did the right thing by moving TPM to BUS
it belongs to but lacked a proper resource declaration.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-13 19:52:34 +03:00
Marc-André Lureau
c954f09ee5 specs/vhost-user: spelling fix
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-13 19:52:34 +03:00
Marc-André Lureau
09230cb867 specs/vhost-user: improve VHOST_SET_VRING_NUM documentation
"number of vrings" doesn't help me understand the purpose of this
message. My understanding is that it is rather the size of the queue (in
modern terms).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-13 19:52:34 +03:00
Peter Maydell
c0bc0fa352 Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging
# gpg: Signature made Wed 13 Apr 2016 00:32:22 BST using RSA key ID AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"

* remotes/jnsnow/tags/ide-pull-request:
  ide: really restart pending and in-flight atapi dma
  ide: restart atapi dma by re-evaluating command packet
  ide: don't lose pending dma state
  xen: Fix IDE unplug

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-13 16:32:29 +01:00
Ladi Prosek
b065e275a8 virtio-input: support absolute axis config in pass-through
VIRTIO_INPUT_CFG_ABS_INFO was not implemented for pass-through input
devices. This patch follows the existing design and pre-fetches the
config for all absolute axes using EVIOCGABS at realize time.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 1460558603-18331-1-git-send-email-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-13 17:26:12 +02:00
Gerd Hoffmann
ce47d3d427 input-linux: refine mouse detection
Read absolute and relative axis information, only classify
devices as mouse/tablet in case the x axis is present.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-13 15:52:28 +02:00
Ladi Prosek
0263b3a72f virtio-input: fix emulated tablet axis ranges
The reported maximum was wrong. The X and Y coordinates are 0-based
so if size is 8000 maximum must be 7FFF.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 1460128893-10244-1-git-send-email-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-13 15:52:28 +02:00
Gerd Hoffmann
2d73837466 virtio-input: add live migration support
virtio-input is simple enough that it doesn't need to xfer any state.
Still we have to wire up savevm manually, so the generic pci and virtio
are saved correctly.

Additionally we need to do some post-load processing to figure whenever
the guest uses the device or not, so we can give input routing hints to
the qemu input layer using qemu_input_handler_{activate,deactivate}.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1459859501-16965-1-git-send-email-kraxel@redhat.com
2016-04-13 15:52:28 +02:00
Ladi Prosek
1a782629f6 virtio-input: implement pass-through evdev writes
The write path for pass-through devices, commonly used for controlling
keyboard LEDs via EV_LED, was not implemented. This commit adds the
necessary plumbing to connect the status virtio queue to the host evdev
file descriptor.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 1459511146-12060-1-git-send-email-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-13 15:52:28 +02:00
Ladi Prosek
848c4d4480 virtio-input: retrieve EV_LED host config bits
VIRTIO_INPUT_CFG_EV_BITS with subsel of EV_LED was always
returning an empty bitmap for pass-through input devices.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 1459418028-7473-1-git-send-email-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-13 15:52:28 +02:00
Ladi Prosek
27a7bbcdf9 virtio-input: add missing key mappings
KEY_PAUSE is flat out missing. KEY_SYSRQ already has a keycode
assigned but it's not what I'm seeing on my system. The mapping
doesn't appear to have to be unique so both keycodes now map to
KEY_SYSRQ which is what the "Keyboard PrintScreen", HID usage ID
0x46, translates to.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Message-id: 1459343240-19483-1-git-send-email-lprosek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-13 15:52:28 +02:00
Gerd Hoffmann
441330f714 move const_le{16, 23} to qemu/bswap.h, add comment
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1460441239-867-1-git-send-email-kraxel@redhat.com
2016-04-13 15:52:28 +02:00
Gerd Hoffmann
a263bac192 virtio-input: add parenthesis to const_le{16, 32}
"_x" must be "(_x)" otherwise things fail if you pass in expressions.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1460440299-26654-1-git-send-email-kraxel@redhat.com
2016-04-13 15:52:28 +02:00
Marc-André Lureau
6dc64780c2 ivshmem: fix ivshmem-{plain,doorbell} crash without arg
"qemu -device ivshmem-{plain,doorbell}" will crash, because the device
doesn't check that the required argument is provided. (screwed up in
commit 5400c02)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-04-13 12:01:47 +02:00
Pavel Butsykin
502356eeeb ide: really restart pending and in-flight atapi dma
Restart of ATAPI DMA used to be unreachable, because the request to do
so wasn't indicated in bus->error_status due to the lack of spare bits, and
ide_restart_bh() would return early doing nothing.

This patch makes use of the observation that not all bit combinations were
possible in ->error_status. In particular, IDE_RETRY_READ only made sense
together with IDE_RETRY_DMA or IDE_RETRY_PIO. This allows to re-use
IDE_RETRY_READ alone as an indicator of ATAPI DMA restart request.

To makes things more uniform, ATAPI DMA gets its own value for ->dma_cmd.
As a means against confusion, macros are added to test the state of
->error_status.

The patch fixes the restart of both in-flight and pending ATAPI DMA,
following the scheme similar to that of IDE DMA.

[Including a fixup patch:
Message-id: 1460465594-15777-1-git-send-email-pbutsykin@virtuozzo.com
--js]

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1459924806-306-4-git-send-email-den@openvz.org
Signed-off-by: John Snow <jsnow@redhat.com>
2016-04-12 18:48:15 -04:00
Pavel Butsykin
9a41826f38 ide: restart atapi dma by re-evaluating command packet
ide_atapi_dma_restart() used to just complete the DMA with an error,
under the assumption that there isn't enough information to restart it.

However, as the contents of the ->io_buffer is preserved, it looks safe to
just re-evaluate it and dispatch the ATAPI command again.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1459924806-306-3-git-send-email-den@openvz.org
Signed-off-by: John Snow <jsnow@redhat.com>
2016-04-12 16:47:52 -04:00
Pavel Butsykin
218fd37c68 ide: don't lose pending dma state
If the migration occurs after the IDE DMA has been set up but before it
has been initiated, the state gets lost upon save/restore. Specifically,
->dma_cb callback gets cleared, so, when the guest eventually starts bus
mastering, the DMA never completes, causing the guest to time out the
operation.

OTOH all the infrastructure is already in place to restart the DMA if
the migration happens while the DMA is in progress.

So reuse that infrastructure, by setting bus->error_status based on
->dma_cmd in pre_save if ->dma_cb callback is already set but DMAING is
clear. This will indicate the need for restart and make sure ->dma_cb
is restored in ide_restart_bh(); howeover since DMAING is clear the state
upon restore will be exactly "ready for DMA" as before the save.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1459924806-306-2-git-send-email-den@openvz.org
Signed-off-by: John Snow <jsnow@redhat.com>
2016-04-12 16:47:52 -04:00
Anthony PERARD
d1fc684f36 xen: Fix IDE unplug
After commit e5e7855 (blockdev: Separate BB name management), starting a
guest with PVHVM support result in this assert:
qemu-system-i386: block/block-backend.c:173: blk_delete: Assertion `!blk->name' failed.

A backtrace show that a caller is pci_piix3_xen_ide_unplug().

This patch fix it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Message-id: 1460382666-29885-1-git-send-email-anthony.perard@citrix.com
Signed-off-by: John Snow <jsnow@redhat.com>
2016-04-12 16:47:52 -04:00
Wei Liu
4df26e88ee xenfb: use the correct condition to avoid excessive looping
In commit ac0487e1 ("xenfb.c: avoid expensive loops when prod <=
out_cons"), ">=" was used. In fact, a full ring is a legit state.
Correct the test to use ">".

Reported-by: "Hao, Xudong" <xudong.hao@intel.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: "Hao, Xudong" <xudong.hao@intel.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2016-04-12 10:16:08 -07:00
Peter Maydell
d44122ecd0 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches for 2.6

# gpg: Signature made Tue 12 Apr 2016 17:10:29 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  qemu-iotests: iotests.py: get rid of __all__
  qemu-iotests: 068: don't require KVM
  qemu-iotests: 148: properly skip test if quorum support is missing
  qemu-iotests: iotests.VM: remove qtest socket on error
  qemu-iotests: fix 051 on non-PC architectures
  qemu-iotests: check: don't place files with predictable names in /tmp
  MAINTAINERS: Block layer core, qcow2 and blkdebug
  qcow2: Prevent backing file names longer than 1023
  vpc: fix return value check for blk_pwrite
  iotests: Make 150 use qemu-img map instead of du
  block: initialize qcrypto API at startup
  qemu-img: fix formatting of error message
  iotests: fix the broken 026.nocache output

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-12 17:47:15 +01:00
Kevin Wolf
5158ac5830 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-04-12' into queue-block
Block patches for 2.6-rc2.

# gpg: Signature made Tue Apr 12 18:08:20 2016 CEST using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-04-12:
  qemu-iotests: iotests.py: get rid of __all__
  qemu-iotests: 068: don't require KVM
  qemu-iotests: 148: properly skip test if quorum support is missing
  qemu-iotests: iotests.VM: remove qtest socket on error
  qemu-iotests: fix 051 on non-PC architectures
  qemu-iotests: check: don't place files with predictable names in /tmp

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:09:16 +02:00
Sascha Silbe
3ef3dcef56 qemu-iotests: iotests.py: get rid of __all__
The __all__ list contained a typo for as long as the iotests module
existed. That typo prevented "from iotests import *" (which is the
only case where iotests.__all__ is used at all) from ever working.

The names used by iotests are highly prone to name collisions, so
importing them all unconditionally is a bad idea anyway. Since __all__
is not adding any value, let's just get rid of it.

Fixes: f345cfd0 ("qemu-iotests: add iotests Python module")
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1459848109-29756-8-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-12 18:07:39 +02:00
Sascha Silbe
9bf8027dde qemu-iotests: 068: don't require KVM
None of the other test cases explicitly enable KVM and there's no
obvious reason for 068 to require it. Drop this so all test cases can be
executed in environments where KVM is not available (e.g. because the
user doesn't have sufficient permissions to access /dev/kvm).

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1459848109-29756-6-git-send-email-silbe@linux.vnet.ibm.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-12 18:07:39 +02:00
Sascha Silbe
3f647b510f qemu-iotests: 148: properly skip test if quorum support is missing
qemu-iotests test case 148 already had some code for skipping the test
if quorum support is missing, but it didn't work in all
cases. TestQuorumEvents.setUp() gets run before the actual test class
(which contains the skipping code) and tries to start qemu with a drive
using the quorum driver. For some reason this works fine when using
qcow2, but fails for raw.

As the entire test case requires quorum, just check for availability
before even starting the test suite. Introduce a verify_quorum()
function in iotests.py for this purpose so future test cases can make
use of it.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1459848109-29756-5-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-12 18:07:39 +02:00
Sascha Silbe
c1c71e49bc qemu-iotests: iotests.VM: remove qtest socket on error
On error, VM.launch() cleaned up the monitor unix socket, but left the
qtest unix socket behind. This caused the remaining sub-tests to fail
with EADDRINUSE:

+======================================================================
+ERROR: testQuorum (__main__.TestFifoQuorumEvents)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "148", line 63, in setUp
+    self.vm.launch()
+  File "/home6/silbe/qemu/tests/qemu-iotests/iotests.py", line 247, in launch
+    self._qmp.accept()
+  File "/home6/silbe/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 141, in accept
+    return self.__negotiate_capabilities()
+  File "/home6/silbe/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 57, in __negotiate_capabilities
+    raise QMPConnectError
+QMPConnectError
+
+======================================================================
+ERROR: testQuorum (__main__.TestQuorumEvents)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "148", line 63, in setUp
+    self.vm.launch()
+  File "/home6/silbe/qemu/tests/qemu-iotests/iotests.py", line 244, in launch
+    self._qtest = qtest.QEMUQtestProtocol(self._qtest_path, server=True)
+  File "/home6/silbe/qemu/tests/qemu-iotests/../../scripts/qtest.py", line 33, in __init__
+    self._sock.bind(self._address)
+  File "/usr/lib64/python2.7/socket.py", line 224, in meth
+    return getattr(self._sock,name)(*args)
+error: [Errno 98] Address already in use

Fix this by cleaning up both the monitor socket and the qtest socket iff
they exist.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1459848109-29756-4-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-12 18:07:39 +02:00
Sascha Silbe
1759386b7c qemu-iotests: fix 051 on non-PC architectures
Commit 61de4c68 [block: Remove BDRV_O_CACHE_WB] updated the reference
output for PCs, but neglected to do the same for the generic reference
output file. Fix 051 on all non-PC architectures by applying the same
change to the generic output file.

Fixes: 61de4c68 ("block: Remove BDRV_O_CACHE_WB")
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1459848109-29756-3-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-12 18:07:39 +02:00
Sascha Silbe
0145b4e130 qemu-iotests: check: don't place files with predictable names in /tmp
Placing files with predictable or even hard-coded names in /tmp is a
security risk and can prevent or disturb operation on a multi-user
machine. Place them inside the "scratch" directory instead, as we
already do for most other test-related files.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Bo Tu <tubo@linux.vnet.ibm.com>
Message-id: 1459848109-29756-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-12 18:07:39 +02:00
Max Reitz
c4189d85bc MAINTAINERS: Block layer core, qcow2 and blkdebug
As agreed with Kevin and already practiced for a while, I am adding
myself as co-maintainer of the block layer core, qcow2 and blkdebug.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:52 +02:00
Max Reitz
4e876bcf2b qcow2: Prevent backing file names longer than 1023
We reject backing file names with a length of more than 1023 characters
when opening a qcow2 file, so we should not produce such files
ourselves.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Paolo Bonzini
40a99aace3 vpc: fix return value check for blk_pwrite
bdrv_pwrite_sync used to return zero or negative error, while blk_pwrite returns
the number of written bytes when successful.  This caused VPC image creation
to fail spectacularly: it wrote the first 512 bytes, and then exited immediately
because of the non-zero answer from blk_pwrite.  But the truly spectacular part
is that it returns a positive value (the 512 that blk_pwrite returned) causing
everyone to believe that it succeeded.

This fixes qemu-iotests with vpc format.

Fixes: b8f45cdf78
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Max Reitz
1fd06db03d iotests: Make 150 use qemu-img map instead of du
The actual on-disk size of a file does not only depend on factors qemu
can control. Thus, we should not depend on this to determine whether a
file has indeed been fully allocated. Instead, use qemu-img map and hope
that if an area is referenced, it is indeed allocated, too.

Also, limit the supported image formats to raw and qcow2 because the
actual qemu-img map output may depend on the image format.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Daniel P. Berrange
c229708848 block: initialize qcrypto API at startup
Any programs which call the qcrypto APIs should ensure that
qcrypto_init() has been called before anything else which
can use crypto. Essentially this means right at the start
of the main method before initializing anything else.

This is important because some versions of gnutls/gcrypt
require explicit initialization before use.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Tested-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Daniel P. Berrange
143605a200 qemu-img: fix formatting of error message
The error_reportf_err() will not automatically append a
': ' before adding its suffix, so we must include that
in the message we pass it, otherwise we get a badly
formatted message lacking whitespace:

qemu-img: Could not open 'driver=nbd,host=127.0.0.1,port=6666,tls-creds=tls0'Failed to connect socket: Connection refused

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Pavel Butsykin
af74e865c4 iotests: fix the broken 026.nocache output
This patch fixes longstanding issue with 026 iotest. Unfortunately,
this test contains 2 versions of the correct output, one for cached
writes and one for non-cached ones. People tends to fix only one
version of output of the test and thus noncached version becomes
broken. Unfortunately, it is default in tests/check-block.sh

The following problematic commits were made:
    commit 3b5e14c76a
    Author: Max Reitz <mreitz@redhat.com>
    Date:   Tue Dec 2 18:32:51 2014 +0100
    qcow2: Flushing the caches in qcow2_close may fail

    commit a069e2f137
    Author: John Snow <jsnow@redhat.com>
    Date:   Fri Feb 6 16:26:17 2015 -0500
    blkdebug: fix "once" rule

    commit b106ad9185
    Author: Kevin Wolf <kwolf@redhat.com>
    Date:   Fri Mar 28 18:06:31 2014 +0100
    qcow2: Don't rely on free_cluster_index in alloc_refcount_block()

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Max Reitz <mreitz@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-12 18:06:51 +02:00
Peter Maydell
42bb626f7e Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Tue 12 Apr 2016 09:29:54 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  MAINTAINERS: Add Fam Zheng as a co-maintainer of block I/O path
  mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs)
  block: Fix bdrv_drain in coroutine

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-12 09:34:52 +01:00
Fam Zheng
9ca3003df3 MAINTAINERS: Add Fam Zheng as a co-maintainer of block I/O path
As agreed with Stefan, I'm listing myself a co-maintainer of block I/O
path and assist with the maintainership.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459849105-7767-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:10 +01:00
Fam Zheng
39bf92dd70 mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs)
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-3-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Fam Zheng
a77fd4bb29 block: Fix bdrv_drain in coroutine
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.

For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().

Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, in qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain
    while (has tracked request)
      aio_poll()

In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).

With this patch, the added qemu_coroutine_yield() allows the scsi-disk
coroutine to make progress as expected:

  mirror coroutine               scsi-disk coroutine
  -------------------------------------------------------------
  do last write

    qcow2:qemu_co_mutex_lock()
    ...
                                 scsi disk read

                                   tracked request begin

                                   qcow2:qemu_co_mutex_lock.enter

    qcow2:qemu_co_mutex_unlock()

  bdrv_drain.enter
>   schedule BH
>   qemu_coroutine_yield()
>                                  qcow2:qemu_co_mutex_lock.return
>                                  ...
                                   tracked request end
    ...
    (resumed from BH callback)
  bdrv_drain.return
  ...

Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-04-11 16:59:09 +01:00
Peter Maydell
4e71220387 Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-signed' into staging
qemu-sparc update

# gpg: Signature made Mon 11 Apr 2016 16:30:02 BST using RSA key ID AE0F321F
# gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>"

* remotes/mcayland/tags/qemu-sparc-signed:
  target-sparc: fix ldstub sign-extension bug

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-11 16:46:37 +01:00
Mark Cave-Ayland
4553e10360 target-sparc: fix ldstub sign-extension bug
ldstub [addr], reg incorrectly reads a signed byte from memory which causes
problems in the 32-bit Solaris mutex code. Here the byte value being read is
0xff which is incorrectly sign-extended to 0xffffffff before being written back
to the target register causing lock detection to behave incorrectly.

This fixes the intermittent hangs and MUTEX_HELD warnings issued to the
console when running 32-bit Solaris images under qemu-system-sparc.

With thanks to Joseph Dery for providing a condensed test image to consistently
reproduce the problem on demand, and Martin Husemann for allowing me access to
real hardware for comparison.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-By: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2016-04-11 16:25:07 +01:00
Peter Maydell
dc1ffa6661 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160411' into staging
target-arm queue:
 * stellaris_enet: don't overrun buffer if fed oversize packet

# gpg: Signature made Mon 11 Apr 2016 14:36:27 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20160411:
  net: stellaris_enet: check packet length against receive buffer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-11 14:37:53 +01:00
Prasad J Pandit
3a15cc0e1e net: stellaris_enet: check packet length against receive buffer
When receiving packets over Stellaris ethernet controller, it
uses receive buffer of size 2048 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.

Reported-by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1460095428-22698-1-git-send-email-ppandit@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-11 14:22:33 +01:00
Peter Maydell
5144fe3605 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160411-1' into staging
virtio-gpu: pixman surface fix, block live migration

# gpg: Signature made Mon 11 Apr 2016 11:45:18 BST using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vga-20160411-1:
  virtio-gpu: block live migration
  ui/virtio-gpu: add and use qemu_create_displaysurface_pixman

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-11 13:32:50 +01:00
Gerd Hoffmann
fa49e4656a virtio-gpu: block live migration
Feeling a bit nervous putting the full live migration support
patch (https://patchwork.ozlabs.org/patch/606902/) in that
late in the 2.6 devel cycle as it carries some non-trivial
changes.  So disable migration in case virtio-gpu is present
for now.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-04-11 12:36:34 +02:00
Gerd Hoffmann
ca58b45fbe ui/virtio-gpu: add and use qemu_create_displaysurface_pixman
Add a the new qemu_create_displaysurface_pixman function, to create
a DisplaySurface backed by an existing pixman image.  In that case
there is no need to create a new pixman image pointing to the same
backing storage.  We can just use the existing image directly.

This does not only simplify things a bit, but most importantly it
gets the reference counting right, so the backing storage for the
pixman image wouldn't be released underneath us.

Use new function in virtio-gpu, where using it actually fixes
use-after-free crashes.

Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1459499240-742-1-git-send-email-kraxel@redhat.com
2016-04-11 12:32:01 +02:00
Peter Maydell
9628af036f Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160408' into staging
MIPS patches 2016-04-08

Changes:
* fix off-by-one error in ITU

# gpg: Signature made Fri 08 Apr 2016 10:43:16 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20160408:
  hw/mips_itu: fix off-by-one reported by Coverity

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 13:45:52 +01:00
Peter Maydell
8227e2d167 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pci, virtio, acpi: fixes for 2.6

Fixes all over the place. Most notably, fixes migration
for systems with pci express bridges, and random crashes
observed with virtio blk and scsi dataplane.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Fri 08 Apr 2016 08:53:46 BST using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  hw/pci-bridge: Add missing unref in case register-bus fails
  virtio: merge virtio_queue_aio_set_host_notifier_handler with virtio_queue_set_aio
  virtio-scsi: use aio handler for data plane
  virtio-blk: use aio handler for data plane
  virtio: add aio handler
  virtio-scsi: fix disabled mode
  virtio-blk: fix disabled mode
  virtio: make virtio_queue_notify_vq static
  tests/bios-tables-test: fix assert
  virtio-balloon: reset the statistic timer to load device
  Migration: Add i82801b11 migration data
  Sort the fw_cfg file list
  xen: piix reuse pci generic class init function
  pci-testdev: fast mmio support
  acpi: Add missing GCC_FMT_ATTR

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 12:45:53 +01:00
Peter Maydell
3be4f4d724 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160408' into staging
ppc patch queue for 2016-04-08

Just a single bugfix for spapr in this batch, but I want to make sure
it gets in for 2.6.

# gpg: Signature made Fri 08 Apr 2016 06:02:45 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160408:
  spapr: Fix ibm,lrdr-capacity

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 11:54:19 +01:00
Peter Maydell
24790aefe0 Merge remote-tracking branch 'remotes/xtensa/tags/20160408-xtensa' into staging
Xtensa-related fixes:

- fix networking on xtfpga platform in linux v4.5 by indicating
  autonegotiation completion in opencores_eth MII BMSR.

# gpg: Signature made Thu 07 Apr 2016 23:33:59 BST using RSA key ID F83FA044
# gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"

* remotes/xtensa/tags/20160408-xtensa:
  opencores_eth: indicate autonegotiation completion

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 11:28:49 +01:00
Peter Maydell
5542417dae Merge remote-tracking branch 'remotes/weil/tags/pull-tci-20160407' into staging
tci patch queue

# gpg: Signature made Thu 07 Apr 2016 18:01:55 BST using RSA key ID 677450AD
# gpg: Good signature from "Stefan Weil <sw@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@bib.uni-mannheim.de>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 4923 6FEA 75C9 5D69 8EC2  B78A E08C 21D5 6774 50AD

* remotes/weil/tags/pull-tci-20160407:
  tci: Fix build regression

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 10:51:45 +01:00
Peter Maydell
28ee01269e Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* NBD fixes from Alex and Eric
* Debug code bitrot from Emilio
* HPET fix from Bill
* ps2kbd fix from Hervé
* PKU fix from myself
* Coverity fixes from Gonglei
* More memory.txt update from Jiangang
* .gitignore maintenance from Changlong

# gpg: Signature made Thu 07 Apr 2016 23:08:12 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  target-i386: check for PKU even for non-writable pages
  tests: ignore test-logging
  translate-all: add missing fold of tb_ctx into tcg_ctx
  hostmem-file: fix memory leak
  spapr: fix possible Negative array index read
  nbd: do not hang nbd_wr_syncv if outside a coroutine and no available data
  nbd: Don't kill server when client requests unknown option
  nbd: Fix NBD unsupported options
  qemu-nbd: Document -x option
  nbd: Improve debug traces on little-endian
  nbd: Avoid bitrot in TRACE() usage
  nbd: Return correct error for write to read-only export
  docs: fix typo in memory.txt
  hw/timer: Revert "hpet: inverse polarity when pin above ISA_NUM_IRQS"
  ps2kbd: default to scancode_set 2, as with KBD_CMD_RESET

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08 10:25:22 +01:00
Leon Alrae
f2eb665a11 hw/mips_itu: fix off-by-one reported by Coverity
Fix off-by-one error in ITC Tag read.

Remove the switch as we just want to check if index is in valid range
rather than test against list of values.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-04-08 09:19:26 +01:00
Bharata B Rao
a110655a06 spapr: Fix ibm,lrdr-capacity
ibm,lrdr-capacity has a field to describe the maximum address in bytes
and therefore, the most memory that can be allocated to this guest. We
are using maxmem for this field, but instead should use the actual RAM
address corresponding to the end of hotplug region.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-08 11:18:10 +10:00
Paolo Bonzini
44d066a2f7 target-i386: check for PKU even for non-writable pages
Xiao Guangrong ran kvm-unit-tests on an actual machine with PKU and
found that it fails:

test pte.p pte.user pde.p pde.user pde.a pde.pse pkru.wd pkey=1 user write efer.nx cr4.pke: FAIL: error code 27 expected 7
Dump mapping: address: 0x123400000000
------L4: 2ebe007
------L3: 2ebf007
------L2: 8000000020000a5

(All failures are combinations of "pde.user pde.p pkru.wd pkey=1",
plus either "pde.pse" or "pte.p pte.user", plus one of "user cr0.wp",
"cr0.wp" or "user", plus unimportant bits such as accessed/dirty or
efer.nx).

So PFEC.PKEY is set even if the ordinary check failed (which it did
because pde.w is zero).  Adjust QEMU to match behavior of silicon.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:56 +02:00
Changlong Xie
57a6c059a6 tests: ignore test-logging
Commit 3514552e added a new test, but did not mark it for
exclusion in .gitignore.

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459903756-30672-1-git-send-email-xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:56 +02:00
Emilio G. Cota
7e6bd36d61 translate-all: add missing fold of tb_ctx into tcg_ctx
Since 5e5f07e08 "TCG: Move translation block variables
to new context inside tcg_ctx: tb_ctx" on Feb 1 2013, compilation
of usermode + TB_DEBUG_CHECK has been broken. Fix it.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1459834253-8291-2-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:56 +02:00
Gonglei
696b55017d hostmem-file: fix memory leak
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1456998223-12356-5-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:56 +02:00
Gonglei
1a5512bb7e spapr: fix possible Negative array index read
fix CID 1351391.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1456998223-12356-6-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:56 +02:00
Paolo Bonzini
dacca04c8d nbd: do not hang nbd_wr_syncv if outside a coroutine and no available data
Until commit 1c778ef7 ("nbd: convert to using I/O channels for actual
socket I/O", 2016-02-16), nbd_wr_sync returned -EAGAIN this scenario.
nbd_reply_ready required these semantics because it has two conflicting
requirements:

1) if a reply can be received on the socket, nbd_reply_ready needs
to read the header outside coroutine context to identify _which_
coroutine to enter to process the rest of the reply

2) on the other hand, nbd_reply_ready can find a false positive if
another thread (e.g. a VCPU thread running aio_poll) sneaks in and
calls nbd_reply_ready too.  In this case nbd_reply_ready does nothing
and expects nbd_wr_syncv to return -EAGAIN.

Currently, the solution to the first requirement is to wait in the very
rare case of a read() that doesn't retrieve the reply header in its
entirety; this is what nbd_wr_syncv does by calling qio_channel_wait().
However, the unconditional call to qio_channel_wait() breaks the second
requirement.  To fix this, the patch makes nbd_wr_syncv return -EAGAIN
if done is zero, similar to the code before commit 1c778ef7.

This is okay because NBD client-side negotiation is the only other case
that calls nbd_wr_syncv outside a coroutine, and it places the socket
in blocking mode.  On the other hand, it is a bit unpleasant to put
this in nbd_wr_syncv(), because the function is used by both client
and server.

The full fix would be to add a counter to NbdClientSession for how
many bytes have been filled in s->reply.  Then a reply can be filled
by multiple separate invocations of nbd_reply_ready and the
qio_channel_wait() call can be removed completely.  Something to
consider for 2.7...

Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:44 +02:00
Eric Blake
156f6a10c2 nbd: Don't kill server when client requests unknown option
nbd-server.c currently fails to handle unsupported options properly.
If during option haggling the client sends an unknown request, the
server kills the connection instead of letting the client try to
fall back to something older.  This is precisely what advertising
NBD_FLAG_FIXED_NEWSTYLE was supposed to fix.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459982918-32229-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:44 +02:00
Alex Bligh
6ff5816478 nbd: Fix NBD unsupported options
nbd-client.c currently fails to handle unsupported options properly.
If during option haggling the server finds an option that is
unsupported, it returns an NBD_REP_ERR_UNSUP reply.

According to nbd's proto.md, the format for such a reply
should be:

  S: 64 bits, 0x3e889045565a9 (magic number for replies)
  S: 32 bits, the option as sent by the client to which this is a reply
  S: 32 bits, reply type (e.g., NBD_REP_ACK for successful completion,
     or NBD_REP_ERR_UNSUP to mark use of an option not known by this server
  S: 32 bits, length of the reply. This may be zero for some replies,
     in which case the next field is not sent
  S: any data as required by the reply (e.g., an export name in the case
     of NBD_REP_SERVER, or optional UTF-8 message for NBD_REP_ERR_*)

However, in nbd-client.c, the reply type was being read, and if it
contained an error, it was bailing out and issuing the next option
request without first reading the length. This meant that the
next option / handshake read had an extra 4 or more bytes of data in it.
In practice, this makes Qemu incompatible with servers that do not
support NBD_OPT_LIST.

To verify this isn't an error in the specification or my reading of
it, replies are sent by the reference implementation here:
 https://github.com/yoe/nbd/blob/66dfb35/nbd-server.c#L1232
and as is evident it always sends a 'datasize' (aka length) 32 bit
word. Unsupported elements are replied to here:
 https://github.com/yoe/nbd/blob/66dfb35/nbd-server.c#L1371

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1459882500-24316-1-git-send-email-alex@alex.org.uk>
[rework to ALWAYS consume an optional UTF-8 message from the server]
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459961962-18771-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:44 +02:00
Eric Blake
332a254b66 qemu-nbd: Document -x option
Commit 3d4b2f9c added -x to force qemu-nbd to use new-style
negotiation, but while it documented it in the man page, it
omitted docs in the --help output.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459908128-11925-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:44 +02:00
Eric Blake
7548fe3116 nbd: Improve debug traces on little-endian
Print debug tracing messages while data is still in native
ordering, rather than after we've potentially swapped it into
network order for transmission.  Also, it's nice if the server
mentions what it is replying, to correlate it to with what the
client says it is receiving.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459913704-19949-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:44 +02:00
Eric Blake
8c6597123a nbd: Avoid bitrot in TRACE() usage
The compiler is smart enough to optimize out 'if (0)', but won't
type-check our printfs if they are hidden behind #if.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459913704-19949-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:43 +02:00
Eric Blake
c0301fcc81 nbd: Return correct error for write to read-only export
The NBD Protocol requires that servers should send EPERM for
attempts to write (or trim) a read-only export.  We were
correct for TRIM (blk_co_discard() gave EPERM); but were
manually setting EROFS which then got mapped to EINVAL over
the wire on writes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459913704-19949-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:43 +02:00
Wei Jiangang
b3f3fdeb95 docs: fix typo in memory.txt
The space between 7000 and 8000 is too wide by 1 character.
Also correct the range of vga-window example 0xa0000-0xbffff.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Message-Id: <1458639954-9980-1-git-send-email-weijg.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:43 +02:00
Bill Paul
ecba19935a hw/timer: Revert "hpet: inverse polarity when pin above ISA_NUM_IRQS"
This reverts commit 0d63b2dd31.

This change was originally intended to correct the HPET behavior
in conjunction with Linux, however the behavior that it actually creates
is not compatible with the ioapic.c implementation; it used to be
compatible with KVM's own IOAPIC but it is not anymore.

Signed-off-by: Bill Paul <wpaul@windriver.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <201604051558.20070.wpaul@windriver.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:43 +02:00
Hervé Poussineau
089adafdc6 ps2kbd: default to scancode_set 2, as with KBD_CMD_RESET
This line has been added in commit ef74679a81 with
other initializations. However, scancode set 0 doesn't exist (only 1, 2, 3).
This works well as long as operating system is resetting keyboard, or overwriting
the current scancode set with the one it wants.

This fixes IBM 40p firmware, which doesn't bother sending KBD_CMD_RESET or KBD_CMD_SCANCODE.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Message-Id: <1458714100-28885-1-git-send-email-hpoussin@reactos.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-08 00:07:36 +02:00
Peter Maydell
ead5268f21 Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2016-04-07-tag' into staging
qemu-ga patch queue for 2.6

* fix w32 bug where output from guest-exec is not properly captured
* fix w32 bug where FDs are leaked after guest-exec is invoked

# gpg: Signature made Thu 07 Apr 2016 17:46:21 BST using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2016-04-07-tag:
  qga: Workaround for console redirection from non-interactive qemu-ga service
  qga: fix fd leak with guest-exec i/o channels

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-07 18:06:14 +01:00
Stefan Weil
3ccdbecf80 tci: Fix build regression
Commit d38ea87ac5 cleaned the include
statements which resulted in a wrong order of assert.h and the definition
of NDEBUG in tci.c. Normally NDEBUG modifies the definition of the assert
macro, but here this definition comes too late which results in a failing
build.

To fix this, a new macro tci_assert which depends on CONFIG_DEBUG_TCG
is introduced. Only builds with CONFIG_DEBUG_TCG will use assertions.
Even in this case, it is still possible to disable assertions by
defining NDEBUG via compiler settings.

Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-04-07 19:01:21 +02:00
Wei Jiangang
2e4278b534 hw/pci-bridge: Add missing unref in case register-bus fails
The error paths after a successful qdev_create/pci_bus_new
should contain a object_unref/object_unparent.
pxb_dev_init_common() did not yet, so add it.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini
a378b49a43 virtio: merge virtio_queue_aio_set_host_notifier_handler with virtio_queue_set_aio
Eliminating the reentrancy is actually a nice thing that we can do
with the API that Michael proposed, so let's make it first class.
This also hides the complex assign/set_handler conventions from
callers of virtio_queue_aio_set_host_notifier_handler, which in
fact was always called with assign=true.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini
a8f2e5c8ff virtio-scsi: use aio handler for data plane
In addition to handling IO in vcpu thread and in io thread, dataplane
introduces yet another mode: handling it by AioContext.

This reuses the same handler as previous modes, which triggers races as
these were not designed to be reentrant.  Use a separate handler just
for aio, and disable regular handlers when dataplane is active.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Michael S. Tsirkin
8a2fad57eb virtio-blk: use aio handler for data plane
In addition to handling IO in vcpu thread and in io thread, dataplane
introduces yet another mode: handling it by AioContext.

This reuses the same handler as previous modes, which triggers races as
these were not designed to be reentrant.  Use a separate handler just
for aio, and disable regular handlers when dataplane is active.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Michael S. Tsirkin
344dc16fae virtio: add aio handler
In addition to handling IO in vcpu thread and in io thread, blk dataplane
introduces yet another mode: handling it by AioContext.

Currently, this reuses the same handler as previous modes,
which triggers races as these were not designed to be reentrant.
Add instead a separate handler just for aio; this will make
it possible to disable regular handlers when dataplane is active.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini
43c696a298 virtio-scsi: fix disabled mode
Add two missing checks for s->dataplane_fenced.  In one case, QEMU
would skip injecting an IRQ due to a write to an uninitialized
EventNotifier's file descriptor.

In the second case, the dataplane_disabled field was used by mistake;
in fact after fixing this occurrence it is completely unused.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini
eb41cf78fc virtio-blk: fix disabled mode
We must not call virtio_blk_data_plane_notify if dataplane is
disabled: we would hit a segmentation fault in notify_guest_bh as
s->guest_notifier has not been setup and is NULL.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Paolo Bonzini
2b2cbcadc1 virtio: make virtio_queue_notify_vq static
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Marcel Apfelbaum
a3973f551d tests/bios-tables-test: fix assert
Newer iasl does not add the aml file name to the Definition Block.
See acpica tools commit  1ecbb3d5:
  "Emit the AMLFilename as a zero-length string. Allows the compiler to create
   the name later -- making it easier to rename the parent ASL (DSL) file."

That causes an assert in acpi tests:
   tests/bios-tables-test.c:455:normalize_asl: assertion failed: (block_name)

Fix it by striping the start of the definition block line until the first comma.
The block name is always the first parameter and
the grammar does not allow comma in between, so it is safe.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Pavel Butsykin
fecb48f744 virtio-balloon: reset the statistic timer to load device
If before loading snapshot we had set the timer of statistics, then after
applying snapshot the expiry time would be irrelevant for the restored
state of the virtual clocks. A simple fix is just to restart the timer
after loading snapshot.

For the user it may look like a long delay of statistics update after switch
to the snapshot.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Dr. David Alan Gilbert
3d100d0fa9 Migration: Add i82801b11 migration data
The i82801b11 bridge didn't have a vmsd and thus didn't send
any migration data, including that of its parent PCIBridge object.
The symptom being if the guest used any devices behind the bridge
the guest crashed (mostly with various interrupt related issues).

Note: This will cause migration from old qemus that used this device to
explicitly fail during migration as opposed to the guest crashing.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Gerd Hoffmann
bab47d9a75 Sort the fw_cfg file list
Entries are inserted in filename order instead of being
appended to the end in case sorting is enabled.

This will avoid any future issues of moving the file creation
around, it doesn't matter what order they are created now,
the will always be in filename order.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Added machine type handling for compatibility.  This was
a fairly complex change, this will preserve the order of fw_cfg
for older versions no matter what order the firmware files
actually come in.  A list is kept of the correct legacy order
and the entries will be inserted based upon their order in
the list.  Except that some entries are ordered (in a specific
area of the list) based upon what order they appear on the
command line.  Special handling is added for those entries.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Michael S. Tsirkin
0f8445820f xen: piix reuse pci generic class init function
piix3_ide_xen_class_init is identical to piix3_ide_class_init
except it's buggy as it does not set exit and does not disable
hotplug properly.

Switch to the generic one.

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Michael S. Tsirkin
45aa4e8e39 pci-testdev: fast mmio support
Teach PCI testdev to use fast MMIO when kvm makes it available.

Before:
    mmio-wildcard-eventfd:pci-mem 2271
After:
    mmio-wildcard-eventfd:pci-mem 1218

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Stefan Weil
8d0ac88e23 acpi: Add missing GCC_FMT_ATTR
This fixes a compiler warning when compiling with -Wextra.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-04-07 19:57:33 +03:00
Yuri Pudgorodskiy
27559c214d qga: Workaround for console redirection from non-interactive qemu-ga service
mingw-glib uses helper process to assist gspawn() api. There are two
versions of helpers, one with main() and another with WinMain() startup
routines.

Whenever gspawn() detects consoleless environment (and qemu-ga is running
in such environment as Win32 service), it chooses helper with main()
instead of WinMain. It is done by name, e.g.
gspawn-win32-helper-console.exe vs gspawn-win32-helper.exe

Running console-aware application like any win32 console apps from main()
crt initalized process results in redirection of stdout to console created
in crt startup instead of parent-provided handle connected to subprocess
pipe. Thus, stdout/stderr redirection do not work correctly.

The patch makes WinMain()'s version of helper be used as the only helper
shipped with qemu-ga package. Using only win32 helper ensures console
is created before any redirection and fixes stdout/stderr redirection
issue.

Signed-off-by: Yuri Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-04-07 11:43:54 -05:00
Yuriy Pudgorodskiy
3005c2c2fa qga: fix fd leak with guest-exec i/o channels
Signed-off-by: Yuriy Pudgorodskiy <yur@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Michael Roth <mdroth@linux.vnet.ibm.com>
* squashed in g_io_channel_shutdown() to match cleanup paths for
  input/output
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-04-07 11:40:19 -05:00
Peter Maydell
e380023898 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates

# gpg: Signature made Thu 07 Apr 2016 12:02:23 BST using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  slirp: handle deferred ECONNREFUSED on non-blocking TCP sockets
  slirp: Propagate host TCP RST to the guest.
  slirp: avoid use-after-free in slirp_pollfds_poll() if soread() returns an error
  slirp: don't crash when tcp_sockclosed() is called with a NULL tp

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-07 12:15:33 +01:00
Steven Luo
6625d83a6e slirp: handle deferred ECONNREFUSED on non-blocking TCP sockets
slirp currently only handles ECONNREFUSED in the case where connect()
returns immediately with that error; since we use non-blocking sockets,
most of the time we won't receive the error until we later try to read
from the socket.  Ensure that we deliver the appropriate RST to the
guest in this case.

Signed-off-by: Steven Luo <steven+qemu@steven676.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-04-07 13:02:05 +02:00
Edgar E. Iglesias
27d92ebc5e slirp: Propagate host TCP RST to the guest.
When the host aborts (RST) its side of a TCP connection we need to
propagate that RST to the guest. The current code can leave such guest
connections dangling forever. Spotted by Jason Wessel.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
[steven@steven676.net: coding style adjustments]
Signed-off-by: Steven Luo <steven+qemu@steven676.net>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-04-07 13:01:45 +02:00
Peter Maydell
0f9d6bd210 Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Wed 06 Apr 2016 03:21:19 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  filter-buffer: fix segfault when starting qemu with status=off property
  rtl8139: using CP_TX_OWN for ownership transferring during tx
  net: fix OptsVisitor memory leak
  net: Allocating Large sized arrays to heap
  util: Improved qemu_hexmap() to include an ascii dump of the buffer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-07 10:14:41 +01:00
Steven Luo
bfb1ac1402 slirp: avoid use-after-free in slirp_pollfds_poll() if soread() returns an error
Samuel Thibault pointed out that it's possible that slirp_pollfds_poll()
will try to use a socket even after soread() returns an error, resulting
in an use-after-free if the socket was removed while handling the error.
Avoid this by refusing to continue to work with the socket in this case.

Signed-off-by: Steven Luo <steven+qemu@steven676.net>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-04-07 10:27:42 +02:00
Steven Luo
b5ab677189 slirp: don't crash when tcp_sockclosed() is called with a NULL tp
Signed-off-by: Steven Luo <steven+qemu@steven676.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-04-07 10:27:22 +02:00
zhanghailiang
e0a039e50d filter-buffer: fix segfault when starting qemu with status=off property
After commit 338d3f, we support 'status' property for filter object.
The segfault can be triggered by starting qemu with 'status=off' property
for filter, when the s->incoming_queue is NULL, we reference it directly
in qemu_net_queue_flush() which was called in status_changed() callback
function.

We shouldn't trigger status_changed() before the filter was initialized,
We can check the value of 'nf->netdev' to confirm if the filter is
initialized or not, so let's check its value before calling
status_changed().

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-06 09:52:07 +08:00
Jason Wang
91731d5f6d rtl8139: using CP_TX_OWN for ownership transferring during tx
Through CP_TX_OWN and CP_RX_OWN points to the same bit, we'd better use
CP_TX_OWN for tx descriptor handling.

Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-06 09:52:07 +08:00
Paolo Bonzini
044d65525f net: fix OptsVisitor memory leak
Fixes 96a1616("qapi-dealloc: Reduce use outside of generated code")
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-06 09:52:07 +08:00
Pooja Dhannawat
74044c8ffc net: Allocating Large sized arrays to heap
nc_sendv_compat has a huge stack usage of 69680 bytes approx.
Moving large arrays to heap to reduce stack usage.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Pooja Dhannawat <dhannawatpooja1@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-06 09:52:07 +08:00
Isaac Lozano
a1555559ab util: Improved qemu_hexmap() to include an ascii dump of the buffer
qemu_hexdump() in util/hexdump.c has been changed to give also include a
ascii dump of the buffer. Also, calls to hex_dump() in net/net.c have
been replaced with calls to qemu_hexdump(). This takes care of two misc
BiteSized Tasks.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Isaac Lozano <109lozanoi@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-06 09:52:07 +08:00
Peter Maydell
7acbff99c6 Update version for v2.6.0-rc1 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 21:53:18 +01:00
Peter Maydell
627b4e23cc Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20160405' into staging
tcg/mips compilation fix

# gpg: Signature made Tue 05 Apr 2016 20:48:38 BST using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-tcg-20160405:
  tcg/mips: Fix type of tcg_target_reg_alloc_order[]

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 21:24:49 +01:00
James Hogan
2dc7553d0c tcg/mips: Fix type of tcg_target_reg_alloc_order[]
The MIPS TCG backend is the only one to have
tcg_target_reg_alloc_order[] elements of type TCGReg rather than int.
This resulted in commit 91478cefaa ("tcg: Allocate indirect_base
temporaries in a different order") breaking the build on MIPS since the
type differed from indirect_reg_alloc_order[]:

tcg/tcg.c:1725:44: error: pointer type mismatch in conditional expression [-Werror]
     order = rev ? indirect_reg_alloc_order : tcg_target_reg_alloc_order;
                                            ^

Make it an array of ints to fix the build and match other architectures.

Fixes: 91478cefaa ("tcg: Allocate indirect_base temporaries in a different order")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Message-Id: <1459522179-6584-1-git-send-email-james.hogan@imgtec.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-04-05 12:47:47 -07:00
Ed Maste
43b0ea1a41 bsd-user: Suppress gcc 4.x -Wpointer-sign (included in -Wall) warning
This is the same change as b55266b5 in linux-user.

Signed-off-by: Ed Maste <emaste@freebsd.org>
Message-id: 1459867593-72017-1-git-send-email-emaste@freebsd.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 17:49:41 +01:00
Ed Maste
abd4556a17 bsd-user: add qemu/cutils.h include after f348b6d
Signed-off-by: Ed Maste <emaste@freebsd.org>
Message-id: 1459864881-71319-1-git-send-email-emaste@freebsd.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 17:49:35 +01:00
Peter Maydell
31370dbe5d Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches for 2.6

# gpg: Signature made Tue 05 Apr 2016 16:32:25 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream:
  crypto: Avoid memory leak on failure
  qemu-iotests: 149: Use "/usr/bin/env python"
  block: Forbid I/O throttling on nodes with multiple parents for 2.6
  block: forbid x-blockdev-del from acting on DriveInfo

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 17:03:32 +01:00
Kevin Wolf
6a5c357fdb Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-04-05' into queue-block
Block patches for the 2.6 release

# gpg: Signature made Tue Apr  5 17:23:48 2016 CEST using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-04-05:
  crypto: Avoid memory leak on failure
  qemu-iotests: 149: Use "/usr/bin/env python"

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-05 17:31:20 +02:00
Eric Blake
95c3df5a24 crypto: Avoid memory leak on failure
Commit 7836857 introduced a memory leak due to invalid use of
Error vs. visit_type_end().  If visiting the intermediate
members fails, we clear the error and unconditionally use
visit_end_struct() on the same error object; but if that
cleanup succeeds, we then skip the qapi_free call.

Until a later patch adds visit_check_struct(), the only safe
approach is to use two separate error objects.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 1459526222-30052-1-git-send-email-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-05 17:23:21 +02:00
Fam Zheng
08db36f6ec qemu-iotests: 149: Use "/usr/bin/env python"
Do the same as other scripts, to pick the correct interpreter between
python2 and python3 from the environment.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1459504593-2692-1-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-04-05 17:23:21 +02:00
Peter Maydell
a226f76536 Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-04-05-1' into staging
Merge QCrypto fixes 2016/04/05 v1

# gpg: Signature made Tue 05 Apr 2016 10:53:59 BST using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/pull-qcrypto-2016-04-05-1:
  crypto: fix nettle config check for running pbkdf test
  crypto: fix typo in docs for secret object type

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 11:53:53 +01:00
Peter Maydell
cc621a9838 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* FreeBSD build fixes (atomics, qapi/error.h)
* x86 KVM fixes (SynIC, KVM_GET/SET_MSRS)
* Memory API doc fix
* checkpatch fix
* Chardev and socket fixes
* NBD fixes
* exec.c SEGV fix

# gpg: Signature made Tue 05 Apr 2016 10:47:49 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  net: fix missing include of qapi/error.h in netmap.c
  nbd: Fix poor debug message
  include/qemu/atomic: add compile time asserts
  cpus: don't use atomic_read for vm_clock_warp_start
  nbd: don't request FUA on FLUSH
  doc/memory: update MMIO section
  char: ensure all clients are in non-blocking mode
  char: fix broken EAGAIN retry on OS-X due to errno clobbering
  util: retry getaddrinfo if getting EAI_BADFLAGS with AI_V4MAPPED
  checkpatch: add target_ulong to typelist
  target-i386: assert that KVM_GET/SET_MSRS can set all requested MSRs
  target-i386: do not pass MSR_TSC_AUX to KVM ioctls if CPUID bit is not set
  memory: fix segv on qemu_ram_free(block=0x0)
  target-i386/kvm: Hyper-V VMBus hypercalls blank handlers
  update Linux headers to 4.6

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 11:03:18 +01:00
Daniel P. Berrange
c44e92a415 crypto: fix nettle config check for running pbkdf test
The pbkdf test is being built based on a check for CONFIG_NETTLE.
As of fff2f982ab, it should be
instead checking CONFIG_NETTLE_KDF

Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Tested-by: Bruce Rogers <brogers@suse.com>
Tested-by: Ed Maste <emaste@freebsd.org>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-04-05 10:52:57 +01:00
Daniel P. Berrange
69c0b278af crypto: fix typo in docs for secret object type
The docs for the secret object type specified the wrong number
of bytes for the AES initialization vector.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-04-05 10:52:33 +01:00
Daniel P. Berrange
2354bebaa4 net: fix missing include of qapi/error.h in netmap.c
The netmap.c file fails to build on FreeBSD with

net/netmap.c:95:9: warning: implicit declaration of function 'error_setg_errno' is invalid in C99 [-Wimplicit-function-declaration]
     error_setg_errno(errp, errno, "Failed to nm_open() %s",
     ^
net/netmap.c:432:9: warning: implicit declaration of function 'error_propagate' is invalid in C99 [-Wimplicit-function-declaration]
     error_propagate(errp, err);
     ^

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1459429690-6144-1-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Eric Blake
b6afc654ae nbd: Fix poor debug message
The client sends messages to the server, not itself.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459459222-8637-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Alex Bennée
ca47a926ad include/qemu/atomic: add compile time asserts
To be safely portable no atomic access should be trying to do more than
the natural word width of the host. The most common abuse is trying to
atomically access 64 bit values on a 32 bit host.

This patch adds some QEMU_BUILD_BUG_ON to the __atomic instrinsic paths
to create a build failure if (sizeof(*ptr) > sizeof(void *)).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1459780549-12942-3-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Alex Bennée
ccffff48c9 cpus: don't use atomic_read for vm_clock_warp_start
As vm_clock_warp_start is a 64 bit value this causes problems for the
compiler trying to come up with a suitable atomic operation on 32 bit
hosts. Because the variable is protected by vm_clock_seqlock, we check its
value inside a seqlock critical section.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1459780549-12942-2-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Eric Blake
a89ef0c357 nbd: don't request FUA on FLUSH
The NBD protocol does not clearly document what will happen
if a client sends NBD_CMD_FLAG_FUA on NBD_CMD_FLUSH.
Historically, both the qemu and upstream NBD servers silently
ignored that flag, but that feels a bit risky.  Meanwhile, the
qemu NBD client unconditionally sends the flag (without even
bothering to check whether the caller cares; at least with
NBD_CMD_WRITE the client only sends FUA if requested by a
higher layer).

There is ongoing discussion on the NBD list to fix the
protocol documentation to require that the server MUST ignore
the flag (unless the kernel folks can better explain what FUA
means for a flush), but until those doc improvements land, the
current nbd.git master was recently changed to reject the flag
with EINVAL (see nbd commit ab22e082), which now makes it
impossible for a qemu client to use FLUSH with an upstream NBD
server.

We should not send FUA with flush unless the upstream protocol
documents what it will do, and even then, it should be something
that the caller can opt into, rather than being unconditional.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1459526902-32561-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Cao jin
0c52a80eeb doc/memory: update MMIO section
There is no memory_region_io(). And remove a stray '-'.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-Id: <1459507677-16662-1-git-send-email-caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Daniel P. Berrange
64c800f808 char: ensure all clients are in non-blocking mode
Only some callers of tcp_chr_new_client are putting the
socket client into non-blocking mode. Move the call to
qio_channel_set_blocking() into the tcp_chr_new_client
method to guarantee that all code paths set non-blocking
mode

Reported-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reported-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1458324041-22709-1-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Daniel P. Berrange
53628efbc8 char: fix broken EAGAIN retry on OS-X due to errno clobbering
Some of the chardev I/O paths really want to write the
complete data buffer even though the channel is in
non-blocking mode. To achieve this they look for EAGAIN
and g_usleep() for 100ms. Unfortunately the code is set
to check errno == EAGAIN a second time, after the g_usleep()
call has completed. On OS-X at least, g_usleep clobbers
errno to ETIMEDOUT, causing the retry to be skipped.

This failure to retry means the full data isn't written
to the chardev backend, which causes various failures
including making the tests/ahci-test qtest hang.

Rather than playing games trying to reset errno just
simplify the code to use a goto to retry instead of a
a loop.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1459438168-8146-2-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Daniel P. Berrange
340849a9ff util: retry getaddrinfo if getting EAI_BADFLAGS with AI_V4MAPPED
The FreeBSD header files define the AI_V4MAPPED but its
implementation of getaddrinfo() always returns an error
when that flag is set. eg

  address resolution failed for localhost:9000: Invalid value for ai_flags

There are also reports of the same problem on OS-X 10.6

Since AI_V4MAPPED is not critical functionality, if we
get an EAI_BADFLAGS error then just retry without the
AI_V4MAPPED flag set. Use a static var to cache this
status so we don't have to retry on every single call.

Also remove its use from the test suite since it serves
no useful purpose there.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1459786920-15961-1-git-send-email-berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Cédric Le Goater
f0707d2e03 checkpatch: add target_ulong to typelist
In some occasions, a patch [1] can start with a hunk containing a
simple type cast. At the time annotate_values() is run, the type is
unknown and the cast type is misinterpreted as a identifier, resulting
in an error if it is followed with a negative value:

	ERROR: spaces required around that '-' (ctx:WxV)

It seems complex to catch all possible types in a cast expression. So,
as a fallback solution, let's add some common qemu types to the
typeList array.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06741.html

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Message-Id: <1459503606-31603-1-git-send-email-clg@fr.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Paolo Bonzini
48e1a45c31 target-i386: assert that KVM_GET/SET_MSRS can set all requested MSRs
This would have caught the bug in the previous patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Paolo Bonzini
273c515c0a target-i386: do not pass MSR_TSC_AUX to KVM ioctls if CPUID bit is not set
KVM does not let you read or write this MSR if the corresponding CPUID
bit is not set.  This in turn causes MSRs that come after MSR_TSC_AUX
to be ignored by KVM_SET_MSRS.

One visible symptom is that s3.flat from kvm-unit-tests fails with
CPUs that do not have RDTSCP, because the SMBASE is not reset to
0x30000 after reset.

Fixes: c9b8f6b621
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Marc-André Lureau
85bc2a1512 memory: fix segv on qemu_ram_free(block=0x0)
Since f1060c55bf, the pointer is directly passed to
qemu_ram_free(). However, on initialization failure, it may be called
with a NULL pointer. Return immediately in this case.

This fixes a SEGV when memory initialization failed, for example
permission denied on open backing store /dev/hugepages, with -object
memory-backend-file,mem-path=/dev/hugepages.

Program received signal SIGSEGV, Segmentation fault.
0x00005555556e67e7 in qemu_ram_free (block=0x0) at /home/elmarco/src/qemu/exec.c:1775

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1459250451-29984-1-git-send-email-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Andrey Smetanin
1b0d9b05d4 target-i386/kvm: Hyper-V VMBus hypercalls blank handlers
Add Hyper-V VMBus hypercalls blank handlers which
just returns error code - HV_STATUS_INVALID_HYPERCALL_CODE.
This is required when the synthetic interrupt controller is
active.

Fixes: 50efe82c3c
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Andreas Färber" <afaerber@suse.de>
CC: Marcelo Tosatti <mtosatti@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: kvm@vger.kernel.org
Message-Id: <1456309368-29769-2-git-send-email-asmetanin@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Paolo Bonzini
b89485a52e update Linux headers to 4.6
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05 11:46:52 +02:00
Peter Maydell
972e3ca3c1 Merge remote-tracking branch 'remotes/stsquad/tags/travis-pull-05042016' into staging
This pull request includes:
  - further collapse of the build matrix
  - enabling MacOSX in the build
  - make -j3 change

Other pending updates are deferred for later in the cycle.

# gpg: Signature made Tue 05 Apr 2016 10:11:25 BST using RSA key ID 5A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>"

* remotes/stsquad/tags/travis-pull-05042016:
  .travis.yml: make -j3
  .travis.yml: enable OSX builds
  .travis.yml: collapse the test matrix

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 10:40:54 +01:00
Alex Bennée
7436268ce7 .travis.yml: make -j3
The move from Travis VMs to Containers came with a upgrade from 1.5
cores to 2. The received wisdom is -j N+1 means a core can be doing work
while other threads wait for IO to complete. This is hard to test on the
Travis infrastructure but an initial before/after eyeballing seems to
confirm it is an improvement.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05 10:08:15 +01:00
Alex Bennée
1d002037f9 .travis.yml: enable OSX builds
Travis has support for OSX builds. Making the setup work cleanly
involves a little hacking about with the .travis.yml file but rather
than make it too messy I've pushed all the "brew" install stuff into a
support script called ./scripts/macosx-brew.sh.

Currently only the default ./configure ${CONFIG} is built as I'm not
sure what extra coverage would come from the other build stanzas.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 10:08:11 +01:00
Alex Bennée
6c93329186 .travis.yml: collapse the test matrix
Remove the concept of TARGETS and build the complete target list for
each config combination. Now the matrix is just based on CONFIG stanzas
and we use the additional stuff for:

  - things that only work on one compiler (sparse, gcov, gprof)
  - combos where "make check" fails

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05 10:08:09 +01:00
Peter Maydell
1dbc7cc9b9 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160405' into staging
ppc patch queue for 2016-03-24

Three bugfixes for target-ppc, pseries machine type and related devices.

1. Fix a bug in the core code where kvm_vcpu_dirty would not be set
   before the very first system reset.  This meant that if things in
   the reset path did their own cpu_synchronize_state() it would pull
   stale data out of KVM.

   On ppc this, in combination with a previous cleanup meant that the
   MSR would be zeroed before entry, instead of correctly having the
   SF (64-bit mode) bit set.

2. Allow immediate detach of hot-added PCI devices which haven't yet
   been announced to the guest.

   This fixes a regression: because of a case where we now defer
   announcement of non-zero functions to the guest, an incorrect
   hot-add of such a device can't be backed out until the add is
   completed, which is counter-intuitive to say the least.

3. Fix migration of alternate interrupt locations.  The location of
   interrupt vectors can be affected by the LPCR, and we weren't
   correctly recalculating this after migration of a non-standard LPCR
   value.

# gpg: Signature made Tue 05 Apr 2016 03:13:41 BST using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160405:
  vl: Move cpu_synchronize_all_states() into qemu_system_reset()
  spapr_drc: enable immediate detach for unsignalled devices
  ppc: Rework POWER7 & POWER8 exception model

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-05 09:32:35 +01:00
Kevin Wolf
76b223200e block: Forbid I/O throttling on nodes with multiple parents for 2.6
As the patches to move I/O throttling to BlockBackend didn't make it in
time for the 2.6 release, but the release adds new ways of configuring
VMs whose behaviour would change once the move is done, we need to
outlaw such configurations temporarily.

The problem exists whenever a BDS has more users than just its BB, for
example it is used as a backing file for another node. (This wasn't
possible in 2.5 yet as we introduced node references to specify a
backing file only recently.) In these cases, the throttling would
apply to these other users now, but after moving throttling to the
BlockBackend the other users wouldn't be throttled any more.

This patch prevents making new references to a throttled node as well as
using monitor commands to throttle a node with multiple parents.

Compared to 2.5 this changes behaviour in some corner cases where
references were allowed before, like bs->file or Quorum children. It
seems reasonable to assume that users didn't use I/O throttling on such
low level nodes. With the upcoming move of throttling into BlockBackend,
such configurations won't be possible anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-04-05 09:22:28 +02:00
Paolo Bonzini
5cf87fd68e block: forbid x-blockdev-del from acting on DriveInfo
Failing on -drive/drive_add created BlockBackends was a
requirement for x-blockdev-del, but it sneaked through
the patch review.  Let's fix it now.

Example:

$ x86_64-softmmu/qemu-system-x86_64 -drive if=none,file=null-co://,id=null -qmp stdio
>> {'execute':'qmp_capabilities'}
<< {"return": {}}
>> {'execute':'x-blockdev-del','arguments':{'id':'null'}}
<< {"error": {"class": "GenericError", "desc": "Deleting block backend added with drive-add is not supported"}}

And without a DriveInfo:

>> { "execute": "blockdev-add", "arguments": { "options": { "driver":"null-co", "id":"null2"}}}
<< {"return": {}}
>> {'execute':'x-blockdev-del','arguments':{'id':'null2'}}
<< {"return": {}}

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-05 09:22:28 +02:00
David Gibson
efdaf797de vl: Move cpu_synchronize_all_states() into qemu_system_reset()
There are currently 3 calls to qemu_system_reset() in vl.c.  Two of them
are immediately preceded by a cpu_synchronize_all_states9) and the
remaining one should be.

The one which doesn't is the very first reset called directly from main().
Without a cpu_synchronize_all_states(), kvm_vcpu_dirty is false at this
point from the earlier cpu_synchronize_all_post_init().  That's incorrect
because the reset path is quite likely to update the CPU state, and that
updated state should be pushed back to KVM, not overwritten with stale
data pushed to KVM immediately after init.

This patch moves the call to cpu_synchronize_all_states() into
qemu_system_reset() for safety, so it is always called.  AFAICT this should
be safe for the handful of callers outside vl.c - these all appear to be in
places where the cpu state is already synchronized so the extra call
will be a no-op.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
2016-04-05 10:49:10 +10:00
Michael Roth
f40eb921da spapr_drc: enable immediate detach for unsignalled devices
Currently spapr doesn't support "aborting" hotplug of PCI
devices by allowing device_del to immediately remove the
device if we haven't signalled the presence of the device
to the guest.

In the past this wasn't an issue, since we always immediately
signalled device attach and simply relied on full guest-aware
add->remove path for device removal. However, as of 788d259,
we now defer signalling for PCI functions until function 0
is attached, so now we need to deal with these "abort" operations
for cases where a user hotplugs a non-0 function, then opts to
remove it prior hotplugging function 0. Currently they'd have to
reboot before the unplug completed. PCIe multifunction hotplug
does not have this requirement however, so from a management
implementation perspective it would be good to address this within
the same release as 788d259.

We accomplish this by simply adding a 'signalled' flag to track
whether a device hotplug event has been sent to the guest. If it
hasn't, we allow immediate removal under the assumption that the
guest will not be using the device. Devices present at boot/reset
time are also assumed to be 'signalled'.

For CPU/memory/etc, signalling will still happen immediately
as part of device_add, so only PCI functions should be affected.

Cc: bharata@linux.vnet.ibm.com
Cc: david@gibson.dropbear.id.au
Cc: sbhat@linux.vnet.ibm.com
Cc: qemu-ppc@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: This fixes a regression where an incorrect hot-add of a non-zero
      function can no longer be backed out until function 0 is added]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05 10:47:03 +10:00
Cédric Le Goater
5c94b2a5e5 ppc: Rework POWER7 & POWER8 exception model
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This patch fixes the current AIL implementation for POWER8. The
interrupt vector address can be calculated directly from LPCR when the
exception is handled. The excp_prefix update becomes useless and we
can cleanup the H_SET_MODE hcall.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: Removed LPES0/1 handling for HV vs. !HV
      Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
[dwg: This was written as a cleanup, but it also fixes a real bug
      where setting an alternative interrupt location would not be
      correctly migrated]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05 10:38:24 +10:00
Peter Maydell
2e3a76ae3e Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160404' into staging
target-arm queue:
 * bcm2836: wire up CPU timer interrupts correctly
 * linux-user: ignore EXCP_YIELD in ARM cpu_loop()
 * target-arm: correctly reset SCTLR_EL3
 * target-arm: remove incorrect ALIAS tags from ESR_EL2 and ESR_EL3
 * target-arm: make the 64-bit version of VTCR do the migration

# gpg: Signature made Mon 04 Apr 2016 17:42:16 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20160404:
  target-arm: Make the 64-bit version of VTCR do the migration
  target-arm: Remove incorrect ALIAS tags from ESR_EL2 and ESR_EL3
  target-arm: Correctly reset SCTLR_EL3 for 64-bit CPUs
  linux-user: arm: Handle (ignore) EXCP_YIELD in ARM cpu_loop()
  hw/arm/bcm2836: Wire up CPU timer interrupts correctly

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-04 17:43:39 +01:00
Peter Maydell
bf06c1123a target-arm: Make the 64-bit version of VTCR do the migration
Move the ALIAS tag from VTCR_EL2 to VTCR so that we migrate the
64-bit version, as is usual. (This has no particular effect now
unless the guest wrote to the high RES0 bits of VTCR_EL2.)
Add a comment about why it's OK that we don't have the various
accessor functions that the EL1 TCR regdefs do.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-id: 1459435778-5526-4-git-send-email-peter.maydell@linaro.org
2016-04-04 17:33:52 +01:00
Peter Maydell
094a7d0b9d target-arm: Remove incorrect ALIAS tags from ESR_EL2 and ESR_EL3
The regdefs for the ESR_EL2 and ESR_EL3 system registers should not
be marked as ARM_CP_ALIAS, because these are the master copies; the
DFSR regdef in vmsa_pmsa_cp_reginfo[] is marked as an alias.
Remove the ALIAS tags so that these registers are correctly migrated.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.rog>
Message-id: 1459435778-5526-3-git-send-email-peter.maydell@linaro.org
2016-04-04 17:33:51 +01:00
Peter Maydell
e24fdd238a target-arm: Correctly reset SCTLR_EL3 for 64-bit CPUs
The regdef for SCTRL_EL3 was incorrectly marked as being an
ARM_CP_ALIAS, with the remark that this was because the 32-bit
definition would take care of reset and migration. However the
intention for banked registers as documented in the comment in
add_cpreg_to_hashtable() is:

 * 2) If ARMv8 is enabled then we can count on a 64-bit version
 *    taking care of the secure bank.  This requires that separate
 *    32 and 64-bit definitions are provided.

and so it marks the 32-bit secure banked version as an alias.
This results in the sctlr_s/sctlr_el[3] field never being reset
or migrated for a 64-bit CPU with EL3 enabled.

Fix this by removing the ARM_CP_ALIAS annotation from SCTLR_EL3.
Since this means it now needs a real reset value, move the regdef
into the same place that we define the 32-bit SCTLR.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-id: 1459435778-5526-2-git-send-email-peter.maydell@linaro.org
2016-04-04 17:33:51 +01:00
Peter Maydell
f911e0a323 linux-user: arm: Handle (ignore) EXCP_YIELD in ARM cpu_loop()
The new-in-ARMv8 YIELD instruction has been implemented to throw
an EXCP_YIELD back up to the QEMU main loop. In system emulation
we use this to decide to schedule a different guest CPU in SMP
configurations. In usermode emulation there is nothing to do,
so just ignore it and resume the guest.

This prevents an abort with "unhandled CPU exception 0x10004"
if the guest process uses the YIELD instruction.

Reported-by: Hunter Laux <hunterlaux@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1456833171-31900-1-git-send-email-peter.maydell@linaro.org
2016-04-04 17:33:51 +01:00
Peter Maydell
0dc1982312 hw/arm/bcm2836: Wire up CPU timer interrupts correctly
Wire up the CPU timer interrupts in the right order, with the
nonsecure physical timer on cntpnsirq, the hyp timer on cnthpirq,
and the secure physical timer on cntpsirq. (We did get the
virt timer right, at least.)

Reported-by: Antonio Huete Jiménez <tuxillo@quantumachine.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1458210790-6621-1-git-send-email-peter.maydell@linaro.org
2016-04-04 17:33:51 +01:00
Ed Maste
c40e13e106 bsd-user: add necessary includes to fix warnings
Signed-off-by: Ed Maste <emaste@freebsd.org>
Message-id: 1459781903-64465-1-git-send-email-emaste@freebsd.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-04 16:17:18 +01:00
Daniel P. Berrange
e31f045187 net: fix missing include of qapi/error.h in netmap.c
The netmap.c file fails to build on FreeBSD with

net/netmap.c:95:9: warning: implicit declaration of function 'error_setg_errno' is invalid in C99 [-Wimplicit-function-declaration]
     error_setg_errno(errp, errno, "Failed to nm_open() %s",
     ^
net/netmap.c:432:9: warning: implicit declaration of function 'error_propagate' is invalid in C99 [-Wimplicit-function-declaration]
     error_propagate(errp, err);
     ^

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1459429690-6144-1-git-send-email-berrange@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-04 15:01:14 +01:00
John Arbuckle
9d227f194d ui/cocoa.m: Add support for cdr files
Allow the user to select .cdr files in the file open dialog.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Message-id: 32C964D4-3F17-47B7-AE7E-593E6BFD8855@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-04 13:54:44 +01:00
Peter Maydell
bdc5db01c3 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault-2' into staging
slirp updates (2)

# gpg: Signature made Fri 01 Apr 2016 16:52:09 BST using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault-2:
  slirp: Allow disabling IPv4 or IPv6

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-04 12:09:27 +01:00
Max Filippov
34fe9af09b opencores_eth: indicate autonegotiation completion
Indicate that autonegotiation is complete in the MII BMSR. This fixes
networking on xtfpga platform in linux v4.5.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-04-04 07:08:26 +03:00
Samuel Thibault
0b11c03662 slirp: Allow disabling IPv4 or IPv6
Add ipv4 and ipv6 boolean options, so the user can setup IPv4-only and
IPv6-only network environments.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-04-01 17:51:55 +02:00
Peter Maydell
de1d099a44 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault-2' into staging
slirp updates (2)

# gpg: Signature made Thu 31 Mar 2016 23:19:08 BST using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault-2:
  slirp: Fix migration from older versions of QEMU to the current one

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-01 11:15:20 +01:00
Thomas Huth
eaf136f9a2 slirp: Fix migration from older versions of QEMU to the current one
While adding the IPv6 support, the commit eae303ff23
("slirp: Make Socket structure IPv6 compatible") changed the format of
the migration stream, without taking into account that we might still
receive an old migration stream layout when upgrading from QEMU version
2.5 (or older) to QEMU 2.6. Currently, QEMU bails out when doing a
migration from QEMU 2.5 to the recent master version when it has
been started with a "-net user,guestfwd=..." network. So let's fix
this by checking the version ID of the migration stream and by using
the old behavior if we've detected version 3 or less.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-04-01 00:05:06 +02:00
Thomas Huth
57528a3fef MAINTAINERS: Delete invalid maintainer entries of the Exynos section
Mails to these e-mail addresses are rejected by the mail server
of Samsung with "User unknown" messages, so it seems like these
Exynos maintainers are no longer available.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1459341140-16892-1-git-send-email-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-31 18:21:01 +01:00
Stefano Stabellini
3623c57ed2 Xen: update MAINTAINERS info
Add Anthony Perard as Xen co-maintainer.
Update my email address.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Message-id: alpine.DEB.2.02.1603241131520.18380@kaball.uk.xensource.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-31 18:20:39 +01:00
Peter Maydell
1458317c8a Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Thu 31 Mar 2016 13:35:23 BST using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/tracing-pull-request:
  trace-events: Fix typos (found by codespell)
  log: move qemu_log_close/qemu_log_flush from header to log.c
  trace: do not always call exit() in trace_enable_events
  docs: Update documentation for stderr (now log) tracing backend.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-31 13:49:59 +01:00
Peter Maydell
92741fc4b6 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates

# gpg: Signature made Thu 31 Mar 2016 00:08:38 BST using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  Fix ipv6 options according to documentation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-31 11:52:44 +01:00
Peter Maydell
a1a668efd5 Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Wed 30 Mar 2016 21:51:01 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  block/nfs: add missing #include "qemu/cutils.h"
  block/nfs: add missing #include "qapi/error.h"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-31 11:06:33 +01:00
Stefan Weil
a6d4953b60 trace-events: Fix typos (found by codespell)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Message-id: 1458743900-14742-1-git-send-email-sw@weilnetz.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-31 10:37:00 +01:00
Denis V. Lunev
99affd1d5b log: move qemu_log_close/qemu_log_flush from header to log.c
There is no particular reason to keep these functions in the header.
Suggested by Paolo.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1458128212-4197-3-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-31 09:58:32 +01:00
Denis V. Lunev
acc6809ddc trace: do not always call exit() in trace_enable_events
The problem is that
  virsh qemu-monitor-command --hmp VM log trace:help
forces QEMU to exit even when running VM normally.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1458128212-4197-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-31 09:48:59 +01:00
Richard W.M. Jones
ab8eb29c4a docs: Update documentation for stderr (now log) tracing backend.
This fixes commit ed7f5f1d8d.

Signed-off-by: Richard W.M. Jones.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 1458507614-32470-1-git-send-email-rjones@redhat.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-31 09:48:59 +01:00
Samuel Thibault
891a2bb58c Fix ipv6 options according to documentation
The options names were fixed in the qapi layer, but not in the command-line
options.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-31 01:08:29 +02:00
Stefan Hajnoczi
0d94b74655 block/nfs: add missing #include "qemu/cutils.h"
parse_uint_full() used to be included from qemu-common.h but was moved
to qemu/cutils.h in commit f348b6d1a5
("util: move declarations out of qemu-common.h").

Cc: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1459341994-20567-3-git-send-email-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-30 16:50:39 -04:00
Stefan Hajnoczi
d165b8cb8b block/nfs: add missing #include "qapi/error.h"
error_setg() used to be included indirectly through qemu/osdep.h.  Since
commit da34e65cb4 ("include/qemu/osdep.h:
Don't include qapi/error.h") it requires an explicit include.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1459341994-20567-2-git-send-email-stefanha@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-30 16:50:39 -04:00
Peter Maydell
9370a3bbc4 Update version for v2.6.0-rc0 release
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 19:25:40 +01:00
Peter Maydell
4468d4e0f3 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160330-1' into staging
target-arm queue:
 * virt: fix the virtual power button by adding a modelled
   "key press for 100ms" device
 * various improvements to m25p80 flash devices
 * implement new QMP query-gic-capability command to let the
   management layer know what versions of GIC we support

# gpg: Signature made Wed 30 Mar 2016 17:30:51 BST using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20160330-1:
  arm: implement query-gic-capabilities
  kvm: add kvm_device_supported() helper function
  arm: enhance kvm_arm_create_scratch_host_vcpu
  arm: qmp: add query-gic-capabilities interface
  block: m25p80: at25128a/at25256a models
  block: m25p80: n25q256a/n25q512a models
  block: m25p80: Implemented FSR register
  block: m25p80: Fast read and 4bytes commands
  block: m25p80: Dummy cycles for N25Q256/512
  block: m25p80: Add configuration registers
  block: m25p80: 4byte address mode
  block: m25p80: Extend address mode
  block: m25p80: Widen flags variable
  block: m25p80: RESET_ENABLE and RESET_MEMORY commands
  block: m25p80: Removed unused variable
  ARM: Virt: Use gpio_key for power button
  hw/gpio: Add the emulation of gpio_key

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:32:11 +01:00
Peter Xu
db31e49a56 arm: implement query-gic-capabilities
For emulated GIC capabilities, currently only gicv2 is supported. We
need to add gicv3 in when emulated gicv3 ready. For KVM accelerated ARM
VM, we detect the capability bits by creating a scratch VM.

Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1458788142-17509-5-git-send-email-peterx@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:24 +01:00
Peter Xu
29039acf58 kvm: add kvm_device_supported() helper function
This can be used when probing whether KVM support specific device. Here,
a raw vmfd is used.

Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1458788142-17509-4-git-send-email-peterx@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:24 +01:00
Peter Xu
2f340e9c24 arm: enhance kvm_arm_create_scratch_host_vcpu
Support passing NULL for the first parameter (with the same effect
as passing an empty array) and for the third parameter (meaning
that we should not attempt to init the vcpu).

Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1458788142-17509-3-git-send-email-peterx@redhat.com
[PMM: tweaked commit message, comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:24 +01:00
Peter Xu
ae50a7702c arm: qmp: add query-gic-capabilities interface
This patch add "query-gic-capabilities" but does not implement it. The
command is ARM-only. The command will return a list of GICCapability
structs that describes all GIC versions that current QEMU and system
support.

Libvirt is possibly the first consumer of this new command.

Before this patch, a libvirt user can successfully configure all kinds
of GIC devices for ARM guests, no matter whether current QEMU/kernel
supports them. If the specified GIC version/type is not supported, the
user will get an ambiguous "QEMU boot failure" error when trying to start
the VM. This is not user-friendly.

With this patch, libvirt should be able to query which type (and which
version) of GIC device is supported. Using this information, libvirt
can warn the user during configuration of guests when specified GIC
device type is not supported. Or better, we can just list those versions
that we support, and filter out the unsupported ones.

For example, if we got the query result:

{"return": [{"emulated": false, "version": 3, "kernel": true},
            {"emulated": true, "version": 2, "kernel": false}]}

then it means that we support emulated GIC version 2 using:

  qemu-system-aarch64 -M virt,accel=tcg,gic-version=2 ...

or KVM-accelerated GIC version 3 using:

  qemu-system-aarch64 -M virt,accel=kvm,gic-version=3 ...

If we specify other explicit GIC versions rather than the above, QEMU
will not be able to boot.

The community is working on a more generic way to query these kinds of
information about valid values of machine properties. However, due to
the importance of supporting this specific use case, weecided to first
implement this ad-hoc one; then when the generic method is ready, we
can move on to that one smoothly.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1458788142-17509-2-git-send-email-peterx@redhat.com
[PMM: tweaked commit message a bit; monitor.o is CONFIG_SOFTMMU only]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:24 +01:00
Marcin Krzeminski
1435bcd612 block: m25p80: at25128a/at25256a models
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-12-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:24 +01:00
Marcin Krzeminski
d31912bd7e block: m25p80: n25q256a/n25q512a models
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-11-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:24 +01:00
Marcin Krzeminski
9fbaa36477 block: m25p80: Implemented FSR register
Implements FSR register, it is used for busy waits.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-10-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:23 +01:00
Marcin Krzeminski
63e47f6f72 block: m25p80: Fast read and 4bytes commands
Adds fast read and 4bytes commands family.
This work is based on Pawel Lenkow patch from v1.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-9-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:23 +01:00
Marcin Krzeminski
aeb83edbf3 block: m25p80: Dummy cycles for N25Q256/512
Use the setting from the volatile cfg register to correctly
set the number of dummy cycles.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-8-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:23 +01:00
Marcin Krzeminski
cb475951c0 block: m25p80: Add configuration registers
This patch adds both volatile and non volatile configuration registers
and commands to allow modify them. It is needed for proper handling
dummy cycles. Initialization of those registers and flash state
has been included as well.
Some of this registers are used by kernel.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Acked-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-7-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:23 +01:00
Marcin Krzeminski
c0f3f6754a block: m25p80: 4byte address mode
This patch adds only 4byte address mode (does not cover dummy cycles).
This mode is needed to access more than 16 MiB of flash.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-6-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:23 +01:00
Marcin Krzeminski
d8a29a7a89 block: m25p80: Extend address mode
Extend address mode allows to switch flash 16 MiB banks,
allowing user to access all flash sectors.
This access mode is used by u-boot.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-5-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:23 +01:00
Marcin Krzeminski
76e872695a block: m25p80: Widen flags variable
Extend the width of the flags variable to support the already existing
(but unused) WR_1 flag, which is above the range of 8 bits.
This allows support of EEPROM emulation which requires the WR_1 feature.

Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-4-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:22 +01:00
Marcin Krzeminski
187c26364c block: m25p80: RESET_ENABLE and RESET_MEMORY commands
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-3-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:22 +01:00
Marcin Krzeminski
e8710c2293 block: m25p80: Removed unused variable
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-2-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:22 +01:00
Shannon Zhao
94f02c5ea9 ARM: Virt: Use gpio_key for power button
There is a problem for power button that it will not work if an early
system_powerdown request happens before guest gpio driver loads.

Fix this problem by using gpio_key.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1458221140-15232-3-git-send-email-zhaoshenglong@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:22 +01:00
Shannon Zhao
e5a8152c9b hw/gpio: Add the emulation of gpio_key
This will be used by ARM virt machine as a power button.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Message-id: 1458221140-15232-2-git-send-email-zhaoshenglong@huawei.com
[PMM: Use hyphen rather than underscore in type names;
 add a comment briefly describing what the device does]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 17:27:22 +01:00
Peter Maydell
489ef4c810 Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160329-2' into staging
MIPS patches 2016-03-29

Changes:
* add initial MIPS CPS support
* implement ITU block
* implement MAAR

# gpg: Signature made Wed 30 Mar 2016 09:27:01 BST using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20160329-2: (21 commits)
  target-mips: add MAAR, MAARI register
  target-mips: use CP0_CHECK for gen_m{f|t}hc0
  hw/mips/cps: enable ITU for multithreading processors
  target-mips: make ITC Configuration Tags accessible to the CPU
  target-mips: check CP0 enabled for CACHE instruction also in R6
  hw/mips: implement ITC Storage - Bypass View
  hw/mips: implement ITC Storage - P/V Sync and Try Views
  hw/mips: implement ITC Storage - Empty/Full Sync and Try Views
  hw/mips: implement ITC Storage - Control View
  hw/mips: implement ITC Configuration Tags and Storage Cells
  target-mips: enable CM GCR in MIPS64R6-generic CPU
  hw/mips_malta: add CPS to Malta board
  hw/mips_malta: move CPU creation to a separate function
  hw/mips_malta: remove redundant irq and clock init
  hw/mips_malta: remove CPUMIPSState from the write_bootloader()
  hw/mips/cps: create CPC block inside CPS
  hw/mips: add initial Cluster Power Controller support
  hw/mips/cps: create GCR block inside CPS
  hw/mips: add initial Global Config Register support
  target-mips: add CMGCRBase register
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 16:06:45 +01:00
Peter Maydell
69bc7f5029 Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-03-30-1' into staging
Merge qcrypto fixes 2016/03/30 v1

# gpg: Signature made Wed 30 Mar 2016 14:59:19 BST using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/pull-qcrypto-2016-03-30-1:
  crypto: do an explicit check for nettle pbkdf functions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 15:04:08 +01:00
Daniel P. Berrange
fff2f982ab crypto: do an explicit check for nettle pbkdf functions
Support for the PBKDF functions in nettle was not introduced
until version 2.6. Some distros QEMU targets have older
versions and thus lack PBKDF support. Address this by doing
a check in configure for the desired function and then skipping
compilation of the nettle-pbkdf.o module

Reported-by: Wen Congyang <wency@cn.fujitsu.com>
Tested-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-30 14:55:11 +01:00
Peter Maydell
b9c27e7ae6 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Wed 30 Mar 2016 11:57:54 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (48 commits)
  iotests: Test qemu-img convert -S 0 behavior
  block/null-{co,aio}: Implement get_block_status()
  block/null-{co,aio}: Allow reading zeroes
  qemu-img: Fix preallocation with -S 0 for convert
  block: Remove bdrv_(set_)enable_write_cache()
  block: Remove BDRV_O_CACHE_WB
  block: Remove bdrv_parse_cache_flags()
  qemu-io: Use bdrv_parse_cache_mode() in reopen_f()
  block: Use bdrv_parse_cache_mode() in drive_init()
  raw: Support BDRV_REQ_FUA
  nbd: Support BDRV_REQ_FUA
  iscsi: Support BDRV_REQ_FUA
  block: Introduce bdrv_co_writev_flags()
  block/qapi: Use blk_enable_write_cache()
  block: Move enable_write_cache to BB level
  block: Handle flush error in bdrv_pwrite_sync()
  block: Always set writeback mode in blk_new_open()
  block: blockdev_init(): Call blk_set_enable_write_cache() explicitly
  xen_disk: Call blk_set_enable_write_cache() explicitly
  qemu-img: Call blk_set_enable_write_cache() explicitly
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 13:43:05 +01:00
Peter Maydell
8850dcbfd7 Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Wed 30 Mar 2016 02:07:15 BST using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  Revert "e1000: fix hang of win2k12 shutdown with flood ping"
  e1000: Fixing interrupts pace.
  tests/test-filter-redirector: Add unit test for filter-redirector
  net/filter-mirror: implement filter-redirector
  net/filter-mirror: Change filter_mirror_send interface
  tests/test-filter-mirror:add filter-mirror unit test
  net/filter-mirror:Add filter-mirror

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-30 12:30:38 +01:00
Max Reitz
f4e732a0a7 iotests: Test qemu-img convert -S 0 behavior
Passing -S 0 to qemu-img convert should result in all source data being
copied to the output, even if that source data is known to be 0. The
output image should therefore have exactly the same size on disk as an
image which we explicitly filled with data.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:04 +02:00
Max Reitz
a90639270d block/null-{co,aio}: Implement get_block_status()
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:04 +02:00
Max Reitz
cd219eb1e5 block/null-{co,aio}: Allow reading zeroes
This is optional so that it does not impede the null block driver's
performance unless this behavior is desired.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:03 +02:00
Max Reitz
aad15de427 qemu-img: Fix preallocation with -S 0 for convert
When passing -S 0 to qemu-img convert, the target image is supposed to
be fully allocated. Right now, this is not the case if the source image
contains areas which bdrv_get_block_status() reports as being zero.

This patch changes a zeroed area's status from BLK_ZERO to BLK_DATA
before invoking convert_write() if -S 0 has been specified. In addition,
the check whether convert_read() actually needs to do anything
(basically only if the current area is a BLK_DATA area) is pulled out of
that function to the caller.

If -S 0 has been specified, zeroed areas need to be written as data to
the output, thus they then have to be accounted when calculating the
progress made.

This patch changes the reference output for iotest 122; contrary to what
it assumed, -S 0 really should allocate everything in the output, not
just areas that are filled with zeros (as opposed to being zeroed).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
09cf9db1bc block: Remove bdrv_(set_)enable_write_cache()
The only remaining users were block jobs (mirror and backup) which
unconditionally enabled WCE on the BlockBackend of the target image. As
these block jobs don't go through BlockBackend for their I/O requests,
they aren't affected by this setting anyway but always get a writeback
mode, so that call can be removed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
61de4c6808 block: Remove BDRV_O_CACHE_WB
The previous patches have successively made blk->enable_write_cache the
true source for the information whether a writethrough mode must be
implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage
we're carrying around, so now's the time to remove it.

At the same time, we remove the 'cache.writeback' option parsing on the
BDS level as the only effect was setting the BDRV_O_CACHE_WB flag.

This change requires test cases that explicitly enabled the option to
drop it. Other than that and the change of the error message when
writethrough is enabled on the BDS level (from "Can't set writethrough
mode" to "doesn't support the option"), there should be no change in
behaviour.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
53e8ae0100 block: Remove bdrv_parse_cache_flags()
All users are converted to bdrv_parse_cache_mode() now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
19dbecdcee qemu-io: Use bdrv_parse_cache_mode() in reopen_f()
We must forbid changing the WCE flag in bdrv_reopen() in the same patch,
as otherwise the behaviour would change so that the flag takes
precedence over the explicitly specified option.

The correct value of the WCE flag depends on the BlockBackend user (e.g.
guest device) and isn't a decision that the QMP client makes, so this
change is what we want.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:03 +02:00
Kevin Wolf
04feb4a507 block: Use bdrv_parse_cache_mode() in drive_init()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
5481531154 raw: Support BDRV_REQ_FUA
Pass through the FUA flag to the lower layer so that the separate flush
can be saved in practically relevant cases where a (raw) format driver
sits on top of the protocol driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
2b556518c3 nbd: Support BDRV_REQ_FUA
The NBD server already used to send a FUA flag when the writethrough
mode was set. This code was a remnant from the times where protocol
drivers actually had to implement writethrough modes. Since nowadays the
block layer sends flushes in writethrough mode and non-root nodes are
always writeback, this was mostly dead code - only mostly because if NBD
was configured to be used without a format, we sent _both_ FUA and an
explicit flush afterwards, which makes the code not technically dead,
but useless overhead.

This patch changes the code so that the block layer's FUA flag is
recognised and translated into a NBD FUA flag. The additional flush is
avoided now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
9f0eb9e129 iscsi: Support BDRV_REQ_FUA
This replaces the existing hack in the iscsi driver that sent the FUA
bit in writethrough mode and ignored the following flush in order to
optimise the number of roundtrips (see commit 73b5394e).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
93f5e6d88a block: Introduce bdrv_co_writev_flags()
This function will allow drivers to implement BDRV_REQ_FUA natively
instead of sending a separate flush after the write.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
c83f9fba2a block/qapi: Use blk_enable_write_cache()
Now that WCE is handled on the BlockBackend level, the flag is
meaningless for BDSes. As the schema requires us to fill the field,
we return an enabled write cache for them.

Note that this means that querying the BlockBackend name may return
writethrough as the cache information, whereas querying the node-name of
the root of that same BlockBackend will return writeback.

This may appear odd at first, but it actually makes sense because it
correctly repesents the layer that implements the WCE handling. This
becomes more apparent when you consider nodes that are the root node of
multiple BlockBackends, where each BB can have its own WCE setting.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
bfd18d1e0b block: Move enable_write_cache to BB level
Whether a write cache is used or not is a decision that concerns the
user (e.g. the guest device) rather than the backend. It was already
logically part of the BB level as bdrv_move_feature_fields() always kept
it on top of the BDS tree; with this patch, the core of it (the actual
flag and the additional flushes) is also implemented there.

Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs
doesn't have a BlockBackend attached.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:02 +02:00
Kevin Wolf
855a6a93a1 block: Handle flush error in bdrv_pwrite_sync()
We don't want to silently ignore a flush error.

Also, there is little point in avoiding the flush for writethrough modes
and once WCE is moved to the BB layer, we definitely need the flush here
because bdrv_pwrite() won't involve one any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
72e775c7d9 block: Always set writeback mode in blk_new_open()
All callers of blk_new_open() either don't rely on the WCE bit set after
blk_new_open() because they explicitly set it anyway, or they pass
BDRV_O_CACHE_WB unconditionally.

This patch changes blk_new_open() so that it always enables writeback
mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that
used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
e4b24b497e block: blockdev_init(): Call blk_set_enable_write_cache() explicitly
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
ecdd3cc82d xen_disk: Call blk_set_enable_write_cache() explicitly
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
ce09954720 qemu-img: Call blk_set_enable_write_cache() explicitly
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
e699614341 qemu-img: Expand all BDRV_O_FLAGS uses
It always only set the BDRV_O_CACHE_WB flag, which is going to go away.
In order to make the next changes more local for better reviewability
this patches expands the macro.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
e151fc16dd qemu-io: Call blk_set_enable_write_cache() explicitly
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:01 +02:00
Kevin Wolf
6effd5bfc2 qemu-nbd: Call blk_set_enable_write_cache() explicitly
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:00 +02:00
Kevin Wolf
baf5602ed9 block: Add bdrv_parse_cache_mode()
It's like bdrv_parse_cache_flags(), except that writethrough mode isn't
included in the flags, but returned as a separate bool.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 12:16:00 +02:00
Pavel Dovgalyuk
63785678f3 replay: introduce block devices record/replay
This patch introduces block driver that implement recording
and replaying of block devices' operations.
All block completion operations are added to the queue.
Queue is flushed at checkpoints and information about processed requests
is recorded to the log. In replay phase the queue is matched with
events read from the log. Therefore block devices requests are processed
deterministically.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
[ kwolf: Rebased onto modified and already applied part of the series ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:15:57 +02:00
Pavel Dovgalyuk
95b4aed5fd replay: fix error message
This patch fixes error message in saving loop of the asynchronous events queue.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
[ kwolf: Fixed format string to use PRId64 instead of %d ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Pavel Dovgalyuk
58a0067aa8 replay: bh scheduling fix
This patch fixes scheduling of bottom halves when record/replay is enabled.
Now BH are not added to replay queue when asynchronous events are disabled.
This may happen in startup and loadvm/savevm phases of execution.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Pavel Dovgalyuk
c32b82afaf block: add flush callback
This patch adds callback for flush request. This callback is responsible
for flushing whole block devices stack. bdrv_flush function does not
proceed to underlying devices. It should be performed by this callback
function, if needed.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange
6278ae035f block: an interoperability test for luks vs dm-crypt/cryptsetup
It is important that the QEMU luks implementation retains 100%
compatibility with the reference implementation provided by
the combination of the linux kernel dm-crypt module and cryptsetup
userspace tools.

There is a matrix of tests to be performed with different sets
of encryption settings. For each matrix entry, two tests will
be performed. One will create a LUKS image with the cryptsetup
tool and then do I/O with both cryptsetup & qemu-io. The other
will create the image with qemu-img and then again do I/O with
both cryptsetup and qemu-io.

The new I/O test 149 performs interoperability testing between
QEMU and the reference implementation. Such testing inherantly
requires elevated privileges, so to this this the user must have
configured passwordless sudo access. The test will automatically
skip if sudo is not available.

The test has to be run explicitly thus:

    cd tests/qemu-iotests
    ./check -luks 149

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange
e6ff69bf5e block: move encryption deprecation warning into qcow code
For a couple of releases we have been warning

  Encrypted images are deprecated
  Support for them will be removed in a future release.
  You can use 'qemu-img convert' to convert your image to an unencrypted one.

This warning was issued by system emulators, qemu-img, qemu-nbd
and qemu-io. Such a broad warning was issued because the original
intention was to rip out all the code for dealing with encryption
inside the QEMU block layer APIs.

The new block encryption framework used for the LUKS driver does
not rely on the unloved block layer API for encryption keys,
instead using the QOM 'secret' object type. It is thus no longer
appropriate to warn about encryption unconditionally.

When the qcow/qcow2 drivers are converted to use the new encryption
framework too, it will be practical to keep AES-CBC support present
for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
with older QEMU versions and liberation of data from existing encrypted
qcow2 files.

This change moves the warning out of the generic block code and
into the qcow/qcow2 drivers. Further, the warning is set to only
appear when running the system emulators, since qemu-img, qemu-io,
qemu-nbd are expected to support qcow2 encryption long term now that
the maint burden has been eliminated.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:12:15 +02:00
Daniel P. Berrange
78368575a6 block: add generic full disk encryption driver
Add a block driver that is capable of supporting any full disk
encryption format. This utilizes the previously added block
encryption code, and at this time supports the LUKS format.

The driver code is capable of supporting any format supported
by the QCryptoBlock module, so it registers one block driver
for each format. This patch only registers the "luks" driver
since the "qcow" driver is there only for back-compatibility
with existing qcow built-in encryption.

New LUKS compatible volumes can be formatted using qemu-img
with defaults for all settings.

$ qemu-img create --object secret,data=123456,id=sec0 \
      -f luks -o key-secret=sec0 demo.luks 10G

Alternatively the cryptographic settings can be explicitly
set

$ qemu-img create --object secret,data=123456,id=sec0 \
      -f luks -o key-secret=sec0,cipher-alg=aes-256,\
                 cipher-mode=cbc,ivgen-alg=plain64,hash-alg=sha256 \
      demo.luks 10G

And query its size

$ qemu-img info demo.img
image: demo.img
file format: luks
virtual size: 10G (10737418240 bytes)
disk size: 132K
encrypted: yes

Note that it was not necessary to provide the password
when querying info for the volume. The password is only
required when performing I/O on the volume

All volumes created by this new 'luks' driver should be
capable of being opened by the kernel dm-crypt driver.

The only algorithms listed in the LUKS spec that are
not currently supported by this impl are sha512 and
ripemd160 hashes and cast6 cipher.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[ kwolf - Added #include to resolve conflict with da34e65c ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 12:11:26 +02:00
Daniel P. Berrange
a2d1c8fd84 tests: add output filter to python I/O tests helper
Add a 'log' method to iotests.py which prints messages to
stdout, with optional filtering of data. Port over some
standard filters already present in the shell common.filter
code to be usable in python too.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Daniel P. Berrange
c6a92369dc tests: refactor python I/O tests helper main method
The iotests.py helper provides a main() method for running
tests via the python unit test framework. Not all tests
will want to use this, so refactor it to split the testing
of compatible formats and platforms into separate helper
methods

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Daniel P. Berrange
491e5e85ef tests: redirect stderr to stdout for iotests
The python I/O tests helper for running qemu-img/qemu-io
setup stdout to be captured to a pipe, but left stderr
untouched. As a result, if something failed in qemu-img/
qemu-io, data written to stderr would get output directly
and not line up with data on the test stdout due to
buffering.  If we explicitly redirect stderr to the same
pipe as stdout, things are much clearer when they go
wrong.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Daniel P. Berrange
4ef130fca8 qemu-img/qemu-io: don't prompt for passwords if not required
The qemu-img/qemu-io tools prompt for disk encryption passwords
regardless of whether any are actually required. Adding a check
on bdrv_key_required() avoids this prompt for disk formats which
have been converted to the QCryptoSecret APIs.

This is just a temporary hack to ensure the block I/O tests
continue to work after each patch, since the last patch will
completely delete all the password prompting code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Daniel P. Berrange
abb06c5ac1 block: add flag to indicate that no I/O will be performed
When opening an image it is useful to know whether the caller
intends to perform I/O on the image or not. In the case of
encrypted images this will allow the block driver to avoid
having to prompt for decryption keys when we merely want to
query header metadata about the image. eg qemu-img info

This flag is enforced at the top level only, since even if
we don't want todo I/O on the 'qcow2' file payload, the
underlying 'file' driver will still need todo I/O to read
the qcow2 header, for example.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Max Reitz
5430215699 block/qapi: Pass bdrv_query_blk_stats() s->stats
bdrv_query_blk_stats() does not need access to all of BlockStats,
BlockDeviceStats is enough and is what this function is actually
supposed to fill.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Max Reitz
0e8f44bee9 block/qapi: Set s->device in bdrv_query_stats()
This is the only instance of bdrv_query_blk_stats() accessing anything
in the BlockStats structure other than s->stats, so let us move it to
its caller (where it makes just as much sense) allowing us to make
bdrv_query_blk_stats() take a pointer to the BlockDeviceStats instead of
BlockStats.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Xu
5eda622768 block/qapi: fix unbounded stack for dump_qdict
Using heap instead of stack for better safety.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Peter Xu
853ccfed8f block/qapi: make two printf() formats literal
Fix two places to use literal printf format when possible.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
72f41b6fbd block: Remove blk_set_bs()
The function is unused since commit f21d96d0 ('block: Use BdrvChild in
BlockBackend').

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Programmingkid
d0855f1235 block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host
Mac OS X can be picky when it comes to allowing the user
to use physical devices in QEMU. Most mounted volumes
appear to be off limits to QEMU. If an issue is detected,
a message is displayed showing the user how to unmount a
volume. Now QEMU uses both CD and DVD media.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
73ac451f34 block: Reject writethrough mode except at the root
Writethrough mode is going to become a BlockBackend feature rather than
a BDS one, so forbid it in places where we won't be able to support it
when the code finally matches the envisioned design.

We only allowed setting the cache mode of non-root nodes after the 2.5
release, so we're still free to make this change.

The target of block jobs is now always opened in a writeback mode
because it doesn't have a BlockBackend attached. This makes more sense
anyway because block jobs know when to flush. If the graph is modified
on job completion, the original cache mode moves to the new root, so
for the guest device writethough always stays enabled if it was
configured this way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
b8816a4386 block: Make backing files always writeback
First of all, we're generally not writing to backing files, but when we
do, it's in the context of block jobs which know very well when to flush
the image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
aaa436f998 block: Remove cache.writeback from blockdev-add
The WCE bit is a frontend property and should not be part of the backend
configuration. This is especially important because the same BDS can be
used by different users with different WCE requirements.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
7a827aaec8 block: Remove dirty bitmaps from bdrv_move_feature_fields()
This patch changes dirty bitmaps from following a BlockBackend in graph
changes to sticking with the node they were created at. For the full
discussion, read the following mailing list thread:

  [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields()
  https://lists.nongnu.org/archive/html/qemu-block/2016-02/msg00745.html

In summary, the justification for this change is:

* When moving the dirty bitmap to the top of the tree was introduced in
  bdrv_append() in commit a9fc4408, it didn't actually have any effect
  because there could never be a bitmap in use when bdrv_append() was
  called (op blockers would prevent this). This is still true today for
  all internal uses of dirty bitmaps.

* Support for user-defined dirty bitmaps was introduced in 2.4, but we
  discouraged users from using it because we didn't consider it ready
  yet.

  Moreover, in 2.5, the bdrv_swap() removal introduced a bug that left
  dangling pointers if a dirty bitmap was present (the anchors of the
  dirty bitmap were swapped, but the back link in the first element
  wasn't updated), so it didn't even work correctly.

* block-dirty-bitmap-add takes an arbitrary node name, even if no
  BlockBackend is attached. This suggests that it is a node level
  operation and not a BlockBackend one. Consequently, there is no reason
  for dirty bitmaps to stay with a BlockBackend that was attached to the
  node they were created for.

* It was suggested that block-dirty-bitmap-add could track the node if a
  node name was specified, and track the BlockBackend if the device name
  was specified. This would however be inconsistent with other QMP
  commands. Commands that accept both device and node names currently
  interpret the device name just as an alias for the current root node
  of that BlockBackend.

* Dirty bitmaps have a name that is only unique amongst the bitmaps in a
  specific node. Moving bitmaps could lead to name clashes. Automatic
  renaming would involve too much magic.

* Persistent bitmaps are stored in a specific node. Moving them around
  automatically might be at least surprising, but it would probably also
  become a real problem because that would have to happen atomically
  without the management tool knowing of the operation.

At the end of the day it seems to be very clear that it was a mistake to
include dirty bitmaps in bdrv_move_feature_fields(). The functionality
of moving bitmaps and/or attaching them to a BlockBackend instead will
probably be needed, but it should be done with a new explicit QMP
command or option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
4c8449832c block: Remove copy-on-read from bdrv_move_feature_fields()
Ever since we first introduced bdrv_append() in commit 8802d1fd ('qapi:
Introduce blockdev-group-snapshot-sync command'), the copy-on-read flag
was moved to the new top layer when taking a snapshot. The only problem
is that it doesn't make a whole lot of sense.

The use case for manually enabled CoR is to avoid reading data twice
from a slow remote image, so we want to save it to a local overlay, say
an ISO image accessed via HTTP to a local qcow2 overlay. When taking a
snapshot, we end up with a backing chain like this:

    http <- local.qcow2 <- snap_overlay.qcow2

There is no point in doing CoR from local.qcow2 into snap_overlay.qcow2,
we just want to keep copying data from the remote source into
local.qcow2.

The other use case of CoR is in the context of streaming, which isn't
very interesting for bdrv_move_feature_fields() because op blockers
prevent this combination.

This patch makes the copy-on-read flag stay on the image for which it
was originally set and prevents it from being propagated to the new
overlay. It is no longer intended to move CoR to the BlockBackend level.
In order for this to make sense, we also need to keep the respective
image read-write.

As a side effect of these changes, creating a live snapshot image (as
opposed to using an existing externally created one) on top of a COR
block device works now. It used to fail because it tried to open its
backing file both read-only and with COR.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-30 11:59:32 +02:00
Kevin Wolf
63eaaae08c block: Remove bdrv_make_anon()
The call in hmp_drive_del() is dead code because blk_remove_bs() is
called a few lines above. The only other remaining user is
bdrv_delete(), which only abuses bdrv_make_anon() to remove it from the
named nodes list. This path inlines the list entry removal into
bdrv_delete() and removes bdrv_make_anon().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-30 11:59:32 +02:00
Yongbok Kim
f6d4dd8109 target-mips: add MAAR, MAARI register
The MAAR register is a read/write register included in Release 5
of the architecture that defines the accessibility attributes of
physical address regions. In particular, MAAR defines whether an
instruction fetch or data load can speculatively access a memory
region within the physical address bounds specified by MAAR.

As QEMU doesn't do speculative access, hence this patch only
provides ability to access the registers.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Yongbok Kim
c98d3d79ee target-mips: use CP0_CHECK for gen_m{f|t}hc0
Reuse CP0_CHECK macro for gen_m{f|t}hc0.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
408294352a hw/mips/cps: enable ITU for multithreading processors
Make ITU available in the system if CPU supports multithreading
and is part of CPS.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
0d74a222c2 target-mips: make ITC Configuration Tags accessible to the CPU
Add CP0.ErrCtl register with WST, SPR and ITC bits. In 34K and interAptiv
processors these bits are used to enable CACHE instruction access to
different arrays. When WST=0, SPR=0 and ITC=1 the CACHE instruction will
access ITC tag values.

Generally we do not model caches and we have been treating the CACHE
instruction as NOP. But since CACHE can operate on ITC Tags new
MIPS_HFLAG_ITC_CACHE hflag is introduced to generate the helper only when
CACHE is in the ITC Access mode.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
40d48212f9 target-mips: check CP0 enabled for CACHE instruction also in R6
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
25a611e3e4 hw/mips: implement ITC Storage - Bypass View
Bypass View does not cause issuing thread to block and does not affect
any of the cells state bit.

Read from a FIFO cell returns the value of the oldest entry.
Store to a FIFO cell changes the value of the newest entry.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
40dc9dc339 hw/mips: implement ITC Storage - P/V Sync and Try Views
P/V Synchronized and Try Views can be used to access Semaphore cells.
Load returns current value and post-decrements the value in the cell
(until it reaches zero). Stores increment the value (until it saturates
at 0xFFFF).

P/V Synchronized View causes the issuing thread to block on read if value
is 0. P/V Try View does not block the thread, it returns 0 in this case.

Cell's Empty and Full bits are not modified.

Trap bit (i.e. Gating Storage exceptions) not implemented.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
4051089d61 hw/mips: implement ITC Storage - Empty/Full Sync and Try Views
Empty/Full Synchronized and Try views can be used to access FIFO cells.
Store to the FIFO cell pushes the value into the queue, load pops the oldest
element from the queue. Cell's Full and Empty bits are automatically updated
to reflect new state of the cell.

Empty/Full Synchronized View causes the issuing thread to block when FIFO is
empty while thread is performing a read, or FIFO is full while thread is
performing a write.

Empty/Full Try View never blocks the thread. If cell is full then write is
ignored, if cell is empty then load returns 0.

Trap bit (i.e. Gating Storage exceptions) not implemented.
Store Conditional support for E/F Try View (i.e. indicate failure if FIFO
is full) not implemented.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
5924c869c0 hw/mips: implement ITC Storage - Control View
Control view is used to access the ITC Storage Cell Tags. It never causes
the issuing thread to block.

Guest can empty the FIFO cell by setting Empty bit to 1.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
34fa7e83e1 hw/mips: implement ITC Configuration Tags and Storage Cells
Implement ITC as a single object consisting of two memory regions:

1) tag_io: ITC Configuration Tags (i.e. ITCAddressMap{0,1} registers) which
are accessible by the CPU via CACHE instruction. Also adding
MemoryRegion *itc_tag to the CPUMIPSState so that CACHE instruction will
dispatch reads/writes directly.

2) storage_io: memory-mapped ITC Storage whose address space is configurable
(i.e. enabled/remapped/resized) by writing to ITCAddressMap{0,1} registers.

ITC Storage contains FIFO and Semaphore cells. Read-only FIFO bit in the
ITC cell tag indicates the type of the cell. If the ITC Storage contains
both types of cells then FIFOs are located before Semaphores.

Since issuing thread can get blocked on the access to a cell (in E/F
Synchronized and P/V Synchronized Views) each cell has a bitmap to track
which threads are currently blocked.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:14:00 +01:00
Leon Alrae
a9a9506171 target-mips: enable CM GCR in MIPS64R6-generic CPU
Indicate that in the MIPS64R6-generic CPU the memory-mapped
Global Configuration Register Space is implemented.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
bff384a4fb hw/mips_malta: add CPS to Malta board
If the user specifies smp > 1 and the CPU with CM GCR support, then
create Coherent Processing System (which takes care of instantiating CPUs)
rather than CPUs directly and connect i8259 and cbus to the pins exposed by
CPS. However, there is no GIC yet, thus CPS exposes CPU's IRQ pins so use
the same pin numbers as before.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
67a5496184 hw/mips_malta: move CPU creation to a separate function
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
dc520a7dee hw/mips_malta: remove redundant irq and clock init
Global smp_cpus is never zero (even if user provides -smp 0), thus clocks
and irqs are always initialized for each created CPU in the loop at the
beginning of mips_malta_init.

These two lines cause a leak of already allocated timer and irqs for the
first CPU - remove them.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
cc518af0b2 hw/mips_malta: remove CPUMIPSState from the write_bootloader()
Remove CPUMIPSState from the write_bootloader() argument list as it
is not used in the function.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
2edd5261ff hw/mips/cps: create CPC block inside CPS
Create Cluster Power Controller and add a link to the CPC MemoryRegion
in GCR. Guest can enable / map CPC to any physical address by writing to
the memory-mapped GCR_CPC_BASE register.

Set vp-start-reset property to 1 to allow only first VP to run from reset.
Others are brought up by the guest via CPC memory-mapped registers.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
1f93a6e4f3 hw/mips: add initial Cluster Power Controller support
Cluster Power Controller (CPC) is responsible for power management in
multiprocessing system. It provides registers to control the power and the
clock frequency of the individual elements in the system.

This patch implements only three registers that are used to control the
power state of each VP on a single core:
* VP Run is a write-only register used to set each VP to the run state
* VP Stop is a write-only register used to set each VP to the suspend state
* VP Running is a read-only register indicating the run state of each VP

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
a9bd9b5a86 hw/mips/cps: create GCR block inside CPS
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Yongbok Kim
3994215db4 hw/mips: add initial Global Config Register support
Add initial GCR support to indicate number of VPs present in the system,
L2 bypass mode and revision number.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
[leon.alrae@imgtec.com:
 * removed GIC part,
 * changed commit message,
 * replaced %lx format spec. with PRIx64,
 * renamed mips_gcr.{c,h} to mips_cmgcr.{c,h},
 * replaced CONFIG_MIPS_GIC with CONFIG_MIPS_CPS]
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Yongbok Kim
c870e3f52c target-mips: add CMGCRBase register
Physical base address for the memory-mapped Coherency Manager Global
Configuration Register space.
The MIPS default location for the GCR_BASE address is 0x1FBF_8.
This register only exists if Config3 CMGCR is set to one.

Signed-off-by: Yongbok Kim <yongbok.kim@imgtec.com>
[leon.alrae@imgtec.com: move CMGCR enabling to a separate patch]
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:59 +01:00
Leon Alrae
8e7e8a5b7b hw/mips: implement generic MIPS Coherent Processing System container
Implement generic MIPS Coherent Processing System (CPS) which in this
commit just creates VPs, but it will serve as a container also for
other components like Global Configuration Registers and Cluster Power
Controller.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-30 09:13:58 +01:00
Sameeh Jubran
8e0f7dd251 Revert "e1000: fix hang of win2k12 shutdown with flood ping"
This reverts commit 9596ef7c7b.

This workaround in order to fix endless interrupts is no
longer needed because it was superseded by the previous patch
(e1000: Fixing interrupt pace).

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:42 +08:00
Sameeh Jubran
74004e8ce4 e1000: Fixing interrupts pace.
This patch introduces an upper bound for number of interrupts
per second. Without this bound an interrupt storm can occur as
it has been observed on Windows 10 when disabling the device.

According to the SPEC - Intel PCI/PCI-X Family of Gigabit
Ethernet Controllers Software Developer's Manual, section
13.4.18 - the Ethernet controller guarantees a maximum
observable interrupt rate of 7813 interrupts/sec. If there is
no upper bound this could lead to an interrupt storm by e1000
(when mit_delay < 500) causing interrupts to fire at a very high
pace.
Thus if mit_delay < 500 then the delay should be set to the
minimum delay possible which is 500. This can be calculated
easily as follows:

Interval = 10^9 / (7813 * 256) = 500.

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:36 +08:00
Zhang Chen
9fd3c5d556 tests/test-filter-redirector: Add unit test for filter-redirector
In this unit test,we will test the filter redirector function.

Case 1, tx traffic flow:

qemu side              | test side
                       |
+---------+            |  +-------+
| backend <---------------+ sock0 |
+----+----+            |  +-------+
     |                 |
+----v----+  +-------+ |
|  rd0    +->+chardev| |
+---------+  +---+---+ |
                 |     |
+---------+      |     |
|  rd1    <------+     |
+----+----+            |
     |                 |
+----v----+            |  +-------+
|  rd2    +--------------->sock1  |
+---------+            |  +-------+
                       +

a. we(sock0) inject packet to qemu socket backend
b. backend pass packet to filter redirector0(rd0)
c. rd0 redirect packet to out_dev(chardev) which is connected with
filter redirector1's(rd1) in_dev
d. rd1 read this packet from in_dev, and pass to next filter redirector2(rd2)
e. rd2 redirect packet to rd2's out_dev which is connected with an opened socketed(sock1)
f. we read packet from sock1 and compare to what we inject

Start qemu with:

"-netdev socket,id=qtest-bn0,fd=%d "
"-device rtl8139,netdev=qtest-bn0,id=qtest-e0 "
"-chardev socket,id=redirector0,path=%s,server,nowait "
"-chardev socket,id=redirector1,path=%s,server,nowait "
"-chardev socket,id=redirector2,path=%s,nowait "
"-object filter-redirector,id=qtest-f0,netdev=qtest-bn0,"
"queue=tx,outdev=redirector0 "
"-object filter-redirector,id=qtest-f1,netdev=qtest-bn0,"
"queue=tx,indev=redirector2 "
"-object filter-redirector,id=qtest-f2,netdev=qtest-bn0,"
"queue=tx,outdev=redirector1 "

--------------------------------------
Case 2, rx traffic flow
qemu side              | test side
                       |
+---------+            |  +-------+
| backend +---------------> sock1 |
+----^----+            |  +-------+
     |                 |
+----+----+  +-------+ |
|  rd0    +<-+chardev| |
+---------+  +---+---+ |
                 ^     |
+---------+      |     |
|  rd1    +------+     |
+----^----+            |
     |                 |
+----+----+            |  +-------+
|  rd2    <---------------+sock0  |
+---------+            |  +-------+

a. we(sock0) insert packet to filter redirector2(rd2)
b. rd2 pass packet to filter redirector1(rd1)
c. rd1 redirect packet to out_dev(chardev) which is connected with
   filter redirector0's(rd0) in_dev
d. rd0 read this packet from in_dev, and pass ti to qemu backend which is
   connected with an opened socketed(sock1)
e. we read packet from sock1 and compare to what we inject

Start qemu with:

"-netdev socket,id=qtest-bn0,fd=%d "
"-device rtl8139,netdev=qtest-bn0,id=qtest-e0 "
"-chardev socket,id=redirector0,path=%s,server,nowait "
"-chardev socket,id=redirector1,path=%s,server,nowait "
"-chardev socket,id=redirector2,path=%s,nowait "
"-object filter-redirector,id=qtest-f0,netdev=qtest-bn0,"
"queue=rx,outdev=redirector0 "
"-object filter-redirector,id=qtest-f1,netdev=qtest-bn0,"
"queue=rx,indev=redirector2 "
"-object filter-redirector,id=qtest-f2,netdev=qtest-bn0,"
"queue=rx,outdev=redirector1 "

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:33 +08:00
Zhang Chen
d46f75b2e9 net/filter-mirror: implement filter-redirector
Filter-redirector is a netfilter plugin.
It gives qemu the ability to redirect net packet.
redirector can redirect filter's net packet to outdev.
and redirect indev's packet to filter.

                      filter
                        +
            redirector  |
               +--------------+
               |        |     |
  indev +-----------+   +---------->  outdev
               |    |         |
               +--------------+
                    |
                    v
                  filter

usage:

-netdev user,id=hn0
-chardev socket,id=s0,host=ip_primary,port=X,server,nowait
-chardev socket,id=s1,host=ip_primary,port=Y,server,nowait
-filter-redirector,id=r0,netdev=hn0,queue=tx/rx/all,indev=s0,outdev=s1

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:28 +08:00
Zhang Chen
ba8940dd86 net/filter-mirror: Change filter_mirror_send interface
Change filter_mirror_send interface to make it easier
to used by other filter

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:23 +08:00
Zhang Chen
06809ecf73 tests/test-filter-mirror:add filter-mirror unit test
In this unit test we will test the mirror function.

start qemu with:
      -netdev socket,id=qtest-bn0,fd=sockfd
      -device e1000,netdev=qtest-bn0,id=qtest-e0
      -chardev socket,id=mirror0,path=/tmp/filter-mirror-test.sock,server,nowait
      -object filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,outdev=mirror0

We inject packet to netdev socket id = qtest-bn0,
filter-mirror will copy and mirror the packet to mirror0.
we read packet from mirror0 and then compare to what we injected.

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:57:16 +08:00
Zhang Chen
f6d3afb51f net/filter-mirror:Add filter-mirror
Filter-mirror is a netfilter plugin.
It gives qemu the ability to mirror
packets to a chardev.

usage:

-netdev tap,id=hn0
-chardev socket,id=mirror0,host=ip_primary,port=X,server,nowait
-filter-mirror,id=m0,netdev=hn0,queue=tx/rx/all,outdev=mirror0

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30 08:54:29 +08:00
Peter Maydell
553934db66 Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Tue 29 Mar 2016 01:48:09 BST using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  qemu-iotests: add no-op streaming test
  qemu-iotests: fix test_stream_partial()
  block: never cancel a streaming job without running stream_complete()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-29 19:54:49 +01:00
Peter Maydell
5b8e6b4cc2 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp updates

# gpg: Signature made Tue 29 Mar 2016 00:16:05 BST using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  Rework ipv6 options
  Use C99 flexible array instead of 1-byte trailing array
  Avoid embedding struct mbuf in other structures
  slirp: send icmp6 errors when UDP send failed
  slirp: Fix memory leak on small incoming ipv4 packet

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-29 18:25:27 +01:00
Peter Maydell
7cd592bc65 Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160328.0' into staging
VFIO updates 2016-03-28

 - Use 128bit math to avoid asserts with IOMMU regions (Bandan Das)

# gpg: Signature made Mon 28 Mar 2016 23:16:52 BST using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20160328.0:
  vfio: convert to 128 bit arithmetic calculations when adding mem regions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-29 17:39:41 +01:00
Samuel Thibault
d8eb386495 Rework ipv6 options
Rename the recently-added ip6-foo options into ipv6-foo options, to make
them coherent with other ipv6 options.

Also rework the documentation.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-29 01:15:43 +02:00
Peter Maydell
1c3c8e9547 Use C99 flexible array instead of 1-byte trailing array
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-03-29 01:15:02 +02:00
Bandan Das
55efcc537d vfio: convert to 128 bit arithmetic calculations when adding mem regions
vfio_listener_region_add for a iommu mr results in
an overflow assert since iommu memory region is initialized
with UINT64_MAX. Convert calculations to 128 bit arithmetic
for iommu memory regions and let int128_get64 assert for non iommu
regions if there's an overflow.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bandan Das <bsd@redhat.com>
[missed (end - 1) on 2nd trace call, move llsize closer to use]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-28 13:27:49 -06:00
Alberto Garcia
409d54986d qemu-iotests: add no-op streaming test
This patch tests that in a partial block-stream operation, no data is
ever copied from the base image.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 5272a2aa57bc0b3f981f8b3e0c813e58a88c974b.1458566441.git.berto@igalia.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-28 13:56:44 -04:00
Alberto Garcia
5e302a7de6 qemu-iotests: fix test_stream_partial()
This test is streaming to the top layer using the intermediate image
as the base. This is a mistake since block-stream never copies data
from the base image and its backing chain, so this is effectively a
no-op.

In addition to fixing the base parameter, this patch also writes some
data to the intermediate image before the test, so there's something
to copy and the test is meaningful.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 2efa304da38b32d47c120ce728568a589c5a3afc.1458566441.git.berto@igalia.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-28 13:56:44 -04:00
Alberto Garcia
6578629e08 block: never cancel a streaming job without running stream_complete()
We need to call stream_complete() in order to do all the necessary
clean-ups, even if there's an early failure. At the moment it's only
useful to make sure that s->backing_file_str is not leaked, but it
will become more important if we introduce support for streaming to
any intermediate node.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 2abedf2debc65c250560237f31a8e6756883c8fc.1458566441.git.berto@igalia.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-03-28 13:56:44 -04:00
Peter Maydell
84a5a80148 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Log filtering from Alex and Peter
* Chardev fix from Marc-André
* config.status tweak from David
* Header file tweaks from Markus, myself and Veronia (Outreachy candidate)
* get_ticks_per_sec() removal from Rutuja (Outreachy candidate)
* Coverity fix from myself
* PKE implementation from myself, based on rth's XSAVE support

# gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream: (28 commits)
  target-i386: implement PKE for TCG
  config.status: Pass extra parameters
  char: translate from QIOChannel error to errno
  exec: fix error handling in file_ram_alloc
  cputlb: modernise the debug support
  qemu-log: support simple pid substitution for logs
  target-arm: dfilter support for in_asm
  qemu-log: dfilter-ise exec, out_asm, op and opt_op
  qemu-log: new option -dfilter to limit output
  qemu-log: Improve the "exec" TB execution logging
  qemu-log: Avoid function call for disabled qemu_log_mask logging
  qemu-log: correct help text for -d cpu
  tcg: pass down TranslationBlock to tcg_code_gen
  util: move declarations out of qemu-common.h
  Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
  hw: explicitly include qemu-common.h and cpu.h
  include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
  isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
  Move ParallelIOArg from qemu-common.h to sysemu/char.h
  Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	scripts/clean-includes
2016-03-24 21:42:40 +00:00
Peter Maydell
b68a80139e Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160324' into staging
Support for booting from virtio-scsi devices in the s390-ccw bios.

# gpg: Signature made Thu 24 Mar 2016 08:14:21 GMT using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20160324:
  s390-ccw.img: rebuild image
  pc-bios/s390-ccw: disambiguation of "No zIPL magic" message
  pc-bios/s390-ccw: enhance bootmap detection
  pc-bios/s390-ccw: enable virtio-scsi
  pc-bios/s390-ccw: add virtio-scsi implementation
  pc-bios/s390-ccw: add scsi definitions
  pc-bios/s390-ccw: add simplified virtio call
  pc-bios/s390-ccw: make provisions for different backends
  pc-bios/s390-ccw: add vdev object to store all device details
  pc-bios/s390-ccw: update virtio implementation to allow up to 3 vrings
  pc-bios/s390-ccw: qemuize types
  pc-bios/s390-ccw: add utility functions and "export" some others
  pc-bios/s390-ccw: virtio_panic -> panic
  pc-bios/s390-ccw: add more disk layout checks

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-24 16:24:02 +00:00
Peter Maydell
f18f2e7cfc Merge remote-tracking branch 'remotes/kraxel/tags/pull-ui-20160324-1' into staging
input-linux + spice fixes

# gpg: Signature made Thu 24 Mar 2016 07:54:45 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-ui-20160324-1:
  spice: Disallow use of gl + TCP port
  input-linux: fix Coverity warning
  input-linux: switch over to -object

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-24 16:00:14 +00:00
Peter Maydell
490dda053e Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160324' into staging
ppc patch queue for 2016-03-24

Accumulated patches for target-ppc, pseries machine type and related
devices.

* Preliminary patches from BenH & Cédric Le Goater's powernv code
    * We don't want the full machine type before 2.7
    * Adding some of the SPRs also fixes migration corner cases for
      spapr (when qemu has no knowledge of the registers, they're
      obviously not migrated)
    * We include some patches that aren't strictly fixes, but make
      applying the others easier, and they're low risk
* Fix to buffer management which significantly improves throughput in
  the spapr-llan virtual network device
* Start with 64-bit mode enabled on spapr.  This is the way it's
  supposed to be but we broke it a while back and didn't notice
  because Linux guests cope anyway.
    * Picked up by kvm-unit-tests
    * Still some bugs here that I'm working on

# gpg: Signature made Thu 24 Mar 2016 04:29:42 GMT using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160324:
  ppc: move POWER8 Book4 regs in their own routine
  hw/net/spapr_llan: Enable the RX buffer pools by default for new machines
  hw/net/spapr_llan: Fix receive buffer handling for better performance
  hw/net/spapr_llan: Extract rx buffer code into separate functions
  ppc: A couple more dummy POWER8 Book4 regs
  ppc: Add dummy CIABR SPR
  ppc: Add POWER8 IAMR register
  ppc: Fix writing to AMR/UAMOR
  ppc: Initialize AMOR in PAPR mode
  ppc: Add dummy SPR_IC for POWER8
  ppc: Create cpu_ppc_set_papr() helper
  ppc: Add a bunch of hypervisor SPRs to Book3s
  ppc: Add macros to register hypervisor mode SPRs
  ppc: Update SPR definitions
  spapr/target-ppc/kvm: Only add hcall-instructions if KVM supports it
  ppc64: set MSR_SF bit

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-24 15:22:39 +00:00
Peter Maydell
1080534481 Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160323' into staging
MIPS patches 2016-03-23

Changes:
* add mips-softmmu-common.mak
* indicate presence of IEEE 754-2008 FPU in MIPS64R6-generic and P5600

# gpg: Signature made Wed 23 Mar 2016 16:38:04 GMT using RSA key ID 0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"

* remotes/lalrae/tags/mips-20160323:
  default-configs: add mips-softmmu-common.mak
  target-mips: indicate presence of IEEE 754-2008 FPU in R6/R5+MSA CPUs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-24 14:30:20 +00:00
Peter Maydell
4f57a35d81 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-cocoa-20160323-1' into staging
cocoa queue:
 * update cocoa UI front end to use QKeyCodes
 * fix the help menu documentation links to actually work
   (with both an installed and an uninstalled QEMU)

# gpg: Signature made Wed 23 Mar 2016 14:31:01 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-cocoa-20160323-1:
  ui/cocoa.m: switch to QKeyCode
  qapi-schema.json: Add power and keypad equal keys
  ui/cocoa.m: fix help menus

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-24 13:43:30 +00:00
Paolo Bonzini
0f70ed4759 target-i386: implement PKE for TCG
Tested with kvm-unit-tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-24 14:01:08 +01:00
Dr. David Alan Gilbert
cf7cc9291b config.status: Pass extra parameters
This allows you to do:
  ./config.status --the-option-you-forgot

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1452599928-7471-1-git-send-email-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-24 14:01:08 +01:00
Peter Maydell
a2ecc80db5 Merge remote-tracking branch 'remotes/bkoppelmann/tags/pull-tricore-20160323' into staging
TriCore FPU + bugfixes

# gpg: Signature made Wed 23 Mar 2016 08:26:03 GMT using RSA key ID 6B69CA14
# gpg: Good signature from "Bastian Koppelmann <kbastian@mail.uni-paderborn.de>"

* remotes/bkoppelmann/tags/pull-tricore-20160323:
  target-tricore: Add ftoi and itof instructions
  target-tricore: Add cmp.f instruction
  target-tricore: Add div.f instruction
  target-tricore: Add mul.f instruction
  target-tricore: add add.f/sub.f instructions
  target-tricore: Move general CHECK_REG_PAIR of decode_rrr_divide
  target-tricore: Add FPU infrastructure
  target-tricore: Fix psw_read() clearing too many bits
  target-tricore: Fix helper_msub64_q_ssov not reseting OVF bit
  target-tricore: add missing break in insn decode switch stmt

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-24 12:36:39 +00:00
Christophe Fergeau
569a93cbbe spice: Disallow use of gl + TCP port
Currently, virgl support has to go through a local unix socket, trying
to connect to a VM using -spice gl through spice://localhost:5900 will
only result in a black screen.
This commit errors out when the user tries to start a VM with both GL
support and a port/tls-port set.
This would fit better in spice-server, but currently QEMU does not call
into spice-server when parsing 'gl' on its command line, so we have to
do this check in QEMU instead.

Signed-off-by: Christophe Fergeau <cfergeau@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 1457955672-28758-1-git-send-email-cfergeau@redhat.com

[ applied codestyle fix: break long line ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-24 08:04:01 +01:00
Gerd Hoffmann
81b00c968a input-linux: fix Coverity warning
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1458129049-12484-1-git-send-email-kraxel@redhat.com
2016-03-24 07:58:20 +01:00
Gerd Hoffmann
0e066b2cc5 input-linux: switch over to -object
This patches makes input-linux use -object instead of a new command line
switch.  So, instead of the switch ...

    -input-linux /dev/input/event$nr

... you must create an object this way:

    -object input-linux,id=$name,evdev=/dev/input/event$nr

Bonus is that you can hot-add and hot-remove them via monitor now.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1457681901-30916-1-git-send-email-kraxel@redhat.com
2016-03-24 07:58:20 +01:00
Cédric Le Goater
9d0e5c8ceb ppc: move POWER8 Book4 regs in their own routine
commit fce55481360d "ppc: A couple more dummy POWER8 Book4 regs"
squashed in to rapidly a set of POWER8 Book4 regs in the wrong
routine. This patch introduces the missing gen_spr_power8_book4()
routine to fix their location.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
57c522f47b hw/net/spapr_llan: Enable the RX buffer pools by default for new machines
RX buffer pools are now enabled by default for new machine types.
For older machine types, they are still disabled to avoid breaking
migration.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
831e882253 hw/net/spapr_llan: Fix receive buffer handling for better performance
tl;dr:
This patch introduces an alternate way of handling the receive
buffers of the spapr-vlan device, resulting in much better
receive performance for the guest.

Full story:
One of our testers recently discovered that the performance of the
spapr-vlan device is very poor compared to other NICs, and that
a simple "ping -i 0.2 -s 65507 someip" in the guest can result
in more than 50% lost ping packets (especially with older guest
kernels < 3.17).

After doing some analysis, it was clear that there is a problem
with the way we handle the receive buffers in spapr_llan.c: The
ibmveth driver of the guest Linux kernel tries to add a lot of
buffers into several buffer pools (with 512, 2048 and 65536 byte
sizes by default, but it can be changed via the entries in the
/sys/devices/vio/1000/pool* directories of the guest). However,
the spapr-vlan device of QEMU only tries to squeeze all receive
buffer descriptors into one single page which has been supplied
by the guest during the H_REGISTER_LOGICAL_LAN call, without
taking care of different buffer sizes. This has two bad effects:
First, only a very limited number of buffer descriptors is accepted
at all. Second, we also hand 64k buffers to the guest even if
the 2k buffers would fit better - and this results in dropped packets
in the IP layer of the guest since too much skbuf memory is used.

Though it seems at a first glance like PAPR says that we should store
the receive buffer descriptors in the page that is supplied during
the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec
declares that "the contents of these descriptors are architecturally
opaque, none of these descriptors are manipulated by code above
the architected interfaces". That means we don't have to store
the RX buffer descriptors in this page, but can also manage the
receive buffers at the hypervisor level only. This is now what we
are doing here: Introducing proper RX buffer pools which are also
sorted by size of the buffers, so we can hand out a buffer with
the best fitting size when a packet has been received.

To avoid problems with migration from/to older version of QEMU,
the old behavior is also retained and enabled by default. The new
buffer management has to be enabled via a new "use-rx-buffer-pools"
property.

Now with the new buffer pool management enabled, the problem with
"ping -s 65507" is fixed for me, and the throughput of a simple
test with wget increases from creeping 3MB/s up to 20MB/s!

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Thomas Huth
d6f39fdfcd hw/net/spapr_llan: Extract rx buffer code into separate functions
Refactor the code a little bit by extracting the code that reads
and writes the receive buffer list page into separate functions.
There should be no functional change in this patch, this is just
a preparation for the upcoming extensions that introduce receive
buffer pools.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
9c1cf38d28 ppc: A couple more dummy POWER8 Book4 regs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: squashed in patch 'ppc: Add dummy ACOP SPR' ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
eb5ceb4d38 ppc: Add dummy CIABR SPR
We should implement HW breakpoint/watchpoint, qemu supports them...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
a6eabb9e59 ppc: Add POWER8 IAMR register
With appropriate AMR-like masks. Not actually used by the translation
logic at that point

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: changed spr_register_hv(SPR_IAMR) to spr_register_kvm_hv(SPR_IAMR)
      changed gen_spr_amr() prototype ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
97eaf30ec6 ppc: Fix writing to AMR/UAMOR
The masks weren't chosen nor applied properly. The architecture specifies
that writes to AMR are masked by UAMOR for PR=1, otherwise AMOR for HV=0.

The writes to UAMOR are masked by AMOR for HV=0

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: moved gen_spr_amr() prototype change to next patch ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
6a9c4ef452 ppc: Initialize AMOR in PAPR mode
Make sure we give the guest full authorization

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
21a558bed9 ppc: Add dummy SPR_IC for POWER8
It's supposed to be an instruction counter. For now make us not
crash when accessing it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
26a7f1291b ppc: Create cpu_ppc_set_papr() helper
And move the code adjusting the MSR mask and calling kvmppc_set_papr()
to it. This allows us to add a few more things such as disabling setting
of MSR:HV and appropriate LPCR bits which will be used when fixing
the exception model.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[clg: removed LPCR setting ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
f401dd32cb ppc: Add a bunch of hypervisor SPRs to Book3s
We don't give them a KVM reg number to most of the registers yet as no
current KVM version supports HV mode. For DAWR and DAWRX, the KVM reg
number is needed since this register can be set by the guest via the
H_SET_MODE hypercall.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: squashed in patch 'ppc: Add KVM numbers to some P8 SPRs'
      changed the commit log with a proposal of Thomas Huth
      removed all hunks except those related to AMOR and DAWR* ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:34 +11:00
Benjamin Herrenschmidt
eb94268e73 ppc: Add macros to register hypervisor mode SPRs
The current set of spr_register_* macros only take the user and
supervisor function pointers. To make the transition easy, we
don't change that but we add "_hv" variants that can be used to
register all 3 sets.

To simplify the transition, users of the "old" macro will set the
hypervisor callback to be the same as the supervisor one. The new
registration function only needs to be used for registers that are
either hypervisor only or behave differently in HV mode.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[clg: fixed else if condition in gen_op_mfspr() ]
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:33 +11:00
Benjamin Herrenschmidt
1488270e82 ppc: Update SPR definitions
Add definitions for additional SPR numbers and SPR bit definitions
that will be relevant for subsequent improvements to POWER8 emulation

Also fix the definition of LPIDR which was incorrect (and is different
for server and embedded).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:33 +11:00
Alexey Kardashevskiy
0ddbd05362 spapr/target-ppc/kvm: Only add hcall-instructions if KVM supports it
ePAPR defines "hcall-instructions" device-tree property which contains
code to call hypercalls in ePAPR paravirtualized guests.  In general
pseries guests won't use this property, instead using the PAPR defined
hypercall interface.

However, this property has been re-used to implement a hack to allow
PR KVM to run (slightly modified) guests in some situations where it
otherwise wouldn't be able to (because the system's L0 hypervisor
doesn't forward the PAPR hypercalls to the PR KVM kernel).

Hence, this property is always present in the device tree for pseries
guests. All KVM guests use it at least to read features via the
KVM_HC_FEATURES hypercall.

The property is populated by the code returned from the KVM's
KVM_PPC_GET_PVINFO ioctl; if not implemented in the KVM, QEMU supplies
code which will fail all hypercall attempts. If QEMU does not create
the property, and the guest kernel is compiled with
CONFIG_EPAPR_PARAVIRT (which is normally the case), there is exactly
the same stub at @epapr_hypercall_start already.

Rather than maintaining this fairly useless stub implementation, it
makes more sense not to create the property in the device tree in the
first place if the host kernel does not implement it.

This changes kvmppc_get_hypercall() to return 1 if the host kernel
does not implement KVM_CAP_PPC_GET_PVINFO. The caller can use it to decide
on whether to create the property or not.

This changes the pseries machine to not create the property if KVM does
not implement KVM_PPC_GET_PVINFO. In practice this means that from now
on the property will not be created if either HV KVM or TCG is used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[reworded commit message for clarity --dwg]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:33 +11:00
Laurent Vivier
8b9f2118ca ppc64: set MSR_SF bit
When a qemu-system-ppc64 is started, the 64-bit mode bit
is not set in MSR.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24 11:17:33 +11:00
Cornelia Huck
ce11b06222 s390-ccw.img: rebuild image
Contains the following changes:

pc-bios/s390-ccw: add more disk layout checks
pc-bios/s390-ccw: virtio_panic -> panic
pc-bios/s390-ccw: add utility functions and "export" some others
pc-bios/s390-ccw: qemuize types
pc-bios/s390-ccw: update virtio implementation to allow up to 3 vrings
pc-bios/s390-ccw: add vdev object to store all device details
pc-bios/s390-ccw: make provisions for different backends
pc-bios/s390-ccw: add simplified virtio call
pc-bios/s390-ccw: add scsi definitions
pc-bios/s390-ccw: add virtio-scsi implementation
pc-bios/s390-ccw: enable virtio-scsi
pc-bios/s390-ccw: enhance bootmap detection
pc-bios/s390-ccw: disambiguation of "No zIPL magic" message

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
688e697fa4 pc-bios/s390-ccw: disambiguation of "No zIPL magic" message
Don't indicate the same error message for different conditions.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
f038682044 pc-bios/s390-ccw: enhance bootmap detection
Improve the algorithm that tries to guess the disk layout:
1. Use CD-ROMs to read ISO only
2. Make explicit paths for -scsi and -blk virtio

Acked-by: Maxim Samoylov <max7255@linux.vnet.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
80ba3e249b pc-bios/s390-ccw: enable virtio-scsi
Make the code added before to work.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
86aec22d48 pc-bios/s390-ccw: add virtio-scsi implementation
Add virtio-scsi.[ch] with primary implementation of virtio-scsi.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
f791561476 pc-bios/s390-ccw: add scsi definitions
Add scsi.h to provide basic definitions for SCSI.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
8944edc3dd pc-bios/s390-ccw: add simplified virtio call
Add virtio_run(VirtioCmd) call to use simple declarative approach.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
a1102cebbf pc-bios/s390-ccw: make provisions for different backends
Add dispatching code to make room for non virtio-blk boot devices.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
69429682c6 pc-bios/s390-ccw: add vdev object to store all device details
Add VDev "object" as a container for all device-related items.
The default object is static.

Leverage dependency on many different device-related globals.
Make them syntactically visible.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
8512989143 pc-bios/s390-ccw: update virtio implementation to allow up to 3 vrings
Add ability to work with up to 3 vrings, which is required for
virtio-scsi implementation.
Implement the optional cookie to speed up processing of virtio
notifications.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
b88d7fa590 pc-bios/s390-ccw: qemuize types
Turn [the most of] existing declarations from
    struct type_name { ... };
into
    struct TypeName { ... };
    typedef struct TypeName TypeName;
and make use of them.

Also switch u{8,16,32,64} to uint{8,16,32,64}_t.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
dc25e843f6 pc-bios/s390-ccw: add utility functions and "export" some others
Add several utility functions, make IPL_check and IPL_assert generally
available, etc.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
c9262e8a84 pc-bios/s390-ccw: virtio_panic -> panic
This function has nothing to do with virtio.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
Eugene (jno) Dvurechenski
b1be0972f9 pc-bios/s390-ccw: add more disk layout checks
Experiments showed possibility of few more "misconfigurations" in disk
layout. They are reported now.

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-23 16:13:38 +01:00
John Arbuckle
aaac714f31 ui/cocoa.m: switch to QKeyCode
This patch removes the pc/xt keycode map and replaces it with the QKeyCode
keymap.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-23 14:29:30 +00:00
John Arbuckle
a35412782d qapi-schema.json: Add power and keypad equal keys
Add the power and keypad equal keys. These keys are found on a real Macintosh
keyboard.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-23 14:29:29 +00:00
John Arbuckle
f474790061 ui/cocoa.m: fix help menus
Make the help menus actually work. The code will search thru three different
locations for the help file. If it can't be found a dialog will tell the user
the file can't be found.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Message-id: F6B689F9-4DBD-4C50-BC38-35E5DD03D396@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-23 14:26:17 +00:00
Leon Alrae
b7c4ab809a default-configs: add mips-softmmu-common.mak
Add mips-softmmu-common.mak and include it in existing mips*-softmmu.mak
files to avoid having to repeat CONFIG defines four times.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-23 13:36:56 +00:00
Leon Alrae
ba5c79f262 target-mips: indicate presence of IEEE 754-2008 FPU in R6/R5+MSA CPUs
MIPS Release 6 and MIPS SIMD Architecture make it mandatory to have IEEE
754-2008 FPU which is indicated by CP1 FIR.HAS2008, FCSR.ABS2008 and
FCSR.NAN2008 bits set to 1.

In QEMU we still keep these bits cleared as there is no 2008-NaN support.
However, this now causes problems preventing from running R6 Linux with
the v4.5 kernel. Kernel refuses to execute 2008-NaN ELFs on a CPU
whose FPU does not support 2008-NaN encoding:

  (...)
  VFS: Mounted root (ext4 filesystem) readonly on device 8:0.
  devtmpfs: mounted
  Freeing unused kernel memory: 256K (ffffffff806f0000 - ffffffff80730000)
  request_module: runaway loop modprobe binfmt-464c
  Starting init: /sbin/init exists but couldn't execute it (error -8)
  request_module: runaway loop modprobe binfmt-464c
  Starting init: /bin/sh exists but couldn't execute it (error -8)
  Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/init.txt for guidance.

Therefore always indicate presence of 2008-NaN support in R6 as well as in
R5+MSA CPUs, even though this feature is not yet supported by MIPS in QEMU.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>
2016-03-23 13:36:55 +00:00
Peter Maydell
2538039f2c Merge remote-tracking branch 'remotes/armbru/tags/pull-ivshmem-2016-03-18' into staging
ivshmem: Fixes, cleanups, device model split

# gpg: Signature made Mon 21 Mar 2016 20:33:54 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-ivshmem-2016-03-18: (40 commits)
  contrib/ivshmem-server: Print "not for production" warning
  ivshmem: Require master to have ID zero
  ivshmem: Drop ivshmem property x-memdev
  ivshmem: Clean up after the previous commit
  ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
  ivshmem: Replace int role_val by OnOffAuto master
  qdev: New DEFINE_PROP_ON_OFF_AUTO
  ivshmem: Inline check_shm_size() into its only caller
  ivshmem: Simplify memory regions for BAR 2 (shared memory)
  ivshmem: Implement shm=... with a memory backend
  ivshmem: Tighten check of property "size"
  ivshmem: Simplify how we cope with short reads from server
  ivshmem: Drop the hackish test for UNIX domain chardev
  ivshmem: Rely on server sending the ID right after the version
  ivshmem: Propagate errors through ivshmem_recv_setup()
  ivshmem: Receive shared memory synchronously in realize()
  ivshmem: Plug leaks on unplug, fix peer disconnect
  ivshmem: Disentangle ivshmem_read()
  ivshmem: Simplify rejection of invalid peer ID from server
  ivshmem: Assert interrupts are set up once
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-23 12:57:44 +00:00
Bastian Koppelmann
0d4c3b8010 target-tricore: Add ftoi and itof instructions
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-8-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
743cd09dd7 target-tricore: Add cmp.f instruction
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-7-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
446ee5b2a8 target-tricore: Add div.f instruction
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-6-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
daab3f7fa8 target-tricore: Add mul.f instruction
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-5-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
baf410dcca target-tricore: add add.f/sub.f instructions
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-4-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
c433a17141 target-tricore: Move general CHECK_REG_PAIR of decode_rrr_divide
The add.f and sub.f to be implemented don't use 64 bit registers
and a general usage of CHECK_REG_PAIR would always generate an
exception for them.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-3-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
996a729f9b target-tricore: Add FPU infrastructure
This patch adds a file for all the FPU related helpers with all the includes,
useful defines, and a function to update the status bits. Additionally it adds
a mask for the rounding mode bits of PSW as well as all the opcodes for the
FPU instructions.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1457708597-3025-2-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
1bd3e2fc3d target-tricore: Fix psw_read() clearing too many bits
psw_read() ought to sync the PSW value with the
cached status bits (C,V,SV,AV,SAV). For this the bits
are cleared in the PSW before they are written from the
cached bits. The clear mask is too big and clears two
additional bits.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1458547383-23102-4-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
9029710b9e target-tricore: Fix helper_msub64_q_ssov not reseting OVF bit
When this instruction does not produce an overflow the corresponding
bit has to be reset.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1458547383-23102-3-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Bastian Koppelmann
1f75cba8f8 target-tricore: add missing break in insn decode switch stmt
After decoding/translating a RRR_DIVIDE/RRRR_EXTRACT_INSERT type instruction
we would simply fall through and would decode/translate another unintended
RRR2_MADD/RRRW_EXTRACT_INSERT instruction.

Signed-off-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Message-Id: <1458547383-23102-2-git-send-email-kbastian@mail.uni-paderborn.de>
2016-03-23 09:22:48 +01:00
Samuel Thibault
67e3eee454 Avoid embedding struct mbuf in other structures
struct mbuf uses a C99 open char array to allow inlining data. Inlining
this in another structure is however a GNU extension. The inlines used
so far in struct Slirp were actually only needed as head of struct
mbuf lists. This replaces these inline with mere struct quehead,
and use casts as appropriate.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-23 00:57:01 +01:00
Samuel Thibault
c17c07231e slirp: send icmp6 errors when UDP send failed
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-22 22:52:09 +01:00
Samuel Thibault
99787f69cd slirp: Fix memory leak on small incoming ipv4 packet
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-22 22:51:56 +01:00
Marc-André Lureau
b6572b4f97 char: translate from QIOChannel error to errno
Caller of CharDriverState.chr* callback assume errno error conventions.
Translate QIOChannel error to errno (this fixes potential EAGAIN
regression, for ex if a vhost-user backend block, qemu_chr_fe_read_all()
could get error -2 and not wait)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1457718924-19338-1-git-send-email-marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Paolo Bonzini
5c3ece79cd exec: fix error handling in file_ram_alloc
One instance of double closing, and invalid close(-1) in some cases
of "goto error".

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Alex Bennée
8526e1f4e4 cputlb: modernise the debug support
To avoid cluttering the code with #ifdef legs we wrap up the print
statements into a tlb_debug() macro. As access to the virtual TLB can
get quite heavy defining DEBUG_TLB_LOG will ensure all the logs go to
the qemu_log target of CPU_LOG_MMU instead of stderr. This remains
compile time optional as these debug statements haven't been considered
for usefulness for user visible logging.

I've also removed DEBUG_TLB_CHECK which wasn't used.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-11-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Alex Bennée
f6880b7f48 qemu-log: support simple pid substitution for logs
When debugging stuff that occurs over several forks it would be useful
not to keep overwriting the one logfile you've set-up. This allows a
simple %d to be included once in the logfile parameter which is
substituted with getpid().

As the test cases involve checking user output they need
g_test_trap_subprocess() support. As a result they are currently skipped
on Travis builds due to the older glib involved.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Leandro Dorileo <l@dorileo.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Message-Id: <1458052224-9316-10-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Alex Bennée
064860778b target-arm: dfilter support for in_asm
Each individual architecture needs to use the qemu_log_in_addr_range()
feature for enabling in_asm output as it is part of the frontend.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Message-Id: <1458052224-9316-9-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Alex Bennée
d977e1c2db qemu-log: dfilter-ise exec, out_asm, op and opt_op
This ensures the code generation debug code will honour -dfilter if set.
For the "exec" tracing I've added a new inline macro for efficiency's
sake.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aureL32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-8-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Alex Bennée
3514552e04 qemu-log: new option -dfilter to limit output
When debugging big programs or system emulation sometimes you want both
the verbosity of cpu,exec et all but don't want to generate lots of logs
for unneeded stuff. This patch adds a new option -dfilter which allows
you to specify interesting address ranges in the form:

  -dfilter 0x8000..0x8fff,0xffffffc000080000+0x200,...

Then logging code can use the new qemu_log_in_addr_range() function to
decide if it will output logging information for the given range.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1458052224-9316-7-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Peter Maydell
1a83063522 qemu-log: Improve the "exec" TB execution logging
Improve the TB execution logging so that it is easier to identify
what is happening from trace logs:
 * move the "Trace" logging of executed TBs into cpu_tb_exec()
   so that it is emitted if and only if we actually execute a TB,
   and for consistency for the CPU state logging
 * log when we link two TBs together via tb_add_jump()
 * log when cpu_tb_exec() returns early from a chain of TBs

The new style logging looks like this:

Trace 0x7fb7cc822ca0 [ffffffc0000dce00]
Linking TBs 0x7fb7cc822ca0 [ffffffc0000dce00] index 0 -> 0x7fb7cc823110 [ffffffc0000dce10]
Trace 0x7fb7cc823110 [ffffffc0000dce10]
Trace 0x7fb7cc823420 [ffffffc000302688]
Trace 0x7fb7cc8234a0 [ffffffc000302698]
Trace 0x7fb7cc823520 [ffffffc0003026a4]
Trace 0x7fb7cc823560 [ffffffc0000dce44]
Linking TBs 0x7fb7cc823560 [ffffffc0000dce44] index 1 -> 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc8235d0 [ffffffc0000dce70]
Stopped execution of TB chain before 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc8235d0 [ffffffc0000dce70]
Trace 0x7fb7cc822fd0 [ffffffc0000dd52c]

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[AJB: reword patch title, Abandoned->Stopped]
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-6-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Peter Maydell
7ee606230e qemu-log: Avoid function call for disabled qemu_log_mask logging
Make qemu_log_mask() a macro which only calls the function to
do the actual work if the logging is enabled. This avoids making
a function call in possible fast paths where logging is disabled.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:18 +01:00
Alex Bennée
541957361e qemu-log: correct help text for -d cpu
This doesn't just dump CPU state on translation but on every block
entrance.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1458052224-9316-4-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Alex Bennée
5bd2ec3d7b tcg: pass down TranslationBlock to tcg_code_gen
My later debugging patches need access to the origin PC which is held in
the TranslationBlock structure. Pass down the whole structure as it also
holds the information about the code start point.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson  <rth@twiddle.net>
Message-Id: <1458052224-9316-3-git-send-email-alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Veronia Bahaa
f348b6d1a5 util: move declarations out of qemu-common.h
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)

Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Rutuja Shah
73bcb24d93 Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed.  This replacement improves the readability and
understandability of code.

For example,

    timer_mod(fdctrl->result_timer,
	      qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));

NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.

Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Paolo Bonzini
4771d756f4 hw: explicitly include qemu-common.h and cpu.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:17 +01:00
Markus Armbruster
7136fc1da2 include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

Several include/crypto/ headers include qemu-common.h, but either need
just qapi-types.h from it, or qemu/bswap.h, or nothing at all.  Replace or
drop the include accordingly.  tests/test-crypto-secret.c now misses
qemu/module.h, so include it there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
bd36a618cc isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h
DMA_transfer_handler is actually an ISA thing, and as such has no
business in qemu-common.h.  Move it to hw/isa/isa.h, and rename it to
IsaDmaTransferHandler.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
8a98ecada3 Move ParallelIOArg from qemu-common.h to sysemu/char.h
ParallelIOArg is shared between just qemu-char.c and
hw/char/parallel.c, and as such has no business in qemu-common.h.
Move it to sysemu/char.h.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
e07e540aaa Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

One of the reasons for headers to include it is QEMU_ALIGN_UP() and
QEMU_ALIGN_DOWN().  Move them next to ROUND_UP() in qemu/osdep.h, to
facilitate removing these ill-advised includes later on.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
a813963216 Move HOST_LONG_BITS from qemu-common.h to qemu/osdep.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

One of the reasons for headers to include it is HOST_LONG_BITS.  Move
that to its more natural home qemu/osdep.h, to facilitate removing
these ill-advised includes later on.

This also lets us use HOST_LONG_BITS in bswap.h instead of duplicating
its definition there to avoid cyclic inclusion.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
a7c4d9c7ca hw/pci/pci.h: Don't include qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

hw/pci/pci.h includes qemu-common.h, but its users only need pcibus_t
and PCIHostDeviceAddress from it.  Move them to hw/pci/pci.h and drop
the ill-advised include.  Include hw/pci/pci.h where the moved stuff
is now missing.  Except we can't in target-i386/kvm_i386.h, because
that would break the i386-linux-user compile.  Add
PCIHostDeviceAddress to qemu/typedefs.h instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
0137fdc094 include/hw/hw.h: Don't include qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

hw/hw.h includes qemu-common.h, but its users generally need only
hw_error() and qemu/module.h from it.  Move the former to hw/hw.h,
include the latter there, and drop the ill-advised include.
hw/misc/cbus.c now misses hw_error(), so include hw/hw.h there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
daf015ef5a include/qemu/iov.h: Don't include qemu-common.h
qemu-common.h should only be included by .c files.  Its file comment
explains why: "No header file should depend on qemu-common.h, as this
would easily lead to circular header dependencies."

qemu/iov.h includes qemu-common.h for QEMUIOVector stuff.  Move all
that to qemu/iov.h and drop the ill-advised include.  Include
qemu/iov.h where the QEMUIOVector stuff is now missing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
6f061ea10f fw_cfg: Split fw_cfg_keys.h off fw_cfg.h
Much of fw_cfg.h's contents is #ifndef NO_QEMU_PROTOS.  This lets a
few places include it without satisfying the dependencies of the
suppressed code.  If you somehow include it with NO_QEMU_PROTOS, any
future includes are ignored.  Unnecessarily unclean.

Move the stuff not under NO_QEMU_PROTOS into its own header
fw_cfg_keys.h, and include it as appropriate.  Tidy up the moved code
to please checkpatch.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
c80f6e9caa Clean up includes some more
Manually drop redundant includes that scripts/clean-includes misses,
e.g. because they're hidden in generator programs, or they use the
wrong kind of delimiter.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
14b6d44d47 Use scripts/clean-includes to drop redundant qemu/typedefs.h
Re-run scripts/clean-includes to apply the previous commit's
corrections and updates.  Besides redundant qemu/typedefs.h, this only
finds a redundant config-host.h include in ui/egl-helpers.c.  No idea
how that escaped the previous runs.

Some manual whitespace trimming around dropped includes squashed in.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:16 +01:00
Markus Armbruster
da34e65cb4 include/qemu/osdep.h: Don't include qapi/error.h
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef.  Since then, we've moved to include qemu/osdep.h
everywhere.  Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h.  That's in excess of
100KiB of crap most .c files don't actually need.

Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h.  Include qapi/error.h in .c files that need it and don't
get it now.  Include qapi-types.h in qom/object.h for uint16List.

Update scripts/clean-includes accordingly.  Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h.  Update the list of includes in the qemu/osdep.h
comment quoted above similarly.

This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third.  Unfortunately, the number depending on
qapi-types.h shrinks only a little.  More work is needed for that one.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 22:20:15 +01:00
Peter Maydell
ffa6564c9b Merge remote-tracking branch 'remotes/weil/tags/pull-wxx-20160322' into staging
wxx patch queue

# gpg: Signature made Tue 22 Mar 2016 18:18:36 GMT using RSA key ID 677450AD
# gpg: Good signature from "Stefan Weil <sw@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@weilnetz.de>"
# gpg:                 aka "Stefan Weil <stefan.weil@bib.uni-mannheim.de>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 4923 6FEA 75C9 5D69 8EC2  B78A E08C 21D5 6774 50AD

* remotes/weil/tags/pull-wxx-20160322:
  wxx: Add support for ncurses
  Remove unneeded include statements for setjmp.h
  Include setjmp.h in qemu/osdep.h (bug fix for w64)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-22 20:27:55 +00:00
Stefan Weil
ae6296342a wxx: Add support for ncurses
We used to support only pdcurses for Windows, but recently Cygwin added
mingw64-i686-ncurses and mingw64-x86_64-ncurses packages which are
supported now, too.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-22 19:17:38 +01:00
Stefan Weil
8ff98f1ed2 Remove unneeded include statements for setjmp.h
As soon as setjmp.h is included from qemu/osdep.h, those old include
statements are no longer needed.

Add also setjmp.h to the list in scripts/clean-includes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-22 19:11:15 +01:00
Stefan Weil
e89fdafb58 Include setjmp.h in qemu/osdep.h (bug fix for w64)
setjmp must be declared before sysemu/os-win32.h
because it is redefined there for 64 bit Windows.

Reviewed-by: Richard Henderson  <rth@twiddle.net>
Tested-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2016-03-22 19:11:15 +01:00
Peter Maydell
459621ac1a Merge remote-tracking branch 'remotes/mdroth/tags/qga-pull-2016-03-21-tag' into staging
qemu-ga patch queue for 2.6

* remove unused variable

# gpg: Signature made Mon 21 Mar 2016 17:32:42 GMT using RSA key ID F108B584
# gpg: Good signature from "Michael Roth <flukshun@gmail.com>"
# gpg:                 aka "Michael Roth <mdroth@utexas.edu>"
# gpg:                 aka "Michael Roth <mdroth@linux.vnet.ibm.com>"

* remotes/mdroth/tags/qga-pull-2016-03-21-tag:
  qemu-ga: drop unused local err variable

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-22 17:39:48 +00:00
Peter Maydell
ac0d25e843 Merge remote-tracking branch 'remotes/kraxel/tags/pull-usb-20160321-1' into staging
usb: bugfix collection.

# gpg: Signature made Mon 21 Mar 2016 11:07:39 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-usb-20160321-1:
  usb: ehci: add capability mmio write function
  hw/usb/dev-mtp: Guard inotify usage with CONFIG_INOTIFY1
  usb: fix unbound stack warning for inotify_watchfn
  usb: fix unbound stack usage for usb_mtp_add_str
  usb: fix unbounded stack warning for xhci_dma_write_u32s
  usb: Fix compilation for Windows

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-22 16:42:06 +00:00
Markus Armbruster
a335c6f204 contrib/ivshmem-server: Print "not for production" warning
The code is okay for illustrating how things work and for testing, but
its error handling make it unfit for production use.  Print a warning
to protect the innocent.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-41-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster
62a830b688 ivshmem: Require master to have ID zero
Migration with ivshmem needs to be carefully orchestrated to work.
Exactly one peer (the "master") migrates to the destination, all other
peers need to unplug (and disconnect), migrate, plug back (and
reconnect).  This is sort of documented in qemu-doc.

If peers connect on the destination before migration completes, the
shared memory can get messed up.  This isn't documented anywhere.  Fix
that in qemu-doc.

To avoid messing up register IVPosition on migration, the server must
assign the same ID on source and destination.  ivshmem-spec.txt leaves
ID assignment unspecified, however.

Amend ivshmem-spec.txt to require the first client to receive ID zero.
The example ivshmem-server complies: it always assigns the first
unused ID.

For a bit of additional safety, enforce ID zero for the master.  This
does nothing when we're not using a server, because the ID is zero for
all peers then.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-40-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster
13fd2cb689 ivshmem: Drop ivshmem property x-memdev
Use ivshmem-plain instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-39-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster
ddc8528443 ivshmem: Clean up after the previous commit
Move code to more sensible places.  Use the opportunity to reorder and
document IVShmemState members.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-38-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster
5400c02b90 ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
ivshmem can be configured with and without interrupt capability
(a.k.a. "doorbell").  The two configurations have largely disjoint
options, which makes for a confusing (and badly checked) user
interface.  Moreover, the device can't tell the guest whether its
doorbell is enabled.

Create two new device models ivshmem-plain and ivshmem-doorbell, and
deprecate the old one.

Changes from ivshmem:

* PCI revision is 1 instead of 0.  The new revision is fully backwards
  compatible for guests.  Guests may elect to require at least
  revision 1 to make sure they're not exposed to the funny "no shared
  memory, yet" state.

* Property "role" replaced by "master".  role=master becomes
  master=on, role=peer becomes master=off.  Default is off instead of
  auto.

* Property "use64" is gone.  The new devices always have 64 bit BARs.

Changes from ivshmem to ivshmem-plain:

* The Interrupt Pin register in PCI config space is zero (does not use
  an interrupt pin) instead of one (uses INTA).

* Property "x-memdev" is renamed to "memdev".

* Properties "shm" and "size" are gone.  Use property "memdev"
  instead.

* Property "msi" is gone.  The new device can't have MSI-X capability.
  It can't interrupt anyway.

* Properties "ioeventfd" and "vectors" are gone.  They're meaningless
  without interrupts anyway.

Changes from ivshmem to ivshmem-doorbell:

* Property "msi" is gone.  The new device always has MSI-X capability.

* Property "ioeventfd" defaults to on instead of off.

* Property "size" is gone.  The new device can only map all the shared
  memory received from the server.

Guests can easily find out whether the device is configured for
interrupts by checking for MSI-X capability.

Note: some code added in sub-optimal places to make the diff easier to
review.  The next commit will move it to more sensible places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-37-git-send-email-armbru@redhat.com>
2016-03-21 21:29:03 +01:00
Markus Armbruster
2a845da736 ivshmem: Replace int role_val by OnOffAuto master
In preparation of making it a qdev property.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-36-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster
55e8a15435 qdev: New DEFINE_PROP_ON_OFF_AUTO
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-35-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster
8baeb22bfc ivshmem: Inline check_shm_size() into its only caller
Improve the error messages while there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-34-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster
c2d8019cd7 ivshmem: Simplify memory regions for BAR 2 (shared memory)
ivshmem_realize() puts the shared memory region in a container region.
Used to be necessary to permit delayed mapping of the shared memory.
However, we recently moved to synchronous mapping, in "ivshmem:
Receive shared memory synchronously in realize()" and the commit
following it.  The container is redundant since then.  Drop it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-33-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster
5503e28504 ivshmem: Implement shm=... with a memory backend
ivshmem has its very own code to create and map shared memory.
Replace that with an implicitly created memory backend.  Reduces the
number of ways we create BAR 2 from three to two.

The memory-backend-file is currently available only with CONFIG_LINUX,
so this adds a second Linuxism to ivshmem (the other one is eventfd).
Should we ever need to make it portable to systems where
memory-backend-file can't be made to serve, we could create a
memory-backend-shmem that allocates memory with shm_open().

Bonus fix: shared memory files are now created with permissions 0655
instead of 0777.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-32-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster
08183c20b8 ivshmem: Tighten check of property "size"
If size_t is narrower than 64 bits, passing uint64_t ivshmem_size to
mmap() truncates.  Reject such sizes.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-31-git-send-email-armbru@redhat.com>
2016-03-21 21:29:02 +01:00
Markus Armbruster
ee276391a3 ivshmem: Simplify how we cope with short reads from server
Short reads from a UNIX domain sockets are exceedingly unlikely when
the other side always sends eight bytes and we always read eight
bytes.  We cope with them anyway.  However, the code doing that is
rather convoluted.  Dumb it down radically.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-30-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
ba5970a178 ivshmem: Drop the hackish test for UNIX domain chardev
The chardev must be capable of transmitting SCM_RIGHTS ancillary
messages.  We check it by comparing CharDriverState member filename to
"unix:".  That's almost as brittle as it is disgusting.

When the actual transmission all happened asynchronously, this check
was all we could do in realize(), and thus better than nothing.  But
now we receive at least one SCM_RIGHTS synchronously in realize(),
it's not worth its keep anymore.  Drop it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-29-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
a3feb08639 ivshmem: Rely on server sending the ID right after the version
The protocol specification (ivshmem-spec.txt, formerly
ivshmem_device_spec.txt) has always required the ID message to be sent
right at the beginning, and ivshmem-server has always complied.  The
device, however, accepts it out of order.  If an interrupt setup
arrived before it, though, it would be misinterpreted as connect
notification.  Fix the latent bug by relying on the spec and
ivshmem-server's actual behavior.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-28-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
1309cf448a ivshmem: Propagate errors through ivshmem_recv_setup()
This kills off the funny state described in the previous commit.

Simplify ivshmem_io_read() accordingly, and update documentation.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-27-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
3a55fc0f24 ivshmem: Receive shared memory synchronously in realize()
When configured for interrupts (property "chardev" given), we receive
the shared memory from an ivshmem server.  We do so asynchronously
after realize() completes, by setting up callbacks with
qemu_chr_add_handlers().

Keeping server I/O out of realize() that way avoids delays due to a
slow server.  This is probably relevant only for hot plug.

However, this funny "no shared memory, yet" state of the device also
causes a raft of issues that are hard or impossible to work around:

* The guest is exposed to this state: when we enter and leave it its
  shared memory contents is apruptly replaced, and device register
  IVPosition changes.

  This is a known issue.  We document that guests should not access
  the shared memory after device initialization until the IVPosition
  register becomes non-negative.

  For cold plug, the funny state is unlikely to be visible in
  practice, because we normally receive the shared memory long before
  the guest gets around to mess with the device.

  For hot plug, the timing is tighter, but the relative slowness of
  PCI device configuration has a good chance to hide the funny state.

  In either case, guests complying with the documented procedure are
  safe.

* Migration becomes racy.

  If migration completes before the shared memory setup completes on
  the source, shared memory contents is silently lost.  Fortunately,
  migration is rather unlikely to win this race.

  If the shared memory's ramblock arrives at the destination before
  shared memory setup completes, migration fails.

  There is no known way for a management application to wait for
  shared memory setup to complete.

  All you can do is retry failed migration.  You can improve your
  chances by leaving more time between running the destination QEMU
  and the migrate command.

  To mitigate silent memory loss, you need to ensure the server
  initializes shared memory exactly the same on source and
  destination.

  These issues are entirely undocumented so far.

I'd expect the server to be almost always fast enough to hide these
issues.  But then rare catastrophic races are in a way the worst kind.

This is way more trouble than I'm willing to take from any device.
Kill the funny state by receiving shared memory synchronously in
realize().  If your hot plug hangs, go kill your ivshmem server.

For easier review, this commit only makes the receive synchronous, it
doesn't add the necessary error propagation.  Without that, the funny
state persists.  The next commit will do that, and kill it off for
real.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-26-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
9db51b4d64 ivshmem: Plug leaks on unplug, fix peer disconnect
close_peer_eventfds() cleans up three things: ioeventfd triggers if
they exist, eventfds, and the array to store them.

Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
when they don't exist (property ioeventfd=off, which is the default).
Unfortunately, the fix also made it skip cleanup of the eventfds and
the array then.  This is a memory and file descriptor leak on unplug.

Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
unplug.  On peer disconnect, however, this permanently wedges the
interrupt vectors used for that peer's ID.  The eventfds stay behind,
but aren't connected to a peer anymore.  When the ID gets recycled for
a new peer, the new peer's eventfds get assigned to vectors after the
old ones.  Commonly, the device's number of vectors matches the
server's, so the new ones get dropped with a "Too many eventfd
received" message.  Interrupts either don't work (common case) or go
to the wrong vector.

Fix by narrowing the conditional to just the ioeventfd trigger
cleanup.

While there, move the "invalid" peer check to the only caller where it
can actually happen, and tighten it to reject own ID.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-25-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
ca0b7566cc ivshmem: Disentangle ivshmem_read()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-24-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
cd9953f720 ivshmem: Simplify rejection of invalid peer ID from server
ivshmem_read() processes server messages.  These are 64 bit signed
integers.  -1 is shared memory setup, 16 bit unsigned is a peer ID,
anything else is invalid.

ivshmem_read() rejects invalid negative messages right away, silently.

Invalid positive messages get rejected only in resize_peers(), and
ivshmem_read() then prints the rather cryptic message "failed to
resize peers array".

Extend the first check to cover all invalid messages, make it report
"server sent invalid message", and drop the second check.

Now resize_peers() can't fail anymore; simplify.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-23-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
3c27969b3e ivshmem: Assert interrupts are set up once
An interrupt is set up when the interrupt's file descriptor is
received.  Each message applies to the next interrupt vector.
Therefore, each vector cannot be set up more than once.

ivshmem_add_kvm_msi_virq() half-heartedly tries not to rely on this by
doing nothing then, but that's not going to recover from this error
should it become possible in the future.  watch_vector_notifier()
doesn't even try.

Simply assert what is the case, so we get alerted if we ever screw it
up.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-22-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
2d1d422d11 ivshmem: Leave INTx alone when using MSI-X
The ivshmem device can either use MSI-X or legacy INTx for interrupts.

With MSI-X enabled, peer interrupt events trigger an MSI as they
should.  But software can still raise INTx via interrupt status and
mask register in BAR 0.  This is explicitly prohibited by PCI Local
Bus Specification Revision 3.0, section 6.8.3.3:

    While enabled for MSI or MSI-X operation, a function is prohibited
    from using its INTx# pin (if implemented) to request service (MSI,
    MSI-X, and INTx# are mutually exclusive).

Fix the device model to leave INTx alone when using MSI-X.

Document that we claim to use INTx in config space even when we don't.
Unlike other devices, ivshmem does *not* use INTx when configured for
MSI-X and MSI-X isn't enabled by software.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-21-git-send-email-armbru@redhat.com>
2016-03-21 21:29:01 +01:00
Markus Armbruster
082751e82b ivshmem: Clean up MSI-X conditions
There are three predicates related to MSI-X:

* ivshmem_has_feature(s, IVSHMEM_MSI) is true unless the non-MSI-X
  variant of the device is selected with msi=off.

* msix_present() is true when the device has the PCI capability MSI-X.
  It's initially false, and becomes true during successful realize of
  the MSI-X variant of the device.  Thus, it's the same as
  ivshmem_has_feature(s, IVSHMEM_MSI) for realized devices.

* msix_enabled() is true when msix_present() is true and guest software
  has enabled MSI-X.

Code that differs between the non-MSI-X and the MSI-X variant of the
device needs to be guarded by ivshmem_has_feature(s, IVSHMEM_MSI) or
by msix_present(), except the latter works only for realized devices.

Code that depends on whether MSI-X is in use needs to be guarded with
msix_enabled().

Code review led me to two minor messes:

* ivshmem_vector_notify() calls msix_notify() even when
  !msix_enabled(), unlike most other MSI-X-capable devices.  As far as
  I can tell, msix_notify() does nothing when !msix_enabled().  Add
  the guard anyway.

* Most callers of ivshmem_use_msix() guard it with
  ivshmem_has_feature(s, IVSHMEM_MSI).  Not necessary, because
  ivshmem_use_msix() does nothing when !msix_present().  That's
  ivshmem's only use of msix_present(), though.  Guard it
  consistently, and drop the now redundant msix_present() check.
  While there, rename ivshmem_use_msix() to ivshmem_msix_vector_use().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-20-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
434ad76db5 ivshmem: Clean up register callbacks
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-19-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
d855e27565 ivshmem: Failed realize() can leave migration blocker behind
If pci_ivshmem_realize() fails after it created its migration blocker,
the blocker is left in place.  Fix that by creating it last.

Likewise, if it fails after it called fifo8_create(), it leaks fifo
memory.  Fix that the same way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-18-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
9cf70c5225 ivshmem: Fix harmless misuse of Error
We reuse errp after passing it host_memory_backend_get_memory().  If
both host_memory_backend_get_memory() and the reuse set an error, the
reuse will fail the assertion in error_setv().  Fortunately,
host_memory_backend_get_memory() can't fail.

Pass it &error_abort to make our assumption explicit, and to get the
assertion failure in the right place should it become invalid.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-17-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
71c265816d ivshmem: Don't destroy the chardev on version mismatch
Yes, the chardev is commonly useless after we read a bad version from
it, but destroying it is inappropriate anyway: the user created it, so
the user should be able to hold on to it as long as he likes.  We
don't destroy it on other errors.  Screwed up in commit 5105b1d.

Stop reading instead.

Also note QEMU's behavior in ivshmem-spec.txt.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-16-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
c20fc0c3ee ivshmem: Drop ivshmem_event() stub
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-15-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
e64befe929 ivshmem: Clean up after commit 9940c32
IVShmemState member eventfd_chr is useless since commit 9940c32.  Drop
it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-14-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
a4fa93bf20 ivshmem: Compile debug prints unconditionally to prevent bit-rot
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-13-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
97553976dd ivshmem: Add missing newlines to debug printfs
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-12-git-send-email-armbru@redhat.com>
2016-03-21 21:29:00 +01:00
Markus Armbruster
fdee2025dd ivshmem: Rewrite specification document
This started as an attempt to update ivshmem_device_spec.txt for
clarity, accuracy and completeness while working on its code, and
quickly became a full rewrite.  Since the diff would be useless
anyway, I'm using the opportunity to rename the file to
ivshmem-spec.txt.

I tried hard to ensure the new text contradicts neither the old text
nor the code.  If the new text contradicts the old text but not the
code, it's probably a bug in the old text.  If the new text
contradicts both, its probably a bug in the new text.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-11-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
41b65e5eda ivshmem-test: Improve test cases /ivshmem/server-*
Document missing test: behavior with MSI-X present but not enabled.

For MSI-X, we test and clear the interrupt pending bit before testing
the interrupt.  For INTx, we only clear.  Change to test and clear for
consistency.

Test MSI-X vector 1 in addition to vector 0.

Improve comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-10-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
14c5d49ab3 ivshmem-test: Clean up wait for devices to become operational
test_ivshmem_server() waits until the first byte in BAR 2 contains the
0x42 we put into shared memory.  Works because the byte reads zero
until the device maps the shared memory gotten from the server.

Check the IVPosition register instead: it's initially -1, and becomes
non-negative right when the device maps the share memory, so no
change, just cleaner, because it's what guest software is supposed to
do.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-9-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
4958fe5d3c ivshmem-test: Improve test case /ivshmem/single
Test state of registers after reset.

Test reading Interrupt Status clears it.

Test (invalid) read of Doorbell.

Add more comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-8-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
998261726a tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned
qpci_pc_iomap() maps BARs one after the other, without padding.  This
is wrong.  PCI Local Bus Specification Revision 3.0, 6.2.5.1. Address
Maps: "all address spaces used are a power of two in size and are
naturally aligned".  That's because the size of a BAR is given by the
number of address bits the device decodes, and the BAR needs to be
mapped at a multiple of that size to ensure the address decoding
works.

Fix qpci_pc_iomap() accordingly.  This takes care of a FIXME in
ivshmem-test.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-7-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Markus Armbruster
330b58368c event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
Event notifiers are designed for eventfd(2).  They can fall back to
pipes, but according to Paolo, event_notifier_init_fd() really
requires the real thing, and should therefore be under #ifdef
CONFIG_EVENTFD.  Do that.

Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
to CONFIG_EVENTFD.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-6-git-send-email-armbru@redhat.com>
2016-03-21 21:28:59 +01:00
Peter Maydell
9fa570d57e Merge remote-tracking branch 'remotes/berrange/tags/pull-crypto-2016-03-21-1' into staging
Merge crypto 2016/03/21 v1

# gpg: Signature made Mon 21 Mar 2016 10:05:51 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/pull-crypto-2016-03-21-1:
  crypto: fix cipher function signature mismatch with nettle & xts
  crypto: add compat cast5_set_key with  nettle < 3.0.0

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-21 10:19:12 +00:00
Daniel P. Berrange
f7ac78cfe1 crypto: fix cipher function signature mismatch with nettle & xts
For versions of nettle < 3.0.0, the cipher functions took a
'void *ctx' and 'unsigned len' instad of 'const void *ctx'
and 'size_t len'. The xts functions though are builtin to
QEMU and always expect the latter signatures. Define a
second set of wrappers to use with the correct signatures
needed by XTS mode.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-21 10:03:45 +00:00
Daniel P. Berrange
621e6ae657 crypto: add compat cast5_set_key with nettle < 3.0.0
Prior to the nettle 3.0.0 release, the cast5_set_key function
was actually named cast128_set_key, so we must add a compatibility
definition.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-21 10:02:22 +00:00
Stefan Hajnoczi
a284974dee qemu-ga: drop unused local err variable
Commit 125b310e1d ("qemu-ga: move
channel/transport functionality into wrapper class") stopped using the
local err variable in channel_event_cb().

This patch deletes the unused variable.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-03-20 19:51:18 -05:00
Peter Maydell
4829e0378d Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-03-18' into staging
QAPI patches for 2016-03-18

# gpg: Signature made Fri 18 Mar 2016 09:54:57 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2016-03-18:
  qapi: Use anonymous bases in QMP flat unions
  qapi: Allow anonymous base for flat union
  qapi: Make BlockdevOptions doc example closer to reality
  qapi: Don't special-case simple union wrappers
  qapi: Drop unused c_null()
  qapi: Inline gen_visit_members() into lone caller
  qapi-commands: Inline single-use helpers of gen_marshal()
  qapi-commands: Utilize implicit struct visits
  qapi-event: Utilize implicit struct visits
  qapi-event: Drop qmp_output_get_qobject() null check
  qapi: Emit implicit structs in generated C
  qapi: Adjust names of implicit types
  qapi: Make c_type() more OO-like
  qapi: Fix command with named empty argument type
  qapi: Assert in places where variants are not handled

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-18 17:18:41 +00:00
Markus Armbruster
ad4929384b qemu-doc: Fix ivshmem huge page example
Option parameter "share" is missing.  Without it, you get a *private*
mmap(), which defeats ivshmem's purpose pretty thoroughly ;)

While there, switch to the conventional mountpoint of hugetlbfs
/dev/hugepages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1458066895-20632-5-git-send-email-armbru@redhat.com>
2016-03-18 17:34:55 +01:00
Markus Armbruster
3625c739ea ivshmem-server: Don't overload POSIX shmem and file name
Option -m NAME is interpreted as directory name if we can statfs() it
and its on hugetlbfs.  Else it's interpreted as POSIX shared memory
object name.  This is nuts.

Always interpret -m as directory.  Create new -M for POSIX shared
memory.  Last of -m or -M wins.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-4-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-18 17:34:40 +01:00
Markus Armbruster
e3ad72965a ivshmem-server: Fix and clean up command line help
Burying error messages in ~20 lines of usage help is bad form.  Print
a single line pointing to -h instead.

Print -h help to stdout rather than stderr.  Fix default of -p.  Clean
up the help text a bit.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <1458066895-20632-3-git-send-email-armbru@redhat.com>
2016-03-18 17:34:40 +01:00
Markus Armbruster
3be5cc2324 target-ppc: Document TOCTTOU in hugepage support
The code to find the minimum page size is is vulnerable to TOCTTOU.
Added in commit 2d103aa "target-ppc: fix hugepage support when using
memory-backend-file" (v2.4.0).  Since I can't fix it myself right now,
add a FIXME comment.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1458066895-20632-2-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-18 17:34:21 +01:00
Prasad J Pandit
dff0367cf6 usb: ehci: add capability mmio write function
USB Ehci emulation supports host controller capability registers.
But its mmio '.write' function was missing, which lead to a null
pointer dereference issue. Add a do nothing 'ehci_caps_write'
definition to avoid it; Do nothing because capability registers
are Read Only(RO).

Reported-by: Zuozhi Fzz <zuozhi.fzz@alibaba-inc.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1454072434-16045-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 14:20:39 +01:00
Matthew Fortune
983bff3530 hw/usb/dev-mtp: Guard inotify usage with CONFIG_INOTIFY1
inotify_init1 usage was guarded by a check for linux but does not
exist on older distributions like CentOS 5 resulting in build
failures.

Signed-off-by: Matthew Fortune <matthew.fortune@imgtec.com>
Message-id: 6D39441BF12EF246A7ABCE6654B023536BB85D4A@hhmail02.hh.imgtec.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:58:15 +01:00
Peter Xu
f34d57d359 usb: fix unbound stack warning for inotify_watchfn
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1457503640-31473-1-git-send-email-peterx@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:56:24 +01:00
Peter Xu
e3d60bc7c6 usb: fix unbound stack usage for usb_mtp_add_str
Use heap instead of stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:55:16 +01:00
Peter Xu
182b391e79 usb: fix unbounded stack warning for xhci_dma_write_u32s
All the callers for xhci_dma_write_u32s() are using mostly 5 * uint32_t
in len. To avoid unbound stack warning for the function, make it
statically allocated, and assert when it's not big enough in the
future.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 1457661106-9569-1-git-send-email-peterx@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:42:14 +01:00
Stefan Weil
0ab6d12ffd usb: Fix compilation for Windows
Mingw-w64 does not provide sys/ioctl.h and Linux builds don't need it,
so remove that include statement.

ERROR is defined by wingdi.h (included via windows.h). Undefine it before
it is redefined to avoid a compiler warning / error.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1458159439-32322-1-git-send-email-sw@weilnetz.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-18 13:13:30 +01:00
Eric Blake
3666a97f78 qapi: Use anonymous bases in QMP flat unions
Now that the generator supports it, we might as well use an
anonymous base rather than breaking out a single-use Base
structure, for all three of our current QMP flat unions.

Oddly enough, this change does not affect the resulting
introspection output (because we already inline the members of
a base type into an object, and had no independent use of the
base type reachable from a command).

The case_whitelist now has to list the name of an implicit
type; which is not too bad (consider it a feature if it makes
it harder for developers to make the whitelist grow :)

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-16-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
ac4338f8eb qapi: Allow anonymous base for flat union
Rather than requiring all flat unions to explicitly create
a separate base struct, we can allow the qapi schema to specify
the common members via an inline dictionary. This is similar to
how commands can specify an inline anonymous type for its 'data'.
We already have several struct types that only exist to serve as
a single flat union's base; the next commit will clean them up.
In particular, this patch's change to the BlockdevOptions example
in qapi-code-gen.txt will actually be done in the real QAPI schema.

Now that anonymous bases are legal, we need to rework the
flat-union-bad-base negative test (as previously written, it
forms what is now valid QAPI; tweak it to now provide coverage
of a new error message path), and add a positive test in
qapi-schema-test to use an anonymous base (making the integer
argument optional, for even more coverage).

Note that this patch only allows anonymous bases for flat unions;
simple unions are already enough syntactic sugar that we do not
want to burden them further.  Meanwhile, while it would be easy
to also allow an anonymous base for structs, that would be quite
redundant, as the members can be put right into the struct
instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-15-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
bd59adce69 qapi: Make BlockdevOptions doc example closer to reality
Although we don't want to repeat the entire BlockdevOptions
QMP command in the example, it helps if we aren't needlessly
diverging (the initial example was written before we had
committed the actual QMP interface).  Use names that match what
is found in qapi/block-core.json, such as '*read-only' rather
than 'readonly', or 'BlockdevRef' rather than 'BlockRef'.

For the simple union example, invent BlockdevOptionsSimple so
that later text is unambiguous which of the two union forms is
meant (telling the user to refer back to two 'BlockdevOptions'
wasn't nice, and QMP has only the flat union form).

Also, mention that the discriminator of a flat union is
non-optional.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-14-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
32bafa8fdd qapi: Don't special-case simple union wrappers
Simple unions were carrying a special case that hid their 'data'
QMP member from the resulting C struct, via the hack method
QAPISchemaObjectTypeVariant.simple_union_type().  But by using
the work we started by unboxing flat union and alternate
branches, coupled with the ability to visit the members of an
implicit type, we can now expose the simple union's implicit
type in qapi-types.h:

| struct q_obj_ImageInfoSpecificQCow2_wrapper {
|     ImageInfoSpecificQCow2 *data;
| };
|
| struct q_obj_ImageInfoSpecificVmdk_wrapper {
|     ImageInfoSpecificVmdk *data;
| };
...
| struct ImageInfoSpecific {
|     ImageInfoSpecificKind type;
|     union { /* union tag is @type */
|         void *data;
|-        ImageInfoSpecificQCow2 *qcow2;
|-        ImageInfoSpecificVmdk *vmdk;
|+        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
|+        q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
|     } u;
| };

Doing this removes asymmetry between QAPI's QMP side and its
C side (both sides now expose 'data'), and means that the
treatment of a simple union as sugar for a flat union is now
equivalent in both languages (previously the two approaches used
a different layer of dereferencing, where the simple union could
be converted to a flat union with equivalent C layout but
different {} on the wire, or to an equivalent QMP wire form
but with different C representation).  Using the implicit type
also lets us get rid of the simple_union_type() hack.

Of course, now all clients of simple unions have to adjust from
using su->u.member to using su->u.member.data; while this touches
a number of files in the tree, some earlier cleanup patches
helped minimize the change to the initialization of a temporary
variable rather than every single member access.  The generated
qapi-visit.c code is also affected by the layout change:

|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
|     }
|     switch (obj->type) {
|     case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
|-        visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err);
|+        visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err);
|         break;
|     case IMAGE_INFO_SPECIFIC_KIND_VMDK:
|-        visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err);
|+        visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err);
|         break;
|     default:
|         abort();

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
861877a0dd qapi: Drop unused c_null()
Now that we are always bulk-initializing a QAPI C struct to 0
(whether by g_malloc0() or by 'Type arg = {0};'), we no longer
have any clients of c_null() in the generator for per-element
initialization.  This patch is easy enough to revert if we find
a use in the future, but in the present, get rid of the dead code.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-12-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
12f254fd5f qapi: Inline gen_visit_members() into lone caller
Commit 82ca8e46 noticed that we had multiple implementations of
visiting every member of a struct, and consolidated it into
gen_visit_fields() (now gen_visit_members()) with enough
parameters to cater to slight differences between the clients.
But recent exposure of implicit types has meant that we are now
down to a single use of that method, so we can clean up the
unused conditionals and just inline it into the remaining
caller: gen_visit_object_members().

Likewise, gen_err_check() no longer needs optional parameters,
as the lone use of non-defaults was via gen_visit_members().

No change to generated code.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-11-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
c1ff0e6c85 qapi-commands: Inline single-use helpers of gen_marshal()
Originally, gen_marshal_input_visit() (or gen_visitor_input_block()
before commit f1538019) was factored out to make it easy to do two
passes of a visit to each member of a (possibly-implicit) object,
without duplicating lots of code.  But after recent changes, those
visits now occupy a single line of emitted code, and the helper
method has become a series of conditionals both before and after
the one important line, making it rather awkward to see at a glance
what gets emitted on the first (parsing) or second (deallocation)
pass.  It's a lot easier to read the generator code if we just
inline both uses directly into gen_marshal(), without all the
conditionals.

Once we've done that, it's easy to notice that gen_marshal_vars()
is used only once, and inlining it too lets us consolidate some
mcgen() calls that used to be split across helpers.

gen_call() remains a single-use helper function, but it has
enough indentation and complexity that inlining it would hamper
legibility.

No change to generated output.  The fact that the diffstat shows
a net reduction in lines is an argument in favor of this cleanup.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-10-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:26 +01:00
Eric Blake
386230a249 qapi-commands: Utilize implicit struct visits
Rather than generate inline per-member visits, take advantage
of the 'visit_type_FOO_members()' function for command
marshalling.  This is possible now that implicit structs can be
visited like any other.  Generate call arguments from a stack-
allocated struct, rather than a list of local variables:

|@@ -57,26 +57,15 @@ void qmp_marshal_add_fd(QDict *args, QOb
|     QmpInputVisitor *qiv = qmp_input_visitor_new_strict(QOBJECT(args));
|     QapiDeallocVisitor *qdv;
|     Visitor *v;
|-    bool has_fdset_id = false;
|-    int64_t fdset_id = 0;
|-    bool has_opaque = false;
|-    char *opaque = NULL;
|+    q_obj_add_fd_arg arg = {0};
|
|     v = qmp_input_get_visitor(qiv);
|-    if (visit_optional(v, "fdset-id", &has_fdset_id)) {
|-        visit_type_int(v, "fdset-id", &fdset_id, &err);
|-        if (err) {
|-            goto out;
|-        }
|-    }
|-    if (visit_optional(v, "opaque", &has_opaque)) {
|-        visit_type_str(v, "opaque", &opaque, &err);
|-        if (err) {
|-            goto out;
|-        }
|+    visit_type_q_obj_add_fd_arg_members(v, &arg, &err);
|+    if (err) {
|+        goto out;
|     }
|
|-    retval = qmp_add_fd(has_fdset_id, fdset_id, has_opaque, opaque, &err);
|+    retval = qmp_add_fd(arg.has_fdset_id, arg.fdset_id, arg.has_opaque, arg.opaque, &err);
|     if (err) {
|         goto out;
|     }
|@@ -88,12 +77,7 @@ out:
|     qmp_input_visitor_cleanup(qiv);
|     qdv = qapi_dealloc_visitor_new();
|     v = qapi_dealloc_get_visitor(qdv);
|-    if (visit_optional(v, "fdset-id", &has_fdset_id)) {
|-        visit_type_int(v, "fdset-id", &fdset_id, NULL);
|-    }
|-    if (visit_optional(v, "opaque", &has_opaque)) {
|-        visit_type_str(v, "opaque", &opaque, NULL);
|-    }
|+    visit_type_q_obj_add_fd_arg_members(v, &arg, NULL);
|     qapi_dealloc_visitor_cleanup(qdv);
| }

This also has the nice side effect of eliminating a chance of
collision between argument QMP names and local variables.

This patch also paves the way for some followup simplifications
in the generator, in subsequent patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-9-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
0949e95b48 qapi-event: Utilize implicit struct visits
Rather than generate inline per-member visits, take advantage
of the 'visit_type_FOO_members()' function for emitting events.
This is possible now that implicit structs can be visited like
any other.  Generated code shrinks accordingly; by initializing
a struct based on parameters, through a new gen_param_var()
helper, like:

|@@ -338,6 +250,9 @@ void qapi_event_send_block_job_error(con
|     QMPEventFuncEmit emit = qmp_event_get_func_emit();
|     QmpOutputVisitor *qov;
|     Visitor *v;
|+    q_obj_BLOCK_JOB_ERROR_arg param = {
|+        (char *)device, operation, action
|+    };
|
|     if (!emit) {
|         return;
@@ -351,19 +266,7 @@ void qapi_event_send_block_job_error(con
|     if (err) {
|         goto out;
|     }
|-    visit_type_str(v, "device", (char **)&device, &err);
|-    if (err) {
|-        goto out_obj;
|-    }
|-    visit_type_IoOperationType(v, "operation", &operation, &err);
|-    if (err) {
|-        goto out_obj;
|-    }
|-    visit_type_BlockErrorAction(v, "action", &action, &err);
|-    if (err) {
|-        goto out_obj;
|-    }
|-out_obj:
|+    visit_type_q_obj_BLOCK_JOB_ERROR_arg_members(v, &param, &err);
|     visit_end_struct(v, err ? NULL : &err);

Notice that the initialization of 'param' has to cast away const
(just as the old gen_visit_members() had to do): we can't change
the signature of the user function (which uses 'const char *'), but
have to assign it to a non-const QAPI object (which requires
'char *').

While touching this, document with a FIXME comment that there is
still a potential collision between QMP members and our choice of
local variable names within qapi_event_send_FOO().

This patch also paves the way for some followup simplifications
in the generator, in subsequent patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-8-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
8df59565d2 qapi-event: Drop qmp_output_get_qobject() null check
qmp_output_get_qobject() was changed never to return null some time
ago (in commit 6c2f9a15), but the qapi_event_send_FOO() functions
still check.  Clean that up:

|@@ -28,7 +28,6 @@ void qapi_event_send_acpi_device_ost(ACP
|     QMPEventFuncEmit emit;
|     QmpOutputVisitor *qov;
|     Visitor *v;
|-    QObject *obj;
|
|     emit = qmp_event_get_func_emit();
|     if (!emit) {
|@@ -54,10 +53,7 @@ out_obj:
|         goto out;
|     }
|
|-    obj = qmp_output_get_qobject(qov);
|-    g_assert(obj);
|-
|-    qdict_put_obj(qmp, "data", obj);
|+    qdict_put_obj(qmp, "data", qmp_output_get_qobject(qov));
|     emit(QAPI_EVENT_ACPI_DEVICE_OST, qmp, &err);
|
| out:

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
7ce106a96f qapi: Emit implicit structs in generated C
We already have several places that want to visit all the members
of an implicit object within a larger context (simple union variant,
event with anonymous data, command with anonymous arguments struct);
and will be adding another one soon (the ability to declare an
anonymous base for a flat union).  Having a C struct declared for
these implicit types, along with a visit_type_FOO_members() helper
function, will make for fewer special cases in our generator.

We do not, however, need qapi_free_FOO() or visit_type_FOO()
functions for implicit types, because they should not be used
directly outside of the generated code.  This is done by adding a
conditional in visit_object_type() for both qapi-types.py and
qapi-visit.py based on the object name.  The comparison of
"name.startswith('q_')" is a bit hacky (it's basically duplicating
what .is_implicit() already uses), but beats changing the signature
of the visit_object_type() callback to pass a new 'implicit' flag.
The hack should be temporary: we are considering adding a future
patch that consolidates the narrow visit_object_type(..., base,
local_members, variants) and visit_object_type_flat(...,
all_members, variants) [where different sets of information are
already broken out, and the QAPISchemaObjectType is no longer
available] into a broader visit_object_type(obj_type) [where the
visitor can query the needed fields from obj_type directly].

Also, now that we WANT to output C code for implicits, we no longer
need the visit_needed() filter, leaving 'q_empty' as the only object
still needing a special case.  Remember, 'q_empty' is the only
built-in generated object, which means that without a special case
it would be emitted in multiple files (the main qapi-types.h and in
qga-qapi-types.h) causing compilation failure due to redefinition.
But since it has no members, it's easier to just avoid an attempt to
visit that particular type; since gen_object() is called recursively,
we also prime the objects_seen set to cover any recursion into the
empty type.

The patch relies on the changed naming of implicit types in the
previous patch.  It is a bit unfortunate that the generated struct
names and visit_type_FOO_members() don't match normal naming
conventions, but it's not too bad, since they will only be used in
generated code.

The generated code grows substantially in size: the implicit
'-wrapper' types must be emitted in qapi-types.h before any union
can include an unboxed member of that type.  Arguably, the '-args'
types could be emitted in a private header for just qapi-visit.c
and qmp-marshal.c, rather than polluting qapi-types.h; but adding
complexity to the generator to split the output location according
to role doesn't seem worth the maintenance costs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-6-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
7599697c66 qapi: Adjust names of implicit types
The original choice of ':obj-' as the prefix for implicit types
made it obvious that we weren't going to clash with any user-defined
names, which cannot contain ':'.  But now we want to create structs
for implicit types, to get rid of special cases in the generators,
and our use of ':' in implicit names needs a tweak to produce valid
C code.

We could transliterate ':' to '_', except that C99 mandates that
"identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name
spaces".  So it's time to change our naming convention: we can
instead use the 'q_' prefix that we reserved for ourselves back in
commit 9fb081e0.  Technically, since we aren't planning on exposing
the empty type in generated code, we could keep the name ':empty',
but renaming it to 'q_empty' makes the check for startswith('q_')
cover all implicit types, whether or not code is generated for them.

As long as we don't declare 'empty' or 'obj' ticklish, it shouldn't
clash with c_name() prepending 'q_' to the user's ticklish names.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-5-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
4040d995e4 qapi: Make c_type() more OO-like
QAPISchemaType.c_type() is a bit awkward: it takes two optional
boolean flags is_param and is_unboxed, and they should never both
be True.

Add a new method for each of the flags, and drop the flags from
c_type().

Most callers pass no flags; they remain unchanged.

One caller passes is_param=True; call the new .c_param_type()
instead.

One caller passes is_unboxed=True, except for simple union types.
This is actually an ugly special case that will go away soon, so
until then, we now have to call either .c_type() or the new
.c_unboxed_type().  Tolerable in the interim.

It requires slightly more Python, but is arguably easier to read.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-4-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
972a110162 qapi: Fix command with named empty argument type
The generator special-cased

 { 'command':'foo', 'data': {} }

to avoid emitting a visitor variable, but failed to see that

 { 'struct':'NamedEmptyType, 'data': {} }
 { 'command':'foo', 'data':'NamedEmptyType' }

needs the same treatment.  There, the generator happily generates a
visitor to get no arguments, and a visitor to destroy no arguments;
and the compiler isn't happy with that, as demonstrated by the updated
qapi-schema-test.json:

  tests/test-qmp-marshal.c: In function ‘qmp_marshal_user_def_cmd0’:
  tests/test-qmp-marshal.c:264:14: error: variable ‘v’ set but not used [-Werror=unused-but-set-variable]
       Visitor *v;
                ^

No change to generated code except for the testsuite addition.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-3-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Eric Blake
29f6bd15eb qapi: Assert in places where variants are not handled
We are getting closer to the point where we could use one union
as the base or variant type within another union type (as long
as there are no collisions between any possible combination of
member names allowed across all discriminator choices).  But
until we get to that point, it is worth asserting that variants
are not present in places where we are not prepared to handle
them: when exploding a type into a parameter list, we do not
expect variants.  The qapi.py code is already checking this,
via the older check_type() method; but someday we hope to get
rid of that and move checking into QAPISchema*.check().  The
two asserts added here make sure any refactoring still catches
problems, and makes it locally obvious why we can iterate over
only type.members without worrying about type.variants.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1458254921-17042-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-18 10:29:25 +01:00
Peter Maydell
879c26fb9f Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-03-17-3' into staging
Merge QCrypto 2016/03/17 v3

# gpg: Signature made Thu 17 Mar 2016 16:51:32 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/pull-qcrypto-2016-03-17-3:
  crypto: implement the LUKS block encryption format
  crypto: add block encryption framework
  crypto: wire up XTS mode for cipher APIs
  crypto: refactor code for dealing with AES cipher
  crypto: import an implementation of the XTS cipher mode
  crypto: add support for the twofish cipher algorithm
  crypto: add support for the serpent cipher algorithm
  crypto: add support for the cast5-128 cipher algorithm
  crypto: skip testing of unsupported cipher algorithms
  crypto: add support for anti-forensic split algorithm
  crypto: add support for generating initialization vectors
  crypto: add support for PBKDF2 algorithm
  crypto: add cryptographic random byte source

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-17 16:57:50 +00:00
Daniel P. Berrange
3e308f20ed crypto: implement the LUKS block encryption format
Provide a block encryption implementation that follows the
LUKS/dm-crypt specification.

This supports all combinations of hash, cipher algorithm,
cipher mode and iv generator that are implemented by the
current crypto layer.

There is support for opening existing volumes formatted
by dm-crypt, and for formatting new volumes. In the latter
case it will only use key slot 0.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 16:50:40 +00:00
Peter Maydell
6741d38ad0 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Thu 17 Mar 2016 15:49:29 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (29 commits)
  iotests: Test QUORUM_REPORT_BAD in fifo mode
  quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
  block: Use blk_co_pwritev() in blk_co_write_zeroes()
  block: Use blk_aio_prwv() for aio_read/write/write_zeroes
  block: Use blk_prw() in blk_pread()/blk_pwrite()
  block: Use blk_co_pwritev() in blk_write_zeroes()
  block: Pull up blk_read_unthrottled() implementation
  block: Use blk_co_pwritev() for blk_write()
  block: Use blk_co_preadv() for blk_read()
  block: Use BdrvChild in BlockBackend
  block: Remove bdrv_states list
  block: Use bdrv_next() instead of bdrv_states
  block: Rewrite bdrv_next()
  block: Add blk_next_root_bs()
  block: Add bdrv_next_monitor_owned()
  block: Move some bdrv_*_all() functions to BB
  blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()
  blockdev: Split monitor reference from BB creation
  blockdev: Separate BB name management
  blockdev: Add list of all BlockBackends
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-17 15:59:42 +00:00
Kevin Wolf
361dca7a5a Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-03-17-v2' into queue-block
Two quorum patches for the block queue, v2.

# gpg: Signature made Thu Mar 17 16:44:11 2016 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-03-17-v2:
  iotests: Test QUORUM_REPORT_BAD in fifo mode
  quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:48:49 +01:00
Alberto Garcia
509565f36f iotests: Test QUORUM_REPORT_BAD in fifo mode
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c0a8dbfdbe939520cda5f661af6f1cd7b6b4df9d.1458034554.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-17 16:43:30 +01:00
Alberto Garcia
6049490df4 quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode
If there's an I/O error in one of Quorum children then QEMU
should emit QUORUM_REPORT_BAD. However this is not working with
read-pattern=fifo. This patch fixes this problem.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: d57e39e8d3e8564003a1e2aadbd29c97286eb2d2.1458034554.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-17 16:43:30 +01:00
Kevin Wolf
8896e08814 block: Use blk_co_pwritev() in blk_co_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf
57d6a42883 block: Use blk_aio_prwv() for aio_read/write/write_zeroes
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 16:30:00 +01:00
Kevin Wolf
a55d3fba99 block: Use blk_prw() in blk_pread()/blk_pwrite()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
fc1453cdfc block: Use blk_co_pwritev() in blk_write_zeroes()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
5bd5119667 block: Pull up blk_read_unthrottled() implementation
Use blk_read(), so that it goes through blk_co_preadv() like all read
requests from the BB to the BDS.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
a8823a3bfd block: Use blk_co_pwritev() for blk_write()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
1bf1cbc91f block: Use blk_co_preadv() for blk_read()
This patch introduces blk_co_preadv() as a central function on the
BlockBackend level that is supposed to handle all read requests from the
BB to its root BDS eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Kevin Wolf
f21d96d04b block: Use BdrvChild in BlockBackend
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
9aaf28c61d block: Remove bdrv_states list
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
79720af640 block: Use bdrv_next() instead of bdrv_states
There is no point in manually iterating through the bdrv_states list
when there is bdrv_next().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:57 +01:00
Max Reitz
2626058034 block: Rewrite bdrv_next()
Instead of using the bdrv_states list, iterate over all the
BlockDriverStates attached to BlockBackends, and over all the
monitor-owned BDSs afterwards (except for those attached to a BB).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
981f4f578e block: Add blk_next_root_bs()
This function iterates over all BDSs attached to a BB. We are going to
need it when rewriting bdrv_next() so it no longer uses bdrv_states.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
262b4e8f74 block: Add bdrv_next_monitor_owned()
Add a function for iterating over all monitor-owned BlockDriverStates so
the generic block layer can do so.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
fe1a9cbc33 block: Move some bdrv_*_all() functions to BB
Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
7c735873d9 blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()
We can basically inline it in hmp_drive_del(); monitor_remove_blk() is
called already, so we just need to call bdrv_make_anon(), too.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
efaa7c4eeb blockdev: Split monitor reference from BB creation
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.

In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".

If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
e5e785500b blockdev: Separate BB name management
Introduce separate functions (monitor_add_blk() and
monitor_remove_blk()) which set or unset a BB name. Since the name is
equivalent to the monitor's reference to a BB, adding a name the same as
declaring the BB to be monitor-owned and removing it revokes this
status, hence the function names.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
2cf22d6a1a blockdev: Add list of all BlockBackends
While monitor_block_backends contains nearly all BBs, we sometimes
really need all BBs. To this end, this patch adds the block_backend
list.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
9492b0b928 blockdev: Rename blk_backends
The blk_backends list does not contain all BlockBackends but only the
ones which are referenced by the monitor, and that is not necessarily
true for every BlockBackend. Rename the list to monitor_block_backends
to make that fact clear.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
d0e46a5577 block: Drop BB name from bad option error
The information which BB is concerned does not seem useful enough to
justify its existence in most other place (which may be related to qemu
printing the -drive parameter in question anyway, and for blockdev-add
the attribution is naturally unambiguous). Furthermore, as of a future
patch, bdrv_get_device_name(bs) will always return the empty string
before bdrv_open_inherit() returns.

Therefore, just dropping that information seems to be the best course of
action.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
a55448b368 qapi: Drop QERR_UNKNOWN_BLOCK_FORMAT_FEATURE
Just specifying a custom string is simpler in basically all places that
used it, and in addition, specifying the BB or node name is something we
generally do not do in other error messages when opening a BDS, so we
should not do it here.

This changes the output for iotest 036 (to the better, in my opinion),
so the reference output needs to be changed accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
da31d594cf block: Use blk_{commit,flush}_all() consistently
Replace bdrv_commmit_all() and bdrv_flush_all() by their BlockBackend
equivalents.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
1393f21270 block: Add blk_commit_all()
Later, we will remove bdrv_commit_all() and move its contents here, and
in order to replace bdrv_commit_all() calls by calls to blk_commit_all()
before doing so, we need to add it as an alias now.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
74d1b8fc27 block: Use blk_next() in block-backend.c
Instead of iterating directly through blk_backends, we can use
blk_next() instead. This gives us some abstraction from the list itself
which we can use to rename it, for example.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Max Reitz
da27a00e27 monitor: Use BB list for BB name completion
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-17 15:47:56 +01:00
Kevin Wolf
f8746fb804 block: Fix memory leak in hmp_drive_add_node()
hmp_drive_add_node() leaked qdict in the error path when no node-name is
specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-03-17 15:47:56 +01:00
Kevin Wolf
23f7fcb295 block: Fix qemu_root_bds_opts.head initialisation
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-17 15:47:56 +01:00
Daniel P. Berrange
7d9690148a crypto: add block encryption framework
Add a generic framework for supporting different block encryption
formats. Upon instantiating a QCryptoBlock object, it will read
the encryption header and extract the encryption keys. It is
then possible to call methods to encrypt/decrypt data buffers.

There is also a mode whereby it will create/initialize a new
encryption header on a previously unformatted volume.

The initial framework comes with support for the legacy QCow
AES based encryption. This enables code in the QCow driver to
be consolidated later.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
eaec903c5b crypto: wire up XTS mode for cipher APIs
Introduce 'XTS' as a permitted mode for the cipher APIs.
With XTS the key provided must be twice the size of the
key normally required for any given algorithm. This is
because the key will be split into two pieces for use
in XTS mode.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
e3ba0b6701 crypto: refactor code for dealing with AES cipher
The built-in and nettle cipher backends for AES maintain
two separate AES contexts, one for encryption and one for
decryption. This is going to be inconvenient for the future
code dealing with XTS, so wrap them up in a single struct
so there is just one pointer to pass around for both
encryption and decryption.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
84f7f180b0 crypto: import an implementation of the XTS cipher mode
The XTS (XEX with tweaked-codebook and ciphertext stealing)
cipher mode is commonly used in full disk encryption. There
is unfortunately no implementation of it in either libgcrypt
or nettle, so we need to provide our own.

The libtomcrypt project provides a repository of crypto
algorithms under a choice of either "public domain" or
the "what the fuck public license".

So this impl is taken from the libtomcrypt GIT repo and
adapted to be compatible with the way we need to call
ciphers provided by nettle/gcrypt.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
50f6753e27 crypto: add support for the twofish cipher algorithm
New cipher algorithms 'twofish-128', 'twofish-192' and
'twofish-256' are defined for the Twofish algorithm.
The gcrypt backend does not support 'twofish-192'.

The nettle and gcrypt cipher backends are updated to
support the new cipher and a test vector added to the
cipher test suite. The new algorithm is enabled in the
LUKS block encryption driver.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
94318522ed crypto: add support for the serpent cipher algorithm
New cipher algorithms 'serpent-128', 'serpent-192' and
'serpent-256' are defined for the Serpent algorithm.

The nettle and gcrypt cipher backends are updated to
support the new cipher and a test vector added to the
cipher test suite. The new algorithm is enabled in the
LUKS block encryption driver.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
084a85eedd crypto: add support for the cast5-128 cipher algorithm
A new cipher algorithm 'cast-5-128' is defined for the
Cast-5 algorithm with 128 bit key size. Smaller key sizes
are supported by Cast-5, but nothing in QEMU should use
them, so only 128 bit keys are permitted.

The nettle and gcrypt cipher backends are updated to
support the new cipher and a test vector added to the
cipher test suite. The new algorithm is enabled in the
LUKS block encryption driver.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:15 +00:00
Daniel P. Berrange
aa41363598 crypto: skip testing of unsupported cipher algorithms
We don't guarantee that all crypto backends will support
all cipher algorithms, so we should skip tests unless
the crypto backend indicates support.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:14 +00:00
Daniel P. Berrange
5a95e0fccd crypto: add support for anti-forensic split algorithm
The LUKS format specifies an anti-forensic split algorithm which
is used to artificially expand the size of the key material on
disk. This is an implementation of that algorithm.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:14 +00:00
Daniel P. Berrange
cb730894ae crypto: add support for generating initialization vectors
There are a number of different algorithms that can be used
to generate initialization vectors for disk encryption. This
introduces a simple internal QCryptoBlockIV object to provide
a consistent internal API to the different algorithms. The
initially implemented algorithms are 'plain', 'plain64' and
'essiv', each matching the same named algorithm provided
by the Linux kernel dm-crypt driver.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:14 +00:00
Daniel P. Berrange
37788f253a crypto: add support for PBKDF2 algorithm
The LUKS data format includes use of PBKDF2 (Password-Based
Key Derivation Function). The Nettle library can provide
an implementation of this, but we don't want code directly
depending on a specific crypto library backend. Introduce
a new include/crypto/pbkdf.h header which defines a QEMU
API for invoking PBKDK2. The initial implementations are
backed by nettle & gcrypt, which are commonly available
with distros shipping GNUTLS.

The test suite data is taken from the cryptsetup codebase
under the LGPLv2.1+ license. This merely aims to verify
that whatever backend we provide for this function in QEMU
will comply with the spec.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 14:41:07 +00:00
Peter Maydell
331ac65963 Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging
# gpg: Signature made Thu 17 Mar 2016 11:08:28 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/block-pull-request:
  Revert "qed: Implement .bdrv_drain"
  aio-posix: Change CONFIG_EPOLL to CONFIG_EPOLL_CREATE1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-17 11:27:54 +00:00
Stefan Hajnoczi
1f3ddfcb25 Revert "qed: Implement .bdrv_drain"
This reverts commit df9a681dc9.

Note that commit df9a681dc9 included some
unrelated hunks, possibly due to a merge failure or an overlooked
squash.  This only reverts the qed .bdrv_drain() implementation.

The qed .bdrv_drain() implementation is unsafe and can lead to a double
request completion.

Paolo Bonzini reports:
"The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
unconditionally, but this is not correct if an allocating write is
queued.  In this case, qed_unplug_allocating_write_reqs will restart the
allocating write and possibly cause it to complete.  The aiocb however
is still in use for the L2/L1 table writes, and will then be completed
again as soon as the table writes are stable."

For QEMU 2.6 we can simply revert this commit.  A full solution for the
qed need check timer may be added if the bdrv_drain() implementation is
extended.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1457431876-8475-1-git-send-email-stefanha@redhat.com
2016-03-17 09:50:14 +00:00
Matthew Fortune
147dfab747 aio-posix: Change CONFIG_EPOLL to CONFIG_EPOLL_CREATE1
CONFIG_EPOLL was being used to guard epoll_create1 which results
in build failures on CentOS 5.

Signed-off-by: Matthew Fortune <matthew.fortune@imgtec.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 6D39441BF12EF246A7ABCE6654B023536BB85D08@hhmail02.hh.imgtec.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-17 09:50:14 +00:00
Daniel P. Berrange
b917da4cbd crypto: add cryptographic random byte source
There are three backend impls provided. The preferred
is gnutls, which is backed by nettle in modern distros.
The gcrypt impl is provided for cases where QEMU build
against gnutls is disabled, but crypto is still desired.
No nettle impl is provided, since it is non-trivial to
use the nettle APIs for random numbers. Users of nettle
should ensure gnutls is enabled for QEMU.

Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-17 09:49:01 +00:00
Peter Maydell
8c45754724 Merge remote-tracking branch 'remotes/ehabkost/tags/machine-pull-request' into staging
Machine Core queue, 2016-03-16

# gpg: Signature made Wed 16 Mar 2016 18:57:34 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/machine-pull-request:
  module: Rename machine_init() to opts_init()
  machine: Use type_init() to register machine classes

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-17 08:52:58 +00:00
Eduardo Habkost
34294e2f54 module: Rename machine_init() to opts_init()
The only remaining users of machine_init() only call
qemu_add_opts(). Rename machine_init() to opts_init() and move it
closer to the qemu_add_opts() calls on vl.c.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-16 15:54:23 -03:00
Eduardo Habkost
0e6aac87fd machine: Use type_init() to register machine classes
Change all machine_init() users that simply call type_register*()
to use type_init().

Cc: Evgeny Voevodin <e.voevodin@samsung.com>
Cc: Maksim Kozlov <m.kozlov@samsung.com>
Cc: Igor Mitsyanko <i.mitsyanko@gmail.com>
Cc: Dmitry Solodkiy <d.solodkiy@samsung.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Michael Walle <michael@walle.cc>
Cc: "Hervé Poussineau" <hpoussin@reactos.org>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Blue Swirl <blauwirbel@gmail.com>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-16 15:34:05 -03:00
Peter Maydell
33616ace9f Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Wed 16 Mar 2016 17:33:44 GMT using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  MAINTAINERS: Fix typo, block/stream.h -> block/stream.c
  block/sheepdog: fix argument passed to qemu_strtoul()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 18:20:10 +00:00
Peter Maydell
d1f8764099 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160316-1' into staging
target-arm queue:
 * loader: Fix incorrect parameter name in load_image_mr()
 * Implement MRS (banked) and MSR (banked) instructions
 * virt: Implement versioning for machine model
 * i.MX: some initial patches preparing for i.MX6 support
 * new ASPEED AST2400 SoC and palmetto-bmc machine
 * bcm2835: add some more raspi2 devices
 * sd: fix segfault running "info qtree"

# gpg: Signature made Wed 16 Mar 2016 17:42:43 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20160316-1: (21 commits)
  sd: Fix "info qtree" on boards with SD cards
  bcm2835_dma: add emulation of Raspberry Pi DMA controller
  bcm2835_property: implement framebuffer control/configuration properties
  bcm2835_fb: add framebuffer device for Raspberry Pi
  bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block
  bcm2835_peripherals: enable sdhci pending-insert quirk for raspberry pi
  hw/arm: Add palmetto-bmc machine
  hw/arm: Add ASPEED AST2400 SoC model
  hw/intc: Add (new) ASPEED VIC device model
  hw/timer: Add ASPEED timer device model
  i.MX: Add missing descriptions in devices.
  i.MX: Add i.MX6 CCM and ANALOG device.
  i.MX: Add the CLK_IPG_HIGH clock
  i.MX: Remove CCM useless clock computation handling.
  i.MX: Rename CCM NOCLK to CLK_NONE for naming consistency.
  i.MX: Allow GPT timer to rollover.
  arm: virt: Move machine class init code to the abstract machine type
  arm: virt: Add an abstract ARM virt machine type
  target-arm: Fix translation level on early translation faults
  target-arm: Implement MRS (banked) and MSR (banked) instructions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:43:37 +00:00
Peter Maydell
fec44a8c70 sd: Fix "info qtree" on boards with SD cards
The SD card object is not a SysBusDevice, so don't create it with
qdev_create() if we're not assigning it to a specific bus; use
object_new() instead.

This was causing 'info qtree' to segfault on boards with SD cards,
because qdev_create(NULL, TYPE_FOO) puts the created object on the
system bus, and then we may try to run functions like sysbus_dev_print()
on it, which fail when casting the object to SysBusDevice.

(This is the same mistake that we made with the NAND device
and fixed in commit 6749695eaaf346c1.)

Reported-by: xiaoqiang.zhao <zxq_yx_007@163.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: xiaoqiang.zhao <zxq_yx_007@163.com>
Message-id: 1458061009-7733-1-git-send-email-peter.maydell@linaro.org
2016-03-16 17:42:19 +00:00
Grégory ESTRADE
6717f587a4 bcm2835_dma: add emulation of Raspberry Pi DMA controller
At present, all DMA transfers complete inline (so a looping descriptor
queue will lock up the device). We also do not model pause/abort,
arbitrarion/priority, or debug features.

Signed-off-by: Grégory ESTRADE <gregory.estrade@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1457467526-8840-6-git-send-email-Andrew.Baumann@microsoft.com
[AB: implement 2D mode, cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Grégory ESTRADE
355a8ccc5c bcm2835_property: implement framebuffer control/configuration properties
The property channel driver now interfaces with the framebuffer device
to query and set framebuffer parameters. As a result of this, the "get
ARM RAM size" query now correctly returns the video RAM base address
(not total RAM size), and the ram-size property is no longer relevant
here.

Signed-off-by: Grégory ESTRADE <gregory.estrade@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1457467526-8840-5-git-send-email-Andrew.Baumann@microsoft.com
[AB: cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Grégory ESTRADE
5e9c2a8dac bcm2835_fb: add framebuffer device for Raspberry Pi
The framebuffer occupies the upper portion of memory (64MiB by
default), but it can only be controlled/configured via a system
mailbox or property channel (to be added by a subsequent patch).

Signed-off-by: Grégory ESTRADE <gregory.estrade@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1457467526-8840-4-git-send-email-Andrew.Baumann@microsoft.com
[AB: added Windows (BGR) support and cleanup/refactoring for upstream submission]
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Baumann
97398d900c bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block
At present only the core UART functions (data path for tx/rx) are
implemented, which is enough for UEFI to boot. The following
features/registers are unimplemented:
  * Line/modem control
  * Scratch register
  * Extra control
  * Baudrate
  * SPI interfaces

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1457467526-8840-3-git-send-email-Andrew.Baumann@microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Baumann
a2a8dfa8d8 bcm2835_peripherals: enable sdhci pending-insert quirk for raspberry pi
Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1457467526-8840-2-git-send-email-Andrew.Baumann@microsoft.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Jeffery
327d8e4ed2 hw/arm: Add palmetto-bmc machine
The new machine is a thin layer over the AST2400 ARM926-based SoC[1].
Between the minimal machine and the current SoC implementation there is
enough functionality to boot an aspeed_defconfig Linux kernel to
userspace. Nothing yet is specific to the Palmetto's BMC (other than
using an AST2400 SoC), but creating specific machine types is preferable
to a generic machine that doesn't match any particular hardware.

[1] http://www.aspeedtech.com/products.php?fPath=20&rId=376

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Message-id: 1458096317-25223-5-git-send-email-andrew@aj.id.au
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Jeffery
43e3346e43 hw/arm: Add ASPEED AST2400 SoC model
While the ASPEED AST2400 SoC[1] has a broad range of capabilities this
implementation is minimal, comprising an ARM926 processor, ASPEED VIC
and timer devices, and a 8250 UART.

[1] http://www.aspeedtech.com/products.php?fPath=20&rId=376

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Message-id: 1458096317-25223-4-git-send-email-andrew@aj.id.au
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Jeffery
0c69996e22 hw/intc: Add (new) ASPEED VIC device model
Implement a basic ASPEED VIC device model for the AST2400 SoC[1], with
enough functionality to boot an aspeed_defconfig Linux kernel. The model
implements the 'new' (revised) register set: While the hardware exposes
both the new and legacy register sets, accesses to the model's legacy
register set will not be serviced (however the access will be logged).

[1] http://www.aspeedtech.com/products.php?fPath=20&rId=376

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Message-id: 1458096317-25223-3-git-send-email-andrew@aj.id.au
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Andrew Jeffery
c04bd47db6 hw/timer: Add ASPEED timer device model
Implement basic ASPEED timer functionality for the AST2400 SoC[1]: Up to
8 timers can independently be configured, enabled, reset and disabled.
Some hardware features are not implemented, namely clock value matching
and pulse generation, but the implementation is enough to boot the Linux
kernel configured with aspeed_defconfig.

[1] http://www.aspeedtech.com/products.php?fPath=20&rId=376

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Message-id: 1458096317-25223-2-git-send-email-andrew@aj.id.au
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jean-Christophe Dubois
eccfa35e9f i.MX: Add missing descriptions in devices.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: f1f565eb9dffdeb582feb1b15ba9e8b0afcf5468.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jean-Christophe Dubois
a66d815cd5 i.MX: Add i.MX6 CCM and ANALOG device.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 9fa80b4d8c5d0f50c94e77d74f952a7a665e168f.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jean-Christophe Dubois
d552f675fb i.MX: Add the CLK_IPG_HIGH clock
EPIT, GPT and other i.MX timers are using "abstract" clocks among which
a CLK_IPG_HIGH clock.

On i.MX25 and i.MX31 CLK_IPG and CLK_IPG_HIGH are mapped to the same clock
but on other SOC like i.MX6 they are mapped to distinct clocks.

This patch add the CLK_IPG_HIGH to prepare for SOC where these 2 clocks are
different.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 224bf650194760284cb40630e985867e1373276a.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jean-Christophe Dubois
f4b2add6cc i.MX: Remove CCM useless clock computation handling.
Most clocks supported by the CCM are useless to the qemu framework.

Only clocks related to timers (EPIT, GPT, PWM, WATCHDOG, ...) are usefull
to QEMU code.

Therefore this patch removes clock computation handling for all clocks but:
* CLK_NONE,
* CLK_IPG,
* CLK_32k

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 9e7222efb349801032e60c0f6b0fbad0e5dcf648.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jean-Christophe Dubois
c91a5883c3 i.MX: Rename CCM NOCLK to CLK_NONE for naming consistency.
This way all CCM clock defines/enums are named CLK_XXX

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 8537df765c1713625c7a8b9aca4c7ca60b42e0c0.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jean-Christophe Dubois
4833e15f74 i.MX: Allow GPT timer to rollover.
GPT timer need to rollover when it reaches 0xffffffff.

It also need to reset to 0 when in "restart mode" and crossing the
compare 1 register.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 6e2b36117a249a78bf822dd59a390368f407136e.1456868959.git.jcd@tribudubois.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Wei Huang
9c94d8e6c9 arm: virt: Move machine class init code to the abstract machine type
This patch moves the common class initialization code from
"virt-2.6" to the new abstract class. An empty property is added to
"virt-2.6" machine. In the meanwhile, related funtions are renamed
to "virt_2_6_*" for consistency.

Signed-off-by: Wei Huang <wei@redhat.com>
Message-id: 1457717778-17727-3-git-send-email-wei@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Wei Huang
ed796373b4 arm: virt: Add an abstract ARM virt machine type
In preparation for future ARM virt machine types, this patch creates
an abstract type for all ARM machines. The current machine type in
QEMU (i.e. "virt") is renamed to "virt-2.6", whose naming scheme is
similar to other architectures. For the purpose of backward compatibility,
"virt" is converted to an alias, pointing to "virt-2.6". With this patch,
"qemu -M ?" lists the following virtual machine types along with others:

virt                 QEMU 2.6 ARM Virtual Machine (alias of virt-2.6)
virt-2.6             QEMU 2.6 ARM Virtual Machine

Signed-off-by: Wei Huang <wei@redhat.com>
Message-id: 1457717778-17727-2-git-send-email-wei@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Sergey Sorokin
1b4093ea66 target-arm: Fix translation level on early translation faults
Qemu reports translation fault on 1st level instead of 0th level in case of
AArch64 address translation if the translation table walk is disabled or
the address is in the gap between the two regions.

Signed-off-by: Sergey Sorokin <afarallax@yandex.ru>
Message-id: 1457527503-25958-1-git-send-email-afarallax@yandex.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:42:18 +00:00
Jeff Cody
773460256b MAINTAINERS: Fix typo, block/stream.h -> block/stream.c
There is no block/stream.h, the intended filename is block/stream.c
instead.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: b9feeac95301c1b0b1c28a485da5e3781370c31a.1457578261.git.jcody@redhat.com
2016-03-16 13:25:29 -04:00
Jeff Cody
03c698f0a2 block/sheepdog: fix argument passed to qemu_strtoul()
The function qemu_strtoul() reads 'unsigned long' sized data,
which is larger than uint32_t on 64-bit machines.

Even though the snap_id field in the header is 32-bits, we must
accommodate the full size in qemu_strtoul().

This patch also adds more meaningful error handling to the
qemu_strtoul() call, and subsequent results.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Message-id: e56fc50abedd9a112e0683342c8eafda063cd2f9.1456935548.git.jcody@redhat.com
2016-03-16 13:25:29 -04:00
Peter Maydell
8bfd0550be target-arm: Implement MRS (banked) and MSR (banked) instructions
Starting with the ARMv7 Virtualization Extensions, the A32 and T32
instruction sets provide instructions "MSR (banked)" and "MRS
(banked)" which can be used to access registers for a mode other
than the current one:
 * R<m>_<mode>
 * ELR_hyp
 * SPSR_<mode>

Implement the missing instructions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 1456762734-23939-1-git-send-email-peter.maydell@linaro.org
2016-03-16 17:05:58 +00:00
Jens Wiklander
f09f9bd9fa loader: Fix incorrect parameter name in load_image_mr() macro
Fix a typo in the load_image_mr() macro: 'mr' was written when
the parameter name is '_mr'. (This had no visible effects since
the single use of the macro used 'mr' as the argument.)

Fixes 76151cacfe "loader: Add
load_image_mr() to load ROM image to a MemoryRegion"

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 17:05:58 +00:00
Peter Maydell
0ebc03bc06 util/base64.c: Clean includes
Remove unnecessary include of config-host.h.
(This was missed by the clean-includes script because of the
incorrect use of <> for a QEMU header.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1456237112-32662-5-git-send-email-peter.maydell@linaro.org
2016-03-16 12:48:11 +00:00
Peter Maydell
8bc92a762a update-linux-headers.sh: Fake types.h doesn't need to include anything
We have a fake linux/types.h which we create in update-linux-headers.h.
Now that every QEMU source file includes osdep.h, this fake header
doesn't need to include anything at all.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1456237112-32662-4-git-send-email-peter.maydell@linaro.org
2016-03-16 12:48:11 +00:00
Peter Maydell
8816c600d3 include/config.h: Remove
include/config.h just includes config-target.h (and used to also
include config-host.h).
It is now obsolete and unused, because osdep.h does this job, so
remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1456237112-32662-3-git-send-email-peter.maydell@linaro.org
2016-03-16 12:48:11 +00:00
Peter Maydell
4674da1c49 slirp/slirp.h: Remove now-empty #ifdefs
After automatic cleanup to remove unnecessary #includes of headers that
osdep.h provides, slirp.h has a few now unnecessary #ifdef/#endif pairs;
remove them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 1456237112-32662-2-git-send-email-peter.maydell@linaro.org
2016-03-16 12:48:11 +00:00
Peter Maydell
6aeda86890 Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2016-03-16' into staging
Error reporting patches for 2016-03-16

# gpg: Signature made Wed 16 Mar 2016 09:57:00 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-error-2016-03-16:
  error: ensure errno detail is printed with error_abort

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 11:09:36 +00:00
Peter Maydell
cad0b273e5 Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2016-03-16' into staging
Monitor patches for 2016-03-16

# gpg: Signature made Wed 16 Mar 2016 09:47:23 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-monitor-2016-03-16:
  qdev-monitor: add missing aliases for virtio device classes
  qdev-monitor: sort alias table by typename
  qdev-monitor: improve error message when alias device is unavailable

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 10:38:15 +00:00
Peter Maydell
f235538e38 Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.6-20160316' into staging
ppc patch queue for 2016-03-16

Accumulated patches for target-ppc, pseries machine type and related
devices.  As we are now in soft freeze, these are mostly fixes.
   * Fix KVM migration for several SPRs that qemu didn't handle
   * Clean up handling of SDR1, which allows a fix to the gdbstub
   * Fix a race in spapr_rng
   * Fix a bug with multifunction hotplug

The exception is the 7 patches to allow EEH on spapr-pci-host-bridge
devices (rather than the special and poorly designed
spapr-vfio-pci-host-bridge device).  I believe these are low risk of
breaking non-EEH cases, and EEH cases were little used in practice
previously (since libvirt did not support the special device amongst
other things).  It did have a draft posted before the soft freeze,
removes a very ugly VFIO interface, and removes device we'd like to
deprecate sooner rather than later.  So, I'm hoping we can squeeze
these in during the soft freeze.

This includes two patches to the VFIO code, which Alex Williamson has
indicated he's ok with coming through my tree.

# gpg: Signature made Wed 16 Mar 2016 05:04:52 GMT using RSA key ID 20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.6-20160316:
  vfio: Eliminate vfio_container_ioctl()
  spapr_pci: Remove finish_realize hook
  spapr_pci: (Mostly) remove spapr-pci-vfio-host-bridge
  spapr_pci: Allow EEH on spapr-pci-host-bridge
  spapr_pci: Eliminate class callbacks
  spapr_pci: Switch to vfio_eeh_as_op() interface
  vfio: Start improving VFIO/EEH interface
  spapr_rng: fix race with main loop
  target-ppc: Eliminate kvmppc_kern_htab global
  target-ppc: Add helpers for updating a CPU's SDR1 and external HPT
  target-ppc: Split out SREGS get/put functions
  spapr_pci: fix multifunction hotplug
  target-ppc: Add PVR for POWER8NVL processor
  ppc: Add a few more P8 PMU SPRs
  ppc: Fix migration of the TAR SPR
  ppc: Define the PSPB register on POWER8

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 10:09:26 +00:00
Daniel P. Berrange
20e2dec149 error: ensure errno detail is printed with error_abort
When &error_abort is passed in, the error reporting code
will print the current error message and then abort() the
process. Unfortunately at the time it aborts, we've not
yet appended the errno detail. This makes debugging certain
problems significantly harder as the log is incomplete.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1457544504-8548-22-git-send-email-berrange@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-16 10:55:51 +01:00
Peter Maydell
af1d3ebbef Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
acpi: minor fix

Since previous pull acpi test triggers warnings,
fix it up.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 15 Mar 2016 21:26:38 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream:
  acpi-test: update UID for GSI links

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-16 09:27:58 +00:00
Sascha Silbe
588c36cac7 qdev-monitor: add missing aliases for virtio device classes
virtio-{blk,balloon,net,serial} are aliases for their actual,
architecture-dependent implementations (*-ccw on s390x, *-pci on other
architectures supporting virtio). This makes it a lot easier to craft
qemu invocations that work on all supported architectures. Complete
the set to cover all existing non-abstract virtio device classes.

For virtio-balloon, only the CCW implementation was missing.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-Id: <1455831854-49013-4-git-send-email-silbe@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-16 10:13:10 +01:00
Sascha Silbe
36e9916811 qdev-monitor: sort alias table by typename
Sort the alias table by typename so it's easier to see which aliases
exist.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-Id: <1455831854-49013-3-git-send-email-silbe@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-16 10:13:10 +01:00
Sascha Silbe
f6b5319d41 qdev-monitor: improve error message when alias device is unavailable
When trying to instantiate an alias that points to a device class that
doesn't exist, the error message looks like qemu misunderstood the
request:

$ s390x-softmmu/qemu-system-s390x -device virtio-gpu
qemu-system-s390x: -device virtio-gpu: 'virtio-gpu-ccw' is not a valid
device model name

Special-case the error message to make it explicit that alias
expansion is going on:

$ s390x-softmmu/qemu-system-s390x -device virtio-gpu
qemu-system-s390x: -device virtio-gpu: 'virtio-gpu' (alias
'virtio-gpu-ccw') is not a valid device model name

Suggested-By: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-Id: <1455831854-49013-2-git-send-email-silbe@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-16 10:13:10 +01:00
David Gibson
3356128cd1 vfio: Eliminate vfio_container_ioctl()
vfio_container_ioctl() was a bad interface that bypassed abstraction
boundaries, had semantics that sat uneasily with its name, and was unsafe
in many realistic circumstances.  Now that spapr-pci-vfio-host-bridge has
been folded into spapr-pci-host-bridge, there are no more users, so remove
it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-16 09:55:11 +11:00
David Gibson
a36304fdca spapr_pci: Remove finish_realize hook
Now that spapr-pci-vfio-host-bridge is reduced to just a stub, there is
only one implementation of the finish_realize hook in sPAPRPHBClass.  So,
we can fold that implementation into its (single) caller, and remove the
hook.  That's the last thing left in sPAPRPHBClass, so that can go away as
well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-03-16 09:55:11 +11:00
David Gibson
72700d7e73 spapr_pci: (Mostly) remove spapr-pci-vfio-host-bridge
Now that the regular spapr-pci-host-bridge can handle EEH, there are only
two things that spapr-pci-vfio-host-bridge does differently:
    1. automatically sizes its DMA window to match the host IOMMU
    2. checks if the attached VFIO container is backed by the
       VFIO_SPAPR_TCE_IOMMU type on the host

(1) is not particularly useful, since the default window used by the
regular host bridge will work with the host IOMMU configuration on all
current systems anyway.

Plus, automatically changing guest visible configuration (such as the DMA
window) based on host settings is generally a bad idea.  It's not
definitively broken, since spapr-pci-vfio-host-bridge is only supposed to
support VFIO devices which can't be migrated anyway, but still.

(2) is not really useful, because if a guest tries to configure EEH on a
different host IOMMU, the first call will fail and that will be that.

It's possible there are scripts or tools out there which expect
spapr-pci-vfio-host-bridge, so we don't remove it entirely.  This patch
reduces it to just a stub for backwards compatibility.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-03-16 09:55:11 +11:00
David Gibson
c1fa017c7e spapr_pci: Allow EEH on spapr-pci-host-bridge
Now that the EEH code is independent of the special
spapr-vfio-pci-host-bridge device, we can allow it on all spapr PCI
host bridges instead.  We do this by changing spapr_phb_eeh_available()
to be based on the vfio_eeh_as_ok() call instead of the host bridge class.

Because the value of vfio_eeh_as_ok() can change with devices being
hotplugged or unplugged, this can potentially lead to some strange edge
cases where the guest starts using EEH, then it starts failing because
of a change in status.

However, it's not really any worse than the current situation.  Cases that
would have worked previously will still work (i.e. VFIO devices from at
most one VFIO IOMMU group per vPHB), it's just that it's no longer
necessary to use spapr-vfio-pci-host-bridge with the groupid pre-specified.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-03-16 09:55:11 +11:00
David Gibson
fbb4e98341 spapr_pci: Eliminate class callbacks
The EEH operations in the spapr-vfio-pci-host-bridge no longer rely on the
special groupid field in sPAPRPHBVFIOState.  So we can simplify, removing
the class specific callbacks with direct calls based on a simple
spapr_phb_eeh_enabled() helper.  For now we implement that in terms of
a boolean in the class, but we'll continue to clean that up later.

On its own this is a rather strange way of doing things, but it's a useful
intermediate step to further cleanups.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-03-16 09:55:10 +11:00
David Gibson
76a9e9f680 spapr_pci: Switch to vfio_eeh_as_op() interface
This switches all EEH on VFIO operations in spapr_pci_vfio.c from the
broken vfio_container_ioctl() interface to the new vfio_as_eeh_op()
interface.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-03-16 09:55:10 +11:00
David Gibson
3153119e9b vfio: Start improving VFIO/EEH interface
At present the code handling IBM's Enhanced Error Handling (EEH) interface
on VFIO devices operates by bypassing the usual VFIO logic with
vfio_container_ioctl().  That's a poorly designed interface with unclear
semantics about exactly what can be operated on.

In particular it operates on a single vfio container internally (hence the
name), but takes an address space and group id, from which it deduces the
container in a rather roundabout way.  groupids are something that code
outside vfio shouldn't even be aware of.

This patch creates new interfaces for EEH operations.  Internally we
have vfio_eeh_container_op() which takes a VFIOContainer object
directly.  For external use we have vfio_eeh_as_ok() which determines
if an AddressSpace is usable for EEH (at present this means it has a
single container with exactly one group attached), and vfio_eeh_as_op()
which will perform an operation on an AddressSpace in the unambiguous case,
and otherwise returns an error.

This interface still isn't great, but it's enough of an improvement to
allow a number of cleanups in other places.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-16 09:55:10 +11:00
Greg Kurz
f1a6cf3ef7 spapr_rng: fix race with main loop
Since commit "60253ed1e6ec rng: add request queue support to rng-random",
the use of a spapr_rng device may hang vCPU threads.

The following path is taken without holding the lock to the main loop mutex:

h_random()
  rng_backend_request_entropy()
    rng_random_request_entropy()
      qemu_set_fd_handler()

The consequence is that entropy_available() may be called before the vCPU
thread could even queue the request: depending on the scheduling, it may
happen that entropy_available() does not call random_recv()->qemu_sem_post().
The vCPU thread will then sleep forever in h_random()->qemu_sem_wait().

This could not happen before 60253ed1e6 because entropy_available() used
to call random_recv() unconditionally.

This patch ensures the lock is held to avoid the race.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Cédric Le Goater <clg@fr.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-16 09:55:06 +11:00
David Gibson
c18ad9a54b target-ppc: Eliminate kvmppc_kern_htab global
fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM"
purports to remove a hack in the handling of hash page tables (HPTs)
managed by KVM instead of qemu.  However, it actually went in the wrong
direction.

That patch requires anything looking for an external HPT (that is one not
managed by the guest itself) to check both env->external_htab (for a qemu
managed HPT) and kvmppc_kern_htab (for a KVM managed HPT).  That's a
problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places
which need to check for an external HPT are outside that, such as
kvm_arch_get_registers().  The latter was subtly broken by the earlier
patch such that gdbstub can no longer access memory.

Basically a KVM managed HPT is much more like a qemu managed HPT than it is
like a guest managed HPT, so the original "hack" was actually on the right
track.

This partially reverts fa48b43, so we again mark a KVM managed external HPT
by putting a special but non-NULL value in env->external_htab.  It then
goes further, using that marker to eliminate the kvmppc_kern_htab global
entirely.  The ppc_hash64_set_external_hpt() helper function is extended
to set that marker if passed a NULL value (if you're setting an external
HPT, but don't have an actual HPT to set, the assumption is that it must
be a KVM managed HPT).

This also has some flow-on changes to the HPT access helpers, required by
the above changes.

Reported-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2016-03-16 09:55:06 +11:00
David Gibson
e5c0d3ce40 target-ppc: Add helpers for updating a CPU's SDR1 and external HPT
When a Power cpu with 64-bit hash MMU has it's hash page table (HPT)
pointer updated by a write to the SDR1 register we need to update some
derived variables.  Likewise, when the cpu is configured for an external
HPT (one not in the guest memory space) some derived variables need to be
updated.

Currently the logic for this is (partially) duplicated in ppc_store_sdr1()
and in spapr_cpu_reset().  In future we're going to need it in some other
places, so make some common helpers for this update.

In addition the new ppc_hash64_set_external_hpt() helper also updates
SDR1 in KVM - it's not updated by the normal runtime KVM <-> qemu CPU
synchronization.  In a sense this belongs logically in the
ppc_hash64_set_sdr1() helper, but that is called from
kvm_arch_get_registers() so can't itself call cpu_synchronize_state()
without infinite recursion.  In practice this doesn't matter because
the only other caller is TCG specific.

Currently there aren't situations where updating SDR1 at runtime in KVM
matters, but there are going to be in future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-16 09:55:06 +11:00
David Gibson
a7a00a729a target-ppc: Split out SREGS get/put functions
Currently the getting and setting of Power MMU registers (sregs) take up
large inline chunks of the kvm_arch_get_registers() and
kvm_arch_put_registers() functions.  Especially since there are two
variants (for Book-E and Book-S CPUs), only one of which will be used in
practice, this is pretty hard to read.

This patch splits these out into helper functions for clarity.  No
functional change is expected.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2016-03-16 09:55:05 +11:00
Michael Roth
788d2599de spapr_pci: fix multifunction hotplug
Since 3f1e147, QEMU has adopted a convention of supporting function
hotplug by deferring hotplug events until func 0 is hotplugged.
This is likely how management tools like libvirt would expose
such support going forward.

Since sPAPR guests rely on per-func events rather than
slot-based, our protocol has been to hotplug func 0 *first* to
avoid cases where devices appear within guests without func 0
present to avoid undefined behavior.

To remain compatible with new convention, defer hotplug in a
similar manner, but then generate events in 0-first order as we
did in the past. Once func 0 present, fail any attempts to plug
additional functions (as we do with PCIe).

For unplug, defer unplug operations in a similar manner, but
generate unplug events such that function 0 is removed last in guest.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-16 09:55:05 +11:00
Alexey Kardashevskiy
a88dced8eb target-ppc: Add PVR for POWER8NVL processor
This adds a new POWER8+NVLink CPU PVR which core is identical to POWER8
but has a different PVR. The only available machine now has PVR
pvr 004c 0100 so this defines "POWER8NVL" alias as v1.0.

The corresponding kernel commit is
https://github.com/torvalds/linux/commit/ddee09c099c3
"powerpc: Add PVR for POWER8NVL processor"

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-16 09:55:05 +11:00
Benjamin Herrenschmidt
14646457ae ppc: Add a few more P8 PMU SPRs
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-16 09:55:05 +11:00
Thomas Huth
1e440cbc99 ppc: Fix migration of the TAR SPR
The TAR special purpose register currently does not get migrated
under KVM because it does not get synchronized with the kernel.
Use spr_register_kvm() instead of spr_register() to fix this issue.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-16 09:55:05 +11:00
Thomas Huth
d6f1445faf ppc: Define the PSPB register on POWER8
POWER8 / PowerISA 2.07 has a new special purpose register called PSPB
("Problem State Priority Boost Register"). The contents of this register
are currently lost during migration. To be able to migrate this register,
too, we've got to define this SPR along with the other SPRs of POWER8.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-16 09:55:05 +11:00
Michael S. Tsirkin
3ba6a710e6 acpi-test: update UID for GSI links
Update acpi test data to match
commit 6a991e07bb
("hw/acpi: fix GSI links UID").

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-15 23:25:52 +02:00
Peter Maydell
4caecccbc1 Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Miscellaneous exec.c fixes (Markus, myself)
* Q35 support for -machine kernel_irqchip=split (Rita)
* Chardev replay support (Pavel)
* icount "warping" cleanups (Pavel)

# gpg: Signature made Tue 15 Mar 2016 17:24:08 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  icount: decouple warp calls
  icount: remove obsolete warp call
  replay: character devices
  exec: fix early return from ram_block_add
  exec: Fix memory allocation when memory path isn't on hugetlbfs
  exec: Fix memory allocation when memory path names new file
  update-linux-headers: Add userfaultfd.h
  kvm: x86: q35: Add support for -machine kernel_irqchip=split for q35

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 17:56:14 +00:00
Pavel Dovgalyuk
e76d1798fa icount: decouple warp calls
qemu_clock_warp function is called to update virtual clock when CPU
is sleeping. This function includes replay checkpoint to make execution
deterministic in icount mode.
Record/replay module flushes async event queue at checkpoints.
Some of the events (e.g., block devices operations) include interaction
with hardware. E.g., APIC polled by block devices sets one of IRQ flags.
Flag to be set depends on currently executed thread (CPU or iothread).
Therefore in replay mode we have to process the checkpoints in the same thread
as they were recorded.
qemu_clock_warp function (and its checkpoint) may be called from different
thread. This patch decouples two different execution cases of this function:
call when CPU is sleeping from iothread and call from cpu thread to update
virtual clock.
First task is performed by qemu_start_warp_timer function. It sets warp
timer event to the moment of nearest pending virtual timer.
Second function (qemu_account_warp_timer) is called from cpu thread
before execution of the code. It advances virtual clock by adding the length
of period while CPU was sleeping.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20160310115609.4812.44986.stgit@PASHA-ISP>
[Update docs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:45 +01:00
Pavel Dovgalyuk
281b2201e4 icount: remove obsolete warp call
qemu_clock_warp call in qemu_tcg_wait_io_event function is not needed
anymore, because it is called in every iteration of main_loop_wait.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20160310115603.4812.67559.stgit@PASHA-ISP>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:42 +01:00
Pavel Dovgalyuk
33577b47c6 replay: character devices
This patch implements record and replay of character devices.
It records chardevs communication in replay mode. Recorded information
include data read from backend and counter of bytes written
from frontend to backend to preserve frontend internal state.
If character device was configured through the command line in record mode,
then in replay mode it should be also added to command line. Backend of
the character device could be changed in replay mode.
Replaying of devices that perform ioctl and get_msgfd operations is not
supported.
gdbstub which also acts as a backend is not recorded to allow controlling
the replaying through gdb. Monitor backends are also not recorded.

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Message-Id: <20160314074436.4980.83856.stgit@PASHA-ISP>
[Add stubs. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:40 +01:00
Paolo Bonzini
39c350ee12 exec: fix early return from ram_block_add
After reporting an error, ram_block_add was going on with the registration
of the RAMBlock.  The visible effect is that it unlocked the ramlist
mutex twice.

Fixes: 528f46af6e
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Markus Armbruster
e1fb647199 exec: Fix memory allocation when memory path isn't on hugetlbfs
gethugepagesize() works reliably only when its argument is on
hugetlbfs.  When it's not, it returns the filesystem's "optimal
transfer block size", which may or may not be the actual page size
you'll get when you mmap().

If the value is too small or not a power of two, we fail
qemu_ram_mmap()'s assertions.  These were added in commit 794e8f3
(v2.5.0).  The bug's impact before that is currently unknown.  Seems
fairly unlikely at least when the normal page size is 4KiB.

Else, if the value is too large, we align more strictly than
necessary.

gethugepagesize() goes back to commit c902760 (v0.13).  That commit
clearly intended gethugepagesize() to be used on hugetlbfs only.  Not
only was it named accordingly, it also printed a warning when used on
anything else.  However, the commit neglected to spell out the
restriction in user documentation of -mem-path.

Commit bfc2a1a (v2.5.0) dropped the warning as bogus "because QEMU
functions perfectly well with the path on a regular tmpfs filesystem".
It sure does when you're sufficiently lucky.  In my testing, I was
lucky, too.

Fix by switching to qemu_fd_getpagesize().  Rename the variable
holding its result from hpagesize to page_size.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1457378754-21649-3-git-send-email-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Markus Armbruster
fd97fd4408 exec: Fix memory allocation when memory path names new file
Commit 8d31d6b extended file_ram_alloc() to accept file names in
addition to directory names.  Even though it passes O_CREAT to open(),
it actually works only for existing files.  Reproducer adapted from
the commit's qemu-doc.texi update:

    $ qemu-system-x86_64 -object memory-backend-file,size=2M,mem-path=/dev/hugepages/my-shmem-file,id=mb1
    qemu-system-x86_64: -object memory-backend-file,size=2M,mem-path=/dev/hugepages/my-shmem-file,id=mb1: failed to get page size of file /dev/hugepages/my-shmem-file: No such file or directory

This is because we first get the page size for @path, then open the
actual file.  Unwise even before the flawed commit, because the
directory could change in between, invalidating the page size.
Unlikely to bite in practice.

Rearrange the code to create the file (if necessary) before getting
its page size.  Carefully avoid TOCTTOU conditions with a method
suggested by Paolo Bonzini.

While there, replace "hugepages" by "guest RAM" in error messages,
because host memory backends can be used for purposes other than huge
pages, e.g. /dev/shm/ shared memory.  Help text of -mem-path agrees.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1457378754-21649-2-git-send-email-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Alexey Kardashevskiy
2ae823d4f7 update-linux-headers: Add userfaultfd.h
userfailtfd.h is used by post-copy migration so include it to
the update-linux-headers.sh as we want it updated altogether with
other kernel headers.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <1455512381-15271-1-git-send-email-aik@ozlabs.ru>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Rita Sinha
b094f2e015 kvm: x86: q35: Add support for -machine kernel_irqchip=split for q35
The split IRQ chip mode via KVM_CAP_SPLIT_IRQCHIP was introduced with commit
15eafc2e60 but was broken for q35. This patch makes kernel_irqchip=split
functional for q35.

Signed-off-by: Rita Sinha <rita.sinha89@gmail.com>
Message-Id: <1457378525-16455-1-git-send-email-rita.sinha89@gmail.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-15 18:23:33 +01:00
Peter Maydell
a6cdb77f81 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
slirp: Adding IPv6 support to Qemu -net user mode

# gpg: Signature made Tue 15 Mar 2016 16:06:03 GMT using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  slirp: Add IPv6 support to the TFTP code
  qapi-schema, qemu-options & slirp: Adding Qemu options for IPv6 addresses
  slirp: Adding IPv6 address for DNS relay
  slirp: Handle IPv6 in TCP functions
  slirp: Reindent after refactoring
  slirp: Generalizing and neutralizing various TCP functions before adding IPv6 stuff
  slirp: Factorizing tcpiphdr structure with an union
  slirp: Adding IPv6 UDP support
  slirp: Adding ICMPv6 error sending
  slirp: Fix ICMP error sending
  slirp: Adding IPv6, ICMPv6 Echo and NDP autoconfiguration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 17:09:52 +00:00
Peter Maydell
a58a4cb187 Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
vhost, virtio, pci, pc, acpi

nvdimm work
sparse cpu id rework
ipmi enhancements
fixes all over the place
pxb option to tweak chassis number

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 15 Mar 2016 14:33:10 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"

* remotes/mst/tags/for_upstream: (51 commits)
  hw/acpi: fix GSI links UID
  ipmi: add some local variables in ipmi_sdr_init
  ipmi: remove the need of an ending record in the SDR table
  ipmi: use a function to initialize the SDR table
  ipmi: add a realize function to the device class
  ipmi: add rsp_buffer_set_error() helper
  ipmi: remove IPMI_CHECK_RESERVATION() macro
  ipmi: replace IPMI_ADD_RSP_DATA() macro with inline helpers
  ipmi: remove IPMI_CHECK_CMD_LEN() macro
  MAINTAINERS: machine core
  MAINTAINERS: Add an entry for virtio header files
  pc: acpi: clarify why possible LAPIC entries must be present in MADT
  pc: acpi: drop cpu->found_cpus bitmap
  pc: acpi: create Processor and Notify objects only for valid lapics
  pc: acpi: create MADT.lapic entries only for valid lapics
  pc: acpi: SRAT: create only valid processor lapic entries
  pc: acpi: cleanup qdev_get_machine() calls
  machine: introduce MachineClass.possible_cpu_arch_ids() hook
  pc: init pcms->apic_id_limit once and use it throughout pc.c
  pc: acpi: remove NOP assignment
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 16:43:48 +00:00
Thomas Huth
fad7fb9ccd slirp: Add IPv6 support to the TFTP code
Add the handler code for incoming TFTP packets to udp6_input(),
and make sure that the TFTP code can send packets with both,
udp_output() and udp6_output() by introducing a wrapper function
called tftp_udp_output().

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2016-03-15 17:05:34 +01:00
Peter Maydell
f84d587111 Merge remote-tracking branch 'remotes/berrange/tags/pull-io-next-2016-03-15-1' into staging
Merge I/O fixes

# gpg: Signature made Tue 15 Mar 2016 14:42:43 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/pull-io-next-2016-03-15-1:
  io: stronger check for support for IPv4/6

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 15:51:06 +00:00
Marcel Apfelbaum
6a991e07bb hw/acpi: fix GSI links UID
According to the ACPI spec, each UID must be unique.
Use the irq number as UID for GSI links.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-15 16:16:57 +02:00
Daniel P. Berrange
cfd47a71df io: stronger check for support for IPv4/6
Instead of just checking for bind(), also check whether
getaddrinfo can resolve IPv6 addresses. This catches
failure when travis runs QEMU builds inside minimal
docker containers

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-15 13:55:52 +00:00
Peter Maydell
d41e0bed7b Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging
X86 fixes

# gpg: Signature made Mon 14 Mar 2016 20:26:25 GMT using RSA key ID 984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"

* remotes/ehabkost/tags/x86-pull-request:
  kvm: Remove x2apic feature from CPU model when kernel_irqchip is off
  hyperv: cpu hotplug fix with HyperV enabled

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 11:05:37 +00:00
Peter Maydell
9828f9b6c8 Merge remote-tracking branch 'remotes/rth/tags/pull-i386-20160314' into staging
target-i386 fixes

# gpg: Signature made Mon 14 Mar 2016 17:54:06 GMT using RSA key ID 4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"

* remotes/rth/tags/pull-i386-20160314:
  target-i386: Dump unknown opcodes with -d unimp
  target-i386: Fix inhibit irq mask handling
  target-i386: Use gen_nop_modrm for prefetch instructions
  target-i386: Fix addr16 prefix
  target-i386: Fix SMSW for 64-bit mode
  target-i386: Fix SMSW and LMSW from/to register
  target-i386: Avoid repeated calls to the bnd_jmp helper

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 10:08:12 +00:00
Yann Bordenave
7aac531ef2 qapi-schema, qemu-options & slirp: Adding Qemu options for IPv6 addresses
This patch adds parameters to manage some new options in the qemu -net
command.
Slirp IPv6 address, network prefix, and DNS IPv6 address can be given in
argument to the qemu command.
Defaults parameters are respectively fec0::2, fec0::, /64 and fec0::3.

Signed-off-by: Yann Bordenave <meow@meowstars.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:25 +01:00
Guillaume Subiron
05061d8548 slirp: Adding IPv6 address for DNS relay
This patch adds an IPv6 address to the DNS relay. in6_equal_dns() is
developed using this Slirp attribute.
sotranslate_in/out/accept() are also updated to manage the IPv6 case so the
guest can be able to join the host using one of the Slirp addresses.

For now this only points to localhost. Further development will be needed to
automatically fetch the IPv6 address from resolv.conf, and announce this via
RDNSS.

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:22 +01:00
Guillaume Subiron
3feea4447f slirp: Handle IPv6 in TCP functions
This patch adds IPv6 case in TCP functions refactored by the last
patches.
This also adds IPv6 pseudo-header in tcpiphdr structure.
Finally, tcp_input() is called by ip6_input().

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:19 +01:00
Guillaume Subiron
1252cf40a8 slirp: Reindent after refactoring
No code change.

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:17 +01:00
Guillaume Subiron
9dfbf250d2 slirp: Generalizing and neutralizing various TCP functions before adding IPv6 stuff
Basically, this patch adds some switch in various TCP functions to
prepare them for the IPv6 case.

To have something to "switch" in tcp_input() and tcp_respond(), a new
argument is used to give them the sa_family of the addresses they are
working on.

This patch does not include the entailed reindentation, to make proofread
easier. Reindentation is adressed in the following no-op patch.

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:14 +01:00
Guillaume Subiron
98c63057d2 slirp: Factorizing tcpiphdr structure with an union
This patch factorizes the tcpiphdr structure to put the IPv4 fields in
an union, for addition of version 6 in further patch.
Using some macros, retrocompatibility of the existing code is assured.

This patch also fixes the SLIRP_MSIZE and margin computation in various
functions, and makes them compatible with the new tcpiphdr structure,
whose size will be bigger than sizeof(struct tcphdr) + sizeof(struct ip)

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:11 +01:00
Guillaume Subiron
15d62af4b6 slirp: Adding IPv6 UDP support
This adds the sin6 case in the fhost and lhost unions and related macros.
It adds udp6_input() and udp6_output().
It adds the IPv6 case in sorecvfrom().
Finally, udp_input() is called by ip6_input().

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:08 +01:00
Yann Bordenave
fc6c9257c6 slirp: Adding ICMPv6 error sending
Adding icmp6_send_error to send ICMPv6 Error messages. This function is
simpler than the v4 version.
Adding some calls in various functions to send ICMP errors, when a
received packet is too big, or when its hop limit is 0.

Signed-off-by: Yann Bordenave <meow@meowstars.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:04 +01:00
Yann Bordenave
de40abfecf slirp: Fix ICMP error sending
Disambiguation : icmp_error is renamed into icmp_send_error, since it
doesn't manage errors, but only sends ICMP Error messages.

Signed-off-by: Yann Bordenave <meow@meowstars.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:02 +01:00
Guillaume Subiron
0d6ff71ae3 slirp: Adding IPv6, ICMPv6 Echo and NDP autoconfiguration
This patch adds the functions needed to handle IPv6 packets. ICMPv6 and
NDP headers are implemented.

Slirp is now able to send NDP Router or Neighbor Advertisement when it
receives Router or Neighbor Solicitation. Using a 64bit-sized IPv6
prefix, the guest is now able to perform stateless autoconfiguration
(SLAAC) and to compute its IPv6 address.

This patch adds an ndp_table, mainly inspired by arp_table, to keep an
NDP cache and manage network address resolution.
Slirp regularly sends NDP Neighbor Advertisement, as recommended by the
RFC, to make the guest refresh its route.

This also adds ip6_cksum() to compute ICMPv6 checksums using IPv6
pseudo-header.

Some #define ETH_* are moved upper in slirp.h to make them accessible to
other slirp/*.h

Signed-off-by: Guillaume Subiron <maethor@subiron.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-15 10:35:00 +01:00
Peter Maydell
1a8b408168 Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches

# gpg: Signature made Mon 14 Mar 2016 16:36:52 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (40 commits)
  iotests: Add test for QMP event rates
  monitor: Use QEMU_CLOCK_VIRTUAL for the event queue in qtest mode
  monitor: Separate QUORUM_REPORT_BAD events according to the node name
  quorum: Fix crash in quorum_aio_cb()
  iotests: Correct 081's reference output
  block: Remove unused typedef of BlockDriverDirtyHandler
  block: Move block dirty bitmap code to separate files
  typedefs: Add BdrvDirtyBitmap
  block: Include hbitmap.h in block.h
  backup: Use Bitmap to replace "s->bitmap"
  vpc: Use BB functions in .bdrv_create()
  vmdk: Use BB functions in .bdrv_create()
  vhdx: Use BB functions in .bdrv_create()
  vdi: Use BB functions in .bdrv_create()
  sheepdog: Use BB functions in .bdrv_create()
  qed: Use BB functions in .bdrv_create()
  qcow2: Use BB functions in .bdrv_create()
  qcow: Use BB functions in .bdrv_create()
  parallels: Use BB functions in .bdrv_create()
  block: Introduce blk_set_allow_write_beyond_eof()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-15 09:13:06 +00:00
Lan Tianyu
492a4c94be kvm: Remove x2apic feature from CPU model when kernel_irqchip is off
x2apic feature is in the kvm_default_props and automatically added to all
CPU models when KVM is enabled. But userspace devices don't support x2apic
which can't be enabled without the in-kernel irqchip. It will trigger
warning of "host doesn't support requested feature: CPUID.01H:ECX.x2apic
[bit 21]" when kernel_irqchip is off. This patch is to fix it via removing
x2apic feature when kernel_irqchip is off.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-14 17:26:06 -03:00
Denis V. Lunev
4467c6c118 hyperv: cpu hotplug fix with HyperV enabled
With Hyper-V enabled CPU hotplug stops working. The CPU appears
in device manager on Windows but does not appear in peformance
monitor and control panel.

The root of the problem is the following. Windows checks
HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE bit in CPUID. The
presence of this bit is enough to cure the situation.

The bit should be set when CPU hotplug is allowed for HyperV VM.
The check that hot_add_cpu callback is defined is enough from the
protocol point of view. Though this callback is defined almost
always thus there is no need to export that knowledge in the
other way.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Eduardo Habkost <ehabkost@redhat.com>
CC: "Andreas Färber" <afaerber@suse.de>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-03-14 17:26:06 -03:00
Richard Henderson
b9f9c5b41a target-i386: Dump unknown opcodes with -d unimp
We discriminate here between opcodes that are illegal in the current
cpu mode or with illegal arguments (such as modrm.mod == 3) and
encodings that are unknown (such as an unimplemented isa extension).

Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-14 10:53:07 -07:00
Richard Henderson
f083d92c03 target-i386: Fix inhibit irq mask handling
The patch in 7f0b714 was too simplistic, in that we wound up setting
the flag and then resetting it immediately in gen_eob.

Fixes the reported boot problem with Windows XP.

Reported-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-14 10:53:02 -07:00
Richard Henderson
26317698ef target-i386: Use gen_nop_modrm for prefetch instructions
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-14 10:52:56 -07:00
Paolo Bonzini
e2e02a8207 target-i386: Fix addr16 prefix
While ADDSEG will only be false in 16-bit mode for LEA, it can be
false even in other cases when 16-bit addresses are obtained via
the 67h prefix in 32-bit mode.  In this case, gen_lea_v_seg forgets
to add a nonzero FS or GS base if CS/DS/ES/SS are all zero.  This
case is pretty rare but happens when booting Windows 95/98, and
this patch fixes it.

The bug is visible since commit d6a291498, but it was introduced
together with gen_lea_v_seg and it probably could be reproduced
with a "addr16 gs movsb" instruction as early as in commit
ca2f29f555.

Reported-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1456931078-21635-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-14 10:52:48 -07:00
Richard Henderson
a657f79e32 target-i386: Fix SMSW for 64-bit mode
In non-64-bit modes, the instruction always stores 16 bits.
But in 64-bit mode, when the destination is a register, the
instruction can write 32 or 64 bits.

Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-14 10:52:42 -07:00
Paolo Bonzini
880f848650 target-i386: Fix SMSW and LMSW from/to register
SMSW and LMSW accept register operands, but commit 1906b2a ("target-i386:
Rearrange processing of 0F 01", 2016-02-13) did not account for that.

Fixes: 1906b2af7c
Reported-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1456845134-18812-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-14 10:52:29 -07:00
Paolo Bonzini
8b33e82b86 target-i386: Avoid repeated calls to the bnd_jmp helper
Two flags were tested the wrong way.

Tested-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1456845145-18891-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
[rth: Fixed enable test as well.]
2016-03-14 10:45:41 -07:00
Kevin Wolf
0d611402a1 Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-03-14-v2' into queue-block
Block patches for pi day, v2.

# gpg: Signature made Mon Mar 14 17:35:29 2016 CET using RSA key ID E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"

* mreitz/tags/pull-block-for-kevin-2016-03-14-v2:
  iotests: Add test for QMP event rates
  monitor: Use QEMU_CLOCK_VIRTUAL for the event queue in qtest mode
  monitor: Separate QUORUM_REPORT_BAD events according to the node name
  quorum: Fix crash in quorum_aio_cb()
  iotests: Correct 081's reference output
  block: Remove unused typedef of BlockDriverDirtyHandler
  block: Move block dirty bitmap code to separate files
  typedefs: Add BdrvDirtyBitmap
  block: Include hbitmap.h in block.h
  backup: Use Bitmap to replace "s->bitmap"

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 17:36:31 +01:00
Alberto Garcia
7223c48cff iotests: Add test for QMP event rates
This test verifies that the rate-limited QMP events are emitted at a
maximum rate of 1 per second as defined in monitor_qapi_event_conf in
monitor.c

It also checks that QUORUM_REPORT_BAD events generated from different
nodes are kept in separate queues so they don't mask each other.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 0dbd3ee88a59a6363042ad81cfb345037bfbf612.1457610443.git.berto@igalia.com
[mreitz@redhat.com: Renamed test from 146 to 148]
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:06 +01:00
Alberto Garcia
dc59997871 monitor: Use QEMU_CLOCK_VIRTUAL for the event queue in qtest mode
This allows us to perform tests on the monitor queues to verify that
the rate limits are enforced.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: dde511809e954a5c32d5b648bb184c03c89ed5d5.1457610443.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:06 +01:00
Alberto Garcia
6d425eb94d monitor: Separate QUORUM_REPORT_BAD events according to the node name
The QUORUM_REPORT_BAD event is emitted whenever there's an I/O error
in a child of a Quorum device. This event is emitted at a maximum rate
of 1 per second. This means that an error in one of the children will
mask errors in the other children if they happen within the same 1
second interval.

This patch modifies qapi_event_throttle_equal() so QUORUM_REPORT_BAD
events are kept separately if they come from different children.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: b989c0cb3755bc4b6696e796fa8ed2ef6c56606a.1457610443.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:06 +01:00
Alberto Garcia
b9c600d207 quorum: Fix crash in quorum_aio_cb()
quorum_aio_cb() emits the QUORUM_REPORT_BAD event if there's
an I/O error in a Quorum child. However sacb->aiocb must be
correctly initialized for this to happen. read_quorum_children() and
read_fifo_child() are not doing this, which results in a QEMU crash.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 8138570d071ba7e25db3736979234a1fd71dbd05.1457610443.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:06 +01:00
Max Reitz
e3f66e0368 iotests: Correct 081's reference output
The newly added type parameter for the QUORUM_REPORT_BAD event changed
the output of iotest 081, so the reference should be amended
accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1457705687-27122-1-git-send-email-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-03-14 17:35:06 +01:00
Fam Zheng
fcce736719 block: Remove unused typedef of BlockDriverDirtyHandler
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-6-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Fam Zheng
ebab225910 block: Move block dirty bitmap code to separate files
The only code change is making bdrv_dirty_bitmap_truncate public. It is
used in block.c.

Also two long lines (bdrv_get_dirty) are wrapped.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-5-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Fam Zheng
9a3f5cf1bf typedefs: Add BdrvDirtyBitmap
Following patches to refactor and move block dirty bitmap code could use
this.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-4-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Fam Zheng
78f9dc859d block: Include hbitmap.h in block.h
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-3-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Fam Zheng
b2f56462d5 backup: Use Bitmap to replace "s->bitmap"
"s->bitmap" tracks done sectors, we only check bit states without using any
iterator which HBitmap is good for. Switch to "Bitmap" which is simpler and
more memory efficient.

Meanwhile, rename it to done_bitmap, to reflect the intention.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1457412306-18940-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-03-14 17:35:05 +01:00
Peter Maydell
618a5a8bc5 Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Mon 14 Mar 2016 11:27:01 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/tracing-pull-request:
  trace: separate MMIO tracepoints from TB-access tracepoints
  trace: include CPU index in trace_memory_region_*()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-14 16:22:17 +00:00
Kevin Wolf
b8f45cdf78 vpc: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:44 +01:00
Kevin Wolf
c4bea1690e vmdk: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
10bf03af12 vhdx: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
a08f0c3b5f vdi: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
fba98d455a sheepdog: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
8a56fdadaf qed: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
23588797b6 qcow2: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
6af4016020 qcow: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
8942764f54 parallels: Use BB functions in .bdrv_create()
All users of the block layers are supposed to go through a BlockBackend.
The .bdrv_create() implementation is one such user, so this patch
converts it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
c10c9d9615 block: Introduce blk_set_allow_write_beyond_eof()
We check that the guest can't write beyond the end of its disk, but for
other internal users it can make sense to allow growing a file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
6340472c54 block: Use writeback in .bdrv_create() implementations
There's no reason to use a writethrough cache mode while creating an
image.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
2073d410ce hmp: Extend drive_del to delete nodes without BB
Now that we can use drive_add to create new nodes without a BB, we also
want to be able to delete such nodes again.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
abb21ac3e6 hmp: 'drive_add -n' for creating a node without BB
This patch adds an option to the drive_add HMP command to create only a
BlockDriverState without a BlockBackend on top.

The motivation for this is that libvirt needs to specify options to a
migration target (specifically, detect-zeroes). drive-mirror doesn't
allow specifying options, and the proper way to do this is to create the
target BDS separately with blockdev-add (where you can specify options)
and then use blockdev-mirror to that BDS.

However, libvirt can't use blockdev-add as long as it is still
experimental, and we're expecting that it will still take some time, so
we need to resort to drive_add.

The problem with drive_add is that so far it always created a BB, and
BDSes with a BB can't be used as a mirroring target as long as we don't
support multiple BBs per BDS - and while we're working towards that
goal, it's another thing that will still take some time.

So to achieve the goal, the simplest solution to provide the
functionality now without adding one-off options to the mirror QMP
commands is to extend drive_add to create nodes without BBs.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng
71968dbfd8 vmdk: Switch to heap arrays for vmdk_parent_open
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng
5997c210b9 vmdk: Switch to heap arrays for vmdk_read_cid
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Fam Zheng
965415eb20 vmdk: Switch to heap arrays for vmdk_write_cid
It is only called once for each opened image, so we can do it the easy
way.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
a81d616437 block: Fix cache mode defaults in bds_tree_init()
Without setting explicit defaults in the options, blockdev-add without
an ID ended up defaulting to writethrough. It should be writeback as
documented.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
73176bee99 block: Fix snapshot=on cache modes
Since commit 91a097e, we end up with a somewhat weird cache mode
configuration with snapshot=on: The commit broke the cache mode
inheritance for the snapshot overlay so that it is opened as
writethrough instead of unsafe now. The following bdrv_append() call to
put it on top of the tree swaps the WCE flag with the snapshot's backing
file (i.e. the originally given file), so what we eventually get is
cache=writeback on the temporary overlay and
cache=writethrough,cache.no-flush=on on the real image file.

This patch changes things so that the temporary overlay gets
cache=unsafe again like it used to, and the real images get whatever the
user specified. This means that cache.direct is now respected even with
snapshot=on, and in the case of committing changes, the final flush is
no longer ignored except explicitly requested by the user.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:43 +01:00
Kevin Wolf
f86b8b584b blockdev: Snapshotting must not open second instance of old top
Calling bdrv_img_create() with a size of -1 means that it determines the
size automatically by opening the backing file. However, in the case of
live snapshots, the backing file is already opened and we must avoid
opening the same image twice at the same time. Apart from that, just
getting the size from the already existing BDS is a lot less overhead
than opening a new instance.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie
924e8a2bbc quorum: modify vote rules for flush operation
Keep flush interface the same logic as quorum read/write, Otherwise in
following scenario, we'll encounter unexpected errors.

Quorum has two children(A, B). A do flush sucessfully, but B flush failed.
This cause the filesystem of guest become read-only with following errors:

end_request: I/O error, dev vda, sector 11159960
Aborting journal on device vda3-8
EXT4-fs error (device vda3): ext4_journal_start_sb:327: Detected abort journal
EXT4-fs (vda3): Remounting filesystem read-only

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie
0ae053b7e1 qmp event: Refactor QUORUM_REPORT_BAD
Introduce QuorumOpType, and make QUORUM_REPORT_BAD compatible
with it.

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Changlong Xie
58346b82ed docs: fix invalid node name in qmp event
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:43 +01:00
Jeff Cody
1001dd9f84 block/vpc: add tests for image creation force_size parameter
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Jeff Cody
fb9245c261 block/vpc: give option to force the current_size field in .bdrv_create
When QEMU creates a VHD image, it goes by the original spec,
calculating the current_size based on the nearest CHS geometry (with an
exception for disks > 127GB).

Apparently, Azure will only allow images that are sized to the nearest
MB, and the current_size as calculated from CHS cannot guarantee that.

Allow QEMU to create images similar to how Hyper-V creates images, by
setting current_size to the specified virtual disk size.  This
introduces an option, force_size, to be passed to the vpc format during
image creation, e.g.:

    qemu-img convert -f raw -o force_size -O vpc test.img test.vhd

When using the "force_size" option, the creator app field used by
QEMU will be "qem2" instead of "qemu", to indicate the difference.
In light of this, we also add parsing of the "qem2" field during
vpc_open.

Bug reference: https://bugs.launchpad.net/qemu/+bug/1490611

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Jeff Cody
798609bbe2 block/vpc: tests for auto-detecting VPC and Hyper-V VHD images
This tests auto-detection, and overrides, of VHD image sizes created
by Virtual PC, Hyper-V, and Disk2vhd.

This adds three sample images:

hyperv2012r2-dynamic.vhd.bz2 - dynamic VHD image created with Hyper-V
virtualpc-dynamic.vhd.bz2    - dynamic VHD image created with Virtual PC
d2v-zerofilled.vhd.bz2       - dynamic VHD image created with Disk2vhd

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Jeff Cody
c540d53ac8 block/vpc: choose size calculation method based on creator_app field
The VHD file format is used by both Virtual PC, and Hyper-V.  However,
how the virtual disk size is calculated varies between the two.

Virtual PC uses the CHS drive parameters to determine the drive size.
Hyper-V, on the other hand, uses the current_size field in the footer
when determining image size.

This is problematic for a few reasons:

* VHD images from Hyper-V, using CHS calculations, will likely be
  trunctated.

* If we just rely always on current_size, then QEMU may have data
  compatibility issues with Virtual PC (we may write too much data
  into a VHD file to be used by Virtual PC, for instance).

* Existing VHD images created by QEMU have used the CHS calculations,
  except for images exceeding the 127GB limit.  We want to remain
  compatible with our own generated images.

Luckily, the VHD specification defines a 'Creator App' field, that is
used to indicate what software created the VHD file.

This patch does two things:

    1. Uses the 'Creator App' field to help determine how to calculate
       size, and

    2. Adds a VPC format option 'force_size_calc', so that the user can
       override the 'Creator App' auto-detection, in case there exist
       VHD images with unknown or contradictory 'Creator App' entries.

N.B.: We currently use the maximum CHS value as an indication to use the
current_size field.  This patch does not change that, even with the
'force_size_calc' option.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf
c21cc6ca98 block/qapi: Include empty drives in query-blockstats
Since commit 5ec18f8c, query-blockstats didn't return the statistics of
drives without media any more because such drives have only a BB now,
but not a BDS any more.

This patch fixes the regression so that query-blockstats iterates over
BBs by default and empty drives are displayed again.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf
b07363a1a3 block/qapi: Factor out bdrv_query_bds_stats()
The new functions handles the data that is taken from the
BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Kevin Wolf
2b77e60ab8 block/qapi: Factor out bdrv_query_blk_stats()
The new functions handles the data that is taken from the BlockBackend.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-03-14 16:46:42 +01:00
Paolo Bonzini
396374caea qemu-img: eliminate memory leak
Not particularly important since qemu-img exits immediately after
calling img_rebase, but easily fixed.  Coverity says thanks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-03-14 16:46:42 +01:00
Peter Maydell
6dcea61425 Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160311.0' into staging
VFIO updates 2016-03-11

 - Allow devices to be specified via sysfs path (Alex Williamson)
 - vfio region helpers and generalization for future device specific regions
   (Alex Williamson)
 - Automatic ROM device ID and checksum fixup (Alex Williamson)
 - Split VGA setup to allow enabling VGA from quirks (Alex Williamson)
 - Remove fixed string limit for ROM MemoryRegion name (Neo Jia)
 - MAINTAINERS update (Thomas Huth)

# gpg: Signature made Fri 11 Mar 2016 15:55:31 GMT using RSA key ID 3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"

* remotes/awilliam/tags/vfio-update-20160311.0:
  MAINTAINERS: Add entry for the include/hw/vfio/ folder
  vfio/pci: replace fixed string limit by g_strdup_printf
  vfio/pci: Split out VGA setup
  vfio/pci: Fixup PCI option ROMs
  vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions
  vfio: Generalize region support
  vfio: Wrap VFIO_DEVICE_GET_REGION_INFO
  vfio: Add sysfsdev property for pci & platform

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-14 15:11:39 +00:00
Peter Maydell
0dcee62261 Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.6-7' into staging
migration:
 - postcopy is no longer experimental
 - fix a use-after-free in postcopy
 - fix a compile warning

# gpg: Signature made Fri 11 Mar 2016 12:29:33 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-migration/tags/migration-for-2.6-7:
  postcopy: Remove the x-
  postcopy: listen thread is never joined
  migration: fix use-after-free in loadvm_postcopy_handle_run_bh
  migration: fix warning for source_return_path_thread

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-14 13:51:21 +00:00
Peter Maydell
8326ec2c83 Merge remote-tracking branch 'remotes/berrange/tags/pull-io-win32-2016-03-11-1' into staging
Merge I/O fixes for win32

# gpg: Signature made Fri 11 Mar 2016 10:03:20 GMT using RSA key ID 15104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"

* remotes/berrange/tags/pull-io-win32-2016-03-11-1:
  osdep: remove use of socket_error() from all code
  osdep: add wrappers for socket functions
  char: remove qemu_chr_open_socket_fd method
  char: remove socket_try_connect method
  char: remove qemu_chr_finish_socket_connection method
  io: implement socket watch for win32 using WSAEventSelect+select
  io: remove checking of EWOULDBLOCK
  io: use qemu_accept to ensure SOCK_CLOEXEC is set
  io: introduce qio_channel_create_socket_watch
  io: pass HANDLE to g_source_add_poll on Win32
  io: fix copy+paste mistake in socket error message
  io: assert errors before asserting content in I/O test
  io: set correct error object in background reader test thread
  io: wait for incoming client in socket test
  io: bind to socket before creating QIOChannelSocket
  io: initialize sockets in test program
  io: use bind() to check for IPv4/6 availability
  osdep: fix socket_error() to work with Mingw64

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-14 11:49:33 +00:00
Peter Maydell
d1ab9681ac Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160311' into staging
CPU hotplug via cpu-add for s390x, cleanup of the s390x machine
compat code and a bugfix in the s390-ccw bios.

# gpg: Signature made Fri 11 Mar 2016 09:48:02 GMT using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20160311:
  s390x/cpu: use g_new0
  s390x: Introduce S390MachineClass
  s390x: Introduce machine definition macros
  pc-bios/s390-ccw: fix old bug in ptr increment
  s390x/cpu: Allow hotplug of CPUs
  s390x/cpu: Add error handling to cpu creation
  s390x/cpu: Add CPU property links
  s390x/cpu: Tolerate max_cpus
  s390x/cpu: Get rid of side effects when creating a vcpu
  s390x/cpu: Set initial CPU state in common routine
  s390x/cpu: Cleanup init in preparation for hotplug

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-14 11:13:11 +00:00
Hollis Blanchard
f2d089425d trace: separate MMIO tracepoints from TB-access tracepoints
Memory accesses to code which has previously been translated into a TB show up
in the MMIO path, so that they may invalidate the TB. It's extremely confusing
to mix those in with device MMIOs, so split them into their own tracepoint.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1456949575-1633-2-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-14 09:34:30 +00:00
Hollis Blanchard
5a68be94ac trace: include CPU index in trace_memory_region_*()
Knowing which CPU performed an action is essential for understanding SMP guest
behavior.

However, cpu_physical_memory_rw() may be executed by a machine init function,
before any VCPUs are running, when there is no CPU running ('current_cpu' is
NULL). In this case, store -1 in the trace record as the CPU index. Trace
analysis tools may need to be aware of this special case.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1456949575-1633-1-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-14 09:34:30 +00:00
Cédric Le Goater
5167560b03 ipmi: add some local variables in ipmi_sdr_init
This patch adds a couple of variables to manipulate the raw sdr
entries. The const attribute is also removed on init_sdrs. This will
ease the introduction of a sdr loader using a file.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
52fc01d973 ipmi: remove the need of an ending record in the SDR table
Currently, the code initializing the sdr table relies on an ending
record with a recid of 0xffff. This patch changes the loop to use the
sdr size as a breaking condition.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
4fa9f08e96 ipmi: use a function to initialize the SDR table
This patch moves the code section initializing the sdrs in its own
routine to prepare ground for changes in the subsequent patches.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
0bc6001f0d ipmi: add a realize function to the device class
This will be useful to define and use properties when the object is
instantiated.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
6acb971a94 ipmi: add rsp_buffer_set_error() helper
The third byte in the response buffer of an IPMI command holds the
error code. In many IPMI command handlers, this byte is updated
directly. This patch adds a helper routine to clarify why this byte is
being used.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
7f996411ad ipmi: remove IPMI_CHECK_RESERVATION() macro
Some IPMI command handlers in the BMC simulator use a macro
IPMI_CHECK_RESERVATION() to check a SDR reservation but the macro
implicitly uses local variables. This patch simply removes it.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
a580d82085 ipmi: replace IPMI_ADD_RSP_DATA() macro with inline helpers
The IPMI command handlers in the BMC simulator use a macro
IPMI_ADD_RSP_DATA() to push bytes in a response buffer. The macro
hides the fact that it implicitly uses variables local to the handler,
which is misleading.

This patch introduces a simple 'struct RspBuffer' and inlined helper
routines to store byte(s) in a response buffer. rsp_buffer_push()
replaces the macro IPMI_ADD_RSP_DATA() and rsp_buffer_pushmore() is
new helper to push multiple bytes. The latest is used in the command
handlers get_msg() and get_sdr() which are manipulating the buffer
directly.

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Cédric Le Goater
4f298a4b29 ipmi: remove IPMI_CHECK_CMD_LEN() macro
Most IPMI command handlers in the BMC simulator start with a call to
the macro IPMI_CHECK_CMD_LEN() which verifies that a minimal number of
arguments expected by the command are indeed available. To achieve
this task, the macro implicitly uses local variables which is
misleading in the code.

This patch adds a 'cmd_len_min' attribute to the struct IPMICmdHandler
defining the minimal number of arguments expected by the command and
moves this check in the global command handler ipmi_sim_handle_command().

To clarify the checks being done on the received command, the patch
introduces a helper ipmi_get_handler().

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Michael S. Tsirkin
5da4fb0018 MAINTAINERS: machine core
Marcel and Eduardo agreed to co-maintain these.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Thomas Huth
494f7b572e MAINTAINERS: Add an entry for virtio header files
Files in the include/hw/virtio/ folder should be included in the
"virtio" sections of the MAINTAINERS file.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:13 +02:00
Igor Mammedov
ed2ef10c0c pc: acpi: clarify why possible LAPIC entries must be present in MADT
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
adcb89d55d pc: acpi: drop cpu->found_cpus bitmap
cpu->found_cpus bitmap is used for setting present
flag in CPON AML package. But it takes a bunch of code
to fill bitmap and could be simplified by getting
presense info from possible CPUs list directly.

So drop cpu->found_cpus bitmap and unroll possible
CPUs list into APIC index array at the place where
CPUON AML package is created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
2adba0a18a pc: acpi: create Processor and Notify objects only for valid lapics
do not assume that all lapics in range 0..apic_id_limit
are valid and do not create Processor and Notify objects
for not possible lapics.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
907e7c94d1 pc: acpi: create MADT.lapic entries only for valid lapics
do not assume that all lapics in range 0..apic_id_limit
are valid and do not create lapic entries for not
possible lapics in MADT.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
5803fce389 pc: acpi: SRAT: create only valid processor lapic entries
When APIC IDs are sparse*, in addition to valid LAPIC
entries the SRAT is also filled invalid ones for non
possible APIC IDs.
Fix it by asking machine for all possible APIC IDs
instead of wrongly assuming that all APIC IDs in
range 0..apic_id_limit are possible.

* sparse lapic topology CLI:
     -smp x,sockets=2,cores=3,maxcpus=6
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
3d3ebcad6a pc: acpi: cleanup qdev_get_machine() calls
cache qdev_get_machine() result in acpi_setup/acpi_build_update
time and pass it as an argument to child functions that need it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
3811ef14f5 machine: introduce MachineClass.possible_cpu_arch_ids() hook
on x86 currently range 0..max_cpus is used to generate
architecture-dependent CPU ID (APIC Id) for each present
and possible CPUs. However architecture-dependent CPU IDs
list could be sparse and code that needs to enumerate
all IDs (ACPI) ended up doing guess work enumerating all
possible and impossible IDs up to
  apic_id_limit = x86_cpu_apic_id_from_index(max_cpus).

That leads to creation of MADT entries and Processor
objects in ACPI tables for not possible CPUs.
Fix it by allowing board specify a concrete list of
CPU IDs accourding its own rules (which for x86 depends
on topology). So that code that needs this list could
request it from board instead of trying to guess
what IDs are correct on its own.

This interface will also allow to help making AML
part of CPU hotplug target independent so it could
be reused for ARM target.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
ebde2465a9 pc: init pcms->apic_id_limit once and use it throughout pc.c
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-03-11 16:59:12 +02:00
Igor Mammedov
ae29883508 pc: acpi: remove NOP assignment
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Cao jin
f9735fd53f pxb: cleanup
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-03-11 16:59:12 +02:00
Marc-André Lureau
342f7a9d05 qemu-char: make tcp_chr_disconnect() reentrant-safe
During CHR_EVENT_CLOSED, the function could be reentered, make this
case safe.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Marc-André Lureau
6167ebbd91 qemu-char: remove all msgfds on disconnect
Disconnect should reset context.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Marc-André Lureau
869a58af86 qemu-char: avoid potential double-free
If tcp_set_msgfds() is called several time with NULL fds, this
could lead to double-free.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Marc-André Lureau
b7fcb3603c vhost-user: remove useless is_server field
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Marc-André Lureau
c1bf3531ae vhost-user: fix use after free
"name" is freed after visiting options, instead use the first NetClientState
name. Adds a few assert() for clarifying and checking some impossible states.

READ of size 1 at 0x602000000990 thread T0
    #0 0x7f6b251c570c  (/lib64/libasan.so.2+0x4770c)
    #1 0x5566dc380600 in qemu_find_net_clients_except net/net.c:824
    #2 0x5566dc39bac7 in net_vhost_user_event net/vhost-user.c:193
    #3 0x5566dbee862a in qemu_chr_be_event /home/elmarco/src/qemu/qemu-char.c:201
    #4 0x5566dbef2890 in tcp_chr_disconnect /home/elmarco/src/qemu/qemu-char.c:2790
    #5 0x5566dbef2d0b in tcp_chr_sync_read /home/elmarco/src/qemu/qemu-char.c:2835
    #6 0x5566dbee8a99 in qemu_chr_fe_read_all /home/elmarco/src/qemu/qemu-char.c:295
    #7 0x5566dc39b964 in net_vhost_user_watch net/vhost-user.c:180
    #8 0x5566dc5a06c7 in qio_channel_fd_source_dispatch io/channel-watch.c:70
    #9 0x7f6b1aa2ab87 in g_main_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3154
    #10 0x7f6b1aa2b9cb in g_main_context_dispatch /home/elmarco/src/gnome/glib/glib/gmain.c:3769
    #11 0x5566dc475ed4 in glib_pollfds_poll /home/elmarco/src/qemu/main-loop.c:212
    #12 0x5566dc476029 in os_host_main_loop_wait /home/elmarco/src/qemu/main-loop.c:257
    #13 0x5566dc476165 in main_loop_wait /home/elmarco/src/qemu/main-loop.c:505
    #14 0x5566dbf08d31 in main_loop /home/elmarco/src/qemu/vl.c:1932
    #15 0x5566dbf16783 in main /home/elmarco/src/qemu/vl.c:4646
    #16 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f)
    #17 0x5566dbbf5348 in _start (/home/elmarco/src/qemu/x86_64-softmmu/qemu-system-x86_64+0x3f9348)

0x602000000990 is located 0 bytes inside of 5-byte region [0x602000000990,0x602000000995)
freed by thread T0 here:
    #0 0x7f6b2521666a in __interceptor_free (/lib64/libasan.so.2+0x9866a)
    #1 0x7f6b1aa332a4 in g_free /home/elmarco/src/gnome/glib/glib/gmem.c:189
    #2 0x5566dc5f416f in qapi_dealloc_type_str qapi/qapi-dealloc-visitor.c:134
    #3 0x5566dc5f3268 in visit_type_str qapi/qapi-visit-core.c:196
    #4 0x5566dc5ced58 in visit_type_Netdev_fields /home/elmarco/src/qemu/qapi-visit.c:5936
    #5 0x5566dc5cef71 in visit_type_Netdev /home/elmarco/src/qemu/qapi-visit.c:5960
    #6 0x5566dc381a8d in net_visit net/net.c:1049
    #7 0x5566dc381c37 in net_client_init net/net.c:1076
    #8 0x5566dc3839e2 in net_init_netdev net/net.c:1473
    #9 0x5566dc63cc0a in qemu_opts_foreach util/qemu-option.c:1112
    #10 0x5566dc383b36 in net_init_clients net/net.c:1499
    #11 0x5566dbf15d86 in main /home/elmarco/src/qemu/vl.c:4397
    #12 0x7f6b180bb57f in __libc_start_main (/lib64/libc.so.6+0x2057f)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:12 +02:00
Xiao Guangrong
f7df22de56 nvdimm acpi: emulate dsm method
Emulate dsm method after IO VM-exit

Currently, we only introduce the framework and no function is actually
supported

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:11 +02:00
Xiao Guangrong
18c440e1e1 nvdimm acpi: let qemu handle _DSM method
If dsm memory is successfully patched, we let qemu fully emulate
the dsm method

This patch saves _DSM input parameters into dsm memory, tell dsm
memory address to QEMU, then fetch the result from the dsm memory

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:11 +02:00
Xiao Guangrong
b99514135b nvdimm acpi: introduce patched dsm memory
The dsm memory is used to save the input parameters and store
the dsm result which is filled by QEMU.

The address of dsm memory is decided by bios and patched into
int32 object named "MEMA"

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:11 +02:00
Xiao Guangrong
5fe79386ba nvdimm acpi: initialize the resource used by NVDIMM ACPI
32 bits IO port starting from 0x0a18 in guest is reserved for NVDIMM
ACPI emulation. The table, NVDIMM_DSM_MEM_FILE, will be patched into
NVDIMM ACPI binary code

OSPM uses this port to tell QEMU the final address of the DSM memory
and notify QEMU to emulate the DSM method

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:11 +02:00
Gerd Hoffmann
b63283d7c3 pci-ids: add virtio 1.0 ids to spec
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:11 +02:00
Michael S. Tsirkin
2c02a48e6d acpi-test-data: add _DIS methods
commit c82f503dd5
("hw/acpi: fix Q35 support for legacy Windows OS")
added _DIS for all link devices.

Update expected test files accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:59:11 +02:00
Marcel Apfelbaum
c82f503dd5 hw/acpi: fix Q35 support for legacy Windows OS
Legacy Windows operating systems like Windows XP and Windows 2003
require _DIS method to be present for all interrupt links.

PC machines already have a no-op implemented for GSI links, add
it also in Q35.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
2016-03-11 16:45:21 +02:00
Cao jin
7335a95abd ich9lpc: fix typo
change some "rbca" to "rcrb"(root complex register block) while
the other to "rcba"(root complex base address).
Bonus: add more comments and fix some indentation.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:45:21 +02:00
Michael S. Tsirkin
226419d615 msi_supported -> msi_nonbroken
Rename controller flag to make it clearer what it means.
Add some documentation as well.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:45:21 +02:00
Gerd Hoffmann
75fd6f13af virtio-pci: call pci reset variant when guest requests reset.
Actually fixes linux not finding virtio 1.0 device virtqueues after
reboot.  Which is new I think, any chance linux kernel virtio code
became more strict in 4.3?

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>
2016-03-11 16:45:21 +02:00
Michael S. Tsirkin
79248c22ad i386: update expected DSDT
DSDT was changed by:
commit 27b9fc54d2 ("i386: populate floppy
drive information in DSDT").

Update expected files accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 16:44:58 +02:00
Roman Kagan
27b9fc54d2 i386: populate floppy drive information in DSDT
On x86-based systems Linux determines the presence and the type of
floppy drives via a query of a CMOS field.  So does SeaBIOS when
populating the return data for int 0x13 function 0x08.

However Windows doesn't do it. Instead, it requests this information
from BIOS via int 0x13/0x08 or through ACPI objects _FDE (Floppy Drive
Enumerate) and _FDI (Floppy Drive Information) of the floppy controller
object.  On UEFI systems only ACPI-based detection is supported.

QEMU doesn't provide those objects in its ACPI tables and as a result
floppy drives are invisible to Windows on UEFI/OVMF.

This patch adds those objects to the floppy controller in DSDT,
populating them with the information from respective QEMU objects.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:55:15 +02:00
Roman Kagan
e08fde0c5e fdc: add function to determine drive chs limits
When populating ACPI objects for floppy drives one needs to provide the
maximum values for cylinder, sector, and head number the drive supports.

This patch adds a function that iterates through the array of predefined
floppy drive formats and returns the maximum values of c, h, s, out of
those matching the given floppy drive type.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
2016-03-11 14:55:15 +02:00
Roman Kagan
bda055096b i386: expose floppy drive CMOS type
Make it possible to query the CMOS type of a floppy drive outside of the
source file where it's defined.

It will allow to properly populate the corresponding ACPI objects and
thus enable Windows on BIOS-less systems to access the floppy drives.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:55:15 +02:00
Roman Kagan
9b613f4e40 i386/acpi: make floppy controller object dynamic
Instead of statically declaring the floppy controller in DSDT, with its
_STA method depending on some obscure bit in the parent ISA bridge, add
the object dynamically to DSDT via AML API only when the controller is
present.

The _STA method is no longer necessary and is therefore dropped.  So are
the declarations of the fields indicating whether the contoller is
enabled.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:55:15 +02:00
Igor Mammedov
c9f4b77ad5 pc-dimm: fix error handling in pc_dimm_check_memdev_is_busy()
If host_memory_backend_get_memory() were to return error and
NULL MemoryRegion, pc_dimm_check_memdev_is_busy() would crash
dereferencing NULL pointer in memory_region_is_mapped().
But if error is set and non NULL MemoryRegion is returned
then error_setg() will fail with "error already set" assertion
in error_setv()

To avoid above issues use typical error handling pattern
for property setters:

Error *local_error = NULL;
...
error_propagate(errp, local_err);

Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:55:15 +02:00
Ilya Maximets
fff4e48ed5 vhost-user: verify that number of queues is less than MAX_QUEUE_NUM
Fix QEMU crash when -netdev vhost-user,queues=n is passed with number
of queues greater than MAX_QUEUE_NUM.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2016-03-11 14:55:15 +02:00
Denis V. Lunev
a0d06486b4 virtio-balloon: add 'available' counter
The patch for the kernel part is in linux-next already:
commit ac88e7c908b920866e529862f2b2f0129b254ab2
    Author: Igor Redko <redkoi@virtuozzo.com>
    Date:   Thu Feb 18 09:23:01 2016 +1100

    virtio_balloon: export 'available' memory to balloon statistics

    Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
    statistics protocol, corresponding to 'Available' in /proc/meminfo.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Igor Redko <redkoi@virtuozzo.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:55:15 +02:00
Marcel Apfelbaum
fc1769b758 hw/virtio: group virtio flags into an enum
Minimizes the possibility to assign
the same bit to different features.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2016-03-11 14:54:28 +02:00
Marcel Apfelbaum
631a438755 hw/virtio: fix double use of a virtio flag
Commits 1811e64c and a6df8adf use the same virtio feature bit 4
for different features.

Fix it by using different bits.

Reported-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2016-03-11 14:54:28 +02:00
Ladi Prosek
4eae2a657d balloon: fix segfault and harden the stats queue
The segfault here is triggered by the driver notifying the stats queue
twice after adding a buffer to it. This effectively resets stats_vq_elem
back to NULL and QEMU crashes on the next stats timer tick in
balloon_stats_poll_cb.

This is a regression introduced in 51b19ebe43, although admittedly
the device assumed too much about the stats queue protocol even before
that commit. This commit adds a few more checks and ensures that the one
stats buffer gets deallocated on device reset.

Cc: qemu-stable@nongnu.org
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:54:28 +02:00
Michael S. Tsirkin
f203549108 acpi: add build_append_named_dword, returning an offset in buffer
This is a very limited form of support for runtime patching -
similar in functionality to what we can do with ACPI_EXTRACT
macros in python, but implemented in C.

This is to allow ACPI code direct access to data tables -
which is exactly what DataTableRegion is there for, except
no known windows release so far implements DataTableRegion.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:54:28 +02:00
Xiao Guangrong
3f3009c098 acpi: allow using object as offset for OperationRegion
Extend aml_operation_region() to use object as offset

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:54:28 +02:00
Xiao Guangrong
9815cba502 acpi: add aml_concatenate()
It will be used by nvdimm acpi

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:54:28 +02:00
Xiao Guangrong
39b6dbd8d7 acpi: add aml_create_field()
It will be used by nvdimm acpi

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-11 14:54:27 +02:00
Dr. David Alan Gilbert
32c3db5b26 postcopy: Remove the x-
Postcopy seems to have survived a cycle with only a few fixes,
and Jiri has the current libvirt wired up and working
( https://www.redhat.com/archives/libvir-list/2016-March/msg00080.html )
so remove the experimental tag.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457690016-9070-3-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-11 17:53:59 +05:30
Dr. David Alan Gilbert
a587a3fe6c postcopy: listen thread is never joined
We don't join the listen thread, it does its own cleanup.
Mark as detached not joinable.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457690016-9070-2-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-11 17:53:59 +05:30
Denis V. Lunev
8646992279 migration: fix use-after-free in loadvm_postcopy_handle_run_bh
MigrationState is destroyed before we can come into bottom half.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1457537708-8622-1-git-send-email-den@openvz.org>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-11 12:58:45 +05:30
Peter Xu
568b01caf3 migration: fix warning for source_return_path_thread
max_len is not necessary, while it brings a warning during compilation
when specify "-Wstack-usage=1000000". Replacing using sizeof().

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1457503932-31763-1-git-send-email-peterx@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-11 12:58:37 +05:30
Thomas Huth
99b88c6d1f MAINTAINERS: Add entry for the include/hw/vfio/ folder
The headers in include/hw/vfio/ should be listed in the VFIO
section of the MAINTAINERS file.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 20:50:44 -07:00
Neo Jia
062ed5d8d6 vfio/pci: replace fixed string limit by g_strdup_printf
A trivial change to remove string limit by using g_strdup_printf

Tested-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 20:50:43 -07:00
Alex Williamson
e593c0211b vfio/pci: Split out VGA setup
This could be setup later by device specific code, such as IGD
initialization.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 20:50:41 -07:00
Alex Williamson
e2e5ee9c56 vfio/pci: Fixup PCI option ROMs
Devices like Intel graphics are known to not only have bad checksums,
but also the wrong device ID.  This is not so surprising given that
the video BIOS is typically part of the system firmware image rather
that embedded into the device and needs to support any IGD device
installed into the system.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 20:50:39 -07:00
Alex Williamson
2d82f8a3cd vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions
Match common vfio code with setup, exit, and finalize functions for
BAR, quirk, and VGA management.  VGA is also changed to dynamic
allocation to match the other MemoryRegions.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 20:50:38 -07:00
Alex Williamson
db0da029a1 vfio: Generalize region support
Both platform and PCI vfio drivers create a "slow", I/O memory region
with one or more mmap memory regions overlayed when supported by the
device. Generalize this to a set of common helpers in the core that
pulls the region info from vfio, fills the region data, configures
slow mapping, and adds helpers for comleting the mmap, enable/disable,
and teardown.  This can be immediately used by the PCI MSI-X code,
which needs to mmap around the MSI-X vector table.

This also changes VFIORegion.mem to be dynamically allocated because
otherwise we don't know how the caller has allocated VFIORegion and
therefore don't know whether to unreference it to destroy the
MemoryRegion or not.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 20:03:16 -07:00
Daniel P. Berrange
b16a44e13e osdep: remove use of socket_error() from all code
Now that QEMU wraps the Win32 sockets methods to automatically
set errno upon failure, there is no reason for callers to use
the socket_error() method. They can rely on accessing errno
even on Win32. Remove all use of socket_error() from general
code, leaving it as a static method in oslib-win32.c only.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:34 +00:00
Daniel P. Berrange
a2d96af4bb osdep: add wrappers for socket functions
The windows socket functions look identical to the normal POSIX
sockets functions, but instead of setting errno, the caller needs
to call WSAGetLastError(). QEMU has tried to deal with this
incompatibility by defining a socket_error() method that callers
must use that abstracts the difference between WSAGetLastError()
and errno.

This approach is somewhat error prone though - many callers of
the sockets functions are just using errno directly because it
is easy to forget the need use a QEMU specific wrapper. It is
not always immediately obvious that a particular function will
in fact call into Windows sockets functions, so the dev may not
even realize they need to use socket_error().

This introduces an alternative approach to portability inspired
by the way GNULIB fixes portability problems. We use a macro to
redefine the original socket function names to refer to a QEMU
wrapper function. The wrapper function calls the original Win32
sockets method and then sets errno from the WSAGetLastError()
value.

Thus all code can simply call the normal POSIX sockets APIs are
have standard errno reporting on error, even on Windows. This
makes the socket_error() method obsolete.

We also bring closesocket & ioctlsocket into this approach. Even
though they are non-standard Win32 names, we can't wrap the normal
close/ioctl methods since there's no reliable way to distinguish
between a file descriptor and HANDLE in Win32.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:07 +00:00
Daniel P. Berrange
08b758b482 char: remove qemu_chr_open_socket_fd method
The qemu_chr_open_socket_fd method takes care of either doing a
synchronous socket connect, or creating a listener socket. Part
of the work when creating the listener socket is to register a
watch for incoming clients. The caller of qemu_chr_open_socket_fd
may not want this watch created, as it might be doing a synchronous
wait for the first client. Rather than passing yet more parameters
into qemu_chr_open_socket_fd to let it handle this, just remove
the qemu_chr_open_socket_fd method an inline its functionality
into the caller. This allows for a clearer control flow and shorter
code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:07 +00:00
Daniel P. Berrange
317856cac8 char: remove socket_try_connect method
The qemu_chr_open_socket_fd() method multiplexes three different
actions into one method. The socket_try_connect() method is one
of its callers, but it only ever want one specific action
performed. By inlining that action into socket_try_connect()
we see that there is not in fact any failure scenario, so there
is not even any reason for socket_try_connect to exist. Just
inline the asynchronous connection attempts directly at the
places that need them. This shortens & clarifies the code.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:07 +00:00
Daniel P. Berrange
f50dfe457f char: remove qemu_chr_finish_socket_connection method
The qemu_chr_finish_socket_connection method is multiplexing two
different actions into one method. Each caller of it though, only
wants one specific action. The code is shorter & clearer if we
thus remove the method and just inline the specific actions
where needed.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:07 +00:00
Paolo Bonzini
a589720567 io: implement socket watch for win32 using WSAEventSelect+select
On Win32 we cannot directly poll on socket handles. Instead we
create a Win32 event object and associate the socket handle with
the event. When the event signals readyness we then have to
use select to determine which events are ready. Creating Win32
events is moderately heavyweight, so we don't want todo it
every time we create a GSource, so this associates a single
event with a QIOChannel.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:07 +00:00
Daniel P. Berrange
30fd3e2790 io: remove checking of EWOULDBLOCK
Since we now canonicalize WSAEWOULDBLOCK into EAGAIN there is
no longer any need to explicitly check EWOULDBLOCK for Win32.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:19:05 +00:00
Daniel P. Berrange
de7971ffb9 io: use qemu_accept to ensure SOCK_CLOEXEC is set
The QIOChannelSocket code mistakenly uses the bare accept()
function which does not set SOCK_CLOEXEC.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:11:40 +00:00
Paolo Bonzini
b83b68a013 io: introduce qio_channel_create_socket_watch
Sockets are not in the same namespace as file descriptors on Windows.
As an initial step, introduce separate APIs for file descriptor and
socket watches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10 17:10:19 +00:00
Paolo Bonzini
e560d141ab io: pass HANDLE to g_source_add_poll on Win32
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10 17:10:19 +00:00
Daniel P. Berrange
5151d23e65 io: fix copy+paste mistake in socket error message
s/write/read/ in the error message reported after
readmsg() fails

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
294bbbb425 io: assert errors before asserting content in I/O test
When checking the results of an I/O operation test, assert that
the error objects are NULL before asserting on the content. This
is found to give more useful indication of the problem when
diagnosing test failures.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
256920eb94 io: set correct error object in background reader test thread
The reader thread was accidentally setting the error pointer
intended for the writer thread. If both threads set errors
this would result in QEMU abort'ing due to the error already
being set.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
a9d5aed12d io: wait for incoming client in socket test
Exercise the GSource code for server sockets by calling
qio_channel_wait() prior to accepting the incoming client.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
abc981bf29 io: bind to socket before creating QIOChannelSocket
In the QIOChannelSocket test we create a socket file
descriptor and then try to create a QIOChannelSocket.
This works on Linux, but fails on Win32 because it is
not valid to call getsockname() on an unbound socket.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
5838d66e73 io: initialize sockets in test program
The win32 sockets layer requires that socket_init() is called
otherwise nothing will work.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
0a27af918b io: use bind() to check for IPv4/6 availability
Currently the test-io-channel-socket.c test uses getifaddrs
to see if an IPv4/6 address is present on any host NIC, as
a way to determine if IPv4/6 sockets can be used. This is
problematic because getifaddrs is not available on Win32.

Rather than testing indirectly via getifaddrs, just create
a socket and try to bind() to the loopback address instead.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:18 +00:00
Daniel P. Berrange
c619644067 osdep: fix socket_error() to work with Mingw64
Historically QEMU has had a socket_error() macro that was
defined to map to WSASocketError(). The os-win32.h header
file would define errno constants that mapped to the
WSA error constants. This worked fine with Mingw32 since
its header files never defined any errno values, nor did
it even provide an errno.h.  So callers of socket_error()
could match on traditional Exxxx constants and it would
all "just work".

With Mingw64 though, things work rather differently. First
there is an errno.h file which defines all the traditional
errno constants you'd expect from a UNIX platform. There
is then a winerror.h which defined the WSA error constants.
Crucially the WSAExxxx errno values in winerror.h do not
match the Exxxx errno values in error.h.

If QEMU had only imported winerror.h it would still work,
but the qemu/osdep.h file unconditionally imports errno.h.
So callers of socket_error() will get now WSAExxxx values
back and compare them to the Exxx constants. This will
always fail silently at runtime.

To solve this QEMU needs to stop assuming the WSAExxxx
constant values match the Exxx constant values. Thus the
socket_error() macro is turned into a small function that
re-maps WSAExxxx values into Exxx.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2016-03-10 17:10:17 +00:00
Alex Williamson
469002263a vfio: Wrap VFIO_DEVICE_GET_REGION_INFO
In preparation for supporting capability chains on regions, wrap
ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
each caller.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 09:39:07 -07:00
Alex Williamson
7df9381b7a vfio: Add sysfsdev property for pci & platform
vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:]<bus:slot.function> notation.  We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/.  vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/.  On the PCI side, we have
some interest in using vfio to expose vGPU devices.  These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it.  There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device.  To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.

To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs.  The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.

With this, a vfio-pci device could either be specified as:

-device vfio-pci,host=02:00.0

or

-device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0

or even

-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0

When vGPU support comes along, this might look something more like:

-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0

NB - This is only a made up example path

The same change is made for vfio-platform, specifying sysfsdev has
precedence over the old host option.

Tested-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-03-10 09:39:07 -07:00
Cornelia Huck
75cfb3bb41 s390x/cpu: use g_new0
Let's use g_new0 to allocate cpu_states.

Suggested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 12:02:02 +01:00
Janosch Frank
8b8a61ad8c s390x: Introduce S390MachineClass
As we now have the new machine definitions, that let us disable/enable
machine options more easily, we need a way to save them and make them
publicly available.

The new s390-virtio-ccw.h header exports the s390 ccw machine state
and class, so they can be easily used in other C files.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:16 +01:00
Janosch Frank
4fca654872 s390x: Introduce machine definition macros
Most of the machine definition code looks the same between different
machine versions. The new DEFINE_CCW_MACHINE macro makes defining a
new machine easier by inserting standard machine version
definitions. This also makes it possible to propagate values between
machine versions.

The patch is inspired by code from hw/ppc/spapr.c

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:16 +01:00
Eugene (jno) Dvurechenski
3a3c752f0b pc-bios/s390-ccw: fix old bug in ptr increment
We need to increment by the size of the structure, whereas 'ns' is 'uint8_t *'.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:16 +01:00
Matthew Rosato
a006b67fe4 s390x/cpu: Allow hotplug of CPUs
Implement cpu hotplug routine and add the machine hook.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-8-git-send-email-mjrosato@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Matthew Rosato
96b1a8bb55 s390x/cpu: Add error handling to cpu creation
Check for and propogate errors during s390 cpu creation.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-7-git-send-email-mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Matthew Rosato
502edbf834 s390x/cpu: Add CPU property links
Link each CPUState as property machine/cpu[n] during initialization.
Add a hotplug handler to s390-virtio-ccw machine and set the
state during plug.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-6-git-send-email-mjrosato@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Matthew Rosato
25637d31f2 s390x/cpu: Tolerate max_cpus
Once hotplug is enabled, interrupts may come in for CPUs
with an address > smp_cpus.  Allocate for this and allow
search routines to look beyond smp_cpus.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-5-git-send-email-mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Matthew Rosato
c6644fc88b s390x/cpu: Get rid of side effects when creating a vcpu
In preparation for hotplug, defer some CPU initialization
until the device is actually being realized, including
cpu_exec_init.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-4-git-send-email-mjrosato@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Matthew Rosato
ef3027affc s390x/cpu: Set initial CPU state in common routine
Both initial and hotplugged CPUs need to set the same initial
state.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-3-git-send-email-mjrosato@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Matthew Rosato
d2eae20790 s390x/cpu: Cleanup init in preparation for hotplug
Ensure a valid cpu_model is set upfront by setting the
default value directly into the MachineState when none is
specified.  This is needed to ensure hotplugged CPUs share
the same cpu_model.

Signed-off-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Message-Id: <1457112875-5209-2-git-send-email-mjrosato@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-10 10:37:15 +01:00
Peter Maydell
a648c13738 Merge remote-tracking branch 'remotes/kraxel/tags/pull-ui-20160309-1' into staging
add linux evdev support, vnc and console fixes.

# gpg: Signature made Wed 09 Mar 2016 09:02:47 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-ui-20160309-1:
  ui/console: add escape sequence \e[5, 6n
  input-linux: add switch to enable auto-repeat events
  input-linux: add option to toggle grab on all devices
  input: linux evdev support
  vnc: send cursor when a new client is connecting

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-10 02:51:14 +00:00
Ren Kimura
58aa7d8e44 ui/console: add escape sequence \e[5, 6n
Add support of escape sequence "\e[5n" and "\e[6n" to console.
"\e[5n" reports status of console and it always succeed
in virtual console.
"\e[6n" reports now cursor position in console.

Signed-off-by: Ren Kimura <rkx1209dev@gmail.com>
Message-id: 1457466681-7714-2-git-send-email-rkx1209dev@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-09 09:35:56 +01:00
Peter Maydell
4ba364b472 Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
Add Samuel Thibault as slirp maintainer

# gpg: Signature made Tue 08 Mar 2016 20:43:01 GMT using RSA key ID FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  MAINTAINERS: Add Samuel Thibault as slirp maintainer

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-09 05:14:55 +00:00
Peter Maydell
8519c8e073 Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.6-6' into staging
migration:
* add avx2 instruction optimization, speeds up zero-page checking on
  compatible architectures and compilers (gcc 4.9+)
* add additional postcopy stats to 'info migrate' output

# gpg: Signature made Tue 08 Mar 2016 11:29:48 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-migration/tags/migration-for-2.6-6:
  cutils: add avx2 instruction optimization
  configure: detect ifunc and avx2 attribute
  Postcopy: Fix sync count in info migrate

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-09 01:07:16 +00:00
Peter Maydell
3293680dc7 Merge remote-tracking branch 'remotes/kraxel/tags/pull-fw-cfg-20160308-1' into staging
acpi: add fw_cfg device node to dsdt

# gpg: Signature made Tue 08 Mar 2016 11:15:42 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-fw-cfg-20160308-1:
  tests: update acpi test data
  fw_cfg: document ACPI device node information
  acpi: arm: add fw_cfg device node to dsdt
  acpi: pc: add fw_cfg device node to dsdt
  pc: fw_cfg: move ioport base constant to pc.h
  fw_cfg: expose control register size in fw_cfg.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-09 00:44:43 +00:00
Peter Maydell
5763795f93 Merge remote-tracking branch 'remotes/amit-virtio-rng/tags/rng-for-2.6-2' into staging
rng: use simpleq instead of gslist

# gpg: Signature made Tue 08 Mar 2016 10:51:23 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-virtio-rng/tags/rng-for-2.6-2:
  rng: switch request queue to QSIMPLEQ

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-09 00:21:17 +00:00
Samuel Thibault
eda509fa0a MAINTAINERS: Add Samuel Thibault as slirp maintainer
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
2016-03-08 21:39:04 +01:00
Liang Li
28b90d9c19 cutils: add avx2 instruction optimization
buffer_find_nonzero_offset() is a hot function during live migration.
Now it use SSE2 instructions for optimization. For platform supports
AVX2 instructions, use AVX2 instructions for optimization can help
to improve the performance of buffer_find_nonzero_offset() about 30%
comparing to SSE2.

Live migration can be faster with this optimization, the test result
shows that for an 8GiB RAM idle guest just boots, this patch can help
to shorten the total live migration time about 6%.

This patch use the ifunc mechanism to select the proper function when
running, for platform supports AVX2, execute the AVX2 instructions,
else, execute the original instructions.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1457416397-26671-3-git-send-email-liang.z.li@intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-08 16:53:26 +05:30
Liang Li
99f2dbd343 configure: detect ifunc and avx2 attribute
Detect if the compiler can support the ifun and avx2, if so, set
CONFIG_AVX2_OPT which will be used to turn on the avx2 instruction
optimization.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Message-Id: <1457416397-26671-2-git-send-email-liang.z.li@intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-08 16:53:26 +05:30
Dr. David Alan Gilbert
614e8018ed Postcopy: Fix sync count in info migrate
I'd missed the sync count off in the postcopy case.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-id: 1456394631-18010-1-git-send-email-dgilbert@redhat.com
Message-Id: <1456394631-18010-1-git-send-email-dgilbert@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-08 16:52:27 +05:30
Gerd Hoffmann
a6ccabd676 input-linux: add switch to enable auto-repeat events
Enable with "-input-linux /dev/input/${device},repeat=on".

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1457087116-4379-4-git-send-email-kraxel@redhat.com
2016-03-08 12:20:11 +01:00
Gerd Hoffmann
46d921bebe input-linux: add option to toggle grab on all devices
Maintain a list of all input devices.  Add an option to make grab
work across all devices (so toggling grab on the keybard can switch
over the mouse too).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1457087116-4379-3-git-send-email-kraxel@redhat.com
2016-03-08 12:20:11 +01:00
Gerd Hoffmann
e0d2bd5195 input: linux evdev support
This patch adds support for reading input events directly from linux
evdev devices and forward them to the guest.  Unlike virtio-input-host
which simply passes on all events to the guest without looking at them
this will interpret the events and feed them into the qemu input
subsystem.

Therefore this is limited to what the qemu input subsystem and the
emulated input devices are able to handle.  Also there is no support for
absolute coordinates (tablet/touchscreen).  So we are talking here about
basic mouse and keyboard support.

The advantage is that it'll work without virtio-input drivers in the
guest, the events are delivered to the usual ps/2 or usb input devices
(depending on what the machine happens to have).  And for keyboards
qemu is able to switch the keyboard between guest and host on hotkey.
The hotkey is hard-coded for now (both control keys), initialy the
guest owns the keyboard.

Probably most useful when assigning vga devices with vfio and using a
physical monitor instead of vnc/spice/gtk as guest display.

Usage:  Add '-input-linux /dev/input/event<nr>' to the qemu command
line.  Note that udev has rules which populate /dev/input/by-{id,path}
with static names, which might be more convinient to use.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1457087116-4379-2-git-send-email-kraxel@redhat.com
2016-03-08 12:20:11 +01:00
Gerd Hoffmann
a60c785608 tests: update acpi test data
using tests/acpi-test-data/rebuild-expected-aml.sh

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 12:15:27 +01:00
Gabriel L. Somlo
36a43ea83b fw_cfg: document ACPI device node information
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1455906029-25565-6-git-send-email-somlo@cmu.edu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 12:15:22 +01:00
Gabriel L. Somlo
70bee80d6b acpi: arm: add fw_cfg device node to dsdt
Add a fw_cfg device node to the ACPI DSDT. This is mostly
informational, as the authoritative fw_cfg MMIO region(s)
are listed in the Device Tree. However, since we are building
ACPI tables, we might as well be thorough while at it...

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1455906029-25565-5-git-send-email-somlo@cmu.edu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 12:15:15 +01:00
Gabriel L. Somlo
e2ec75685c acpi: pc: add fw_cfg device node to dsdt
Add a fw_cfg device node to the ACPI DSDT. While the guest-side
firmware can't utilize this information (since it has to access
the hard-coded fw_cfg device to extract ACPI tables to begin with),
having fw_cfg listed in ACPI will help the guest kernel keep a more
accurate inventory of in-use IO port regions.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1455906029-25565-4-git-send-email-somlo@cmu.edu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 12:15:09 +01:00
Gabriel L. Somlo
305ae88895 pc: fw_cfg: move ioport base constant to pc.h
Move BIOS_CFG_IOPORT define from pc.c to pc.h, and rename
it to FW_CFG_IO_BASE.

Cc: Marc Marí <markmb@redhat.com>
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1455906029-25565-3-git-send-email-somlo@cmu.edu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 12:14:49 +01:00
Peter Maydell
d1cc881d54 Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Tue 08 Mar 2016 07:46:08 GMT using RSA key ID 398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  net: check packet payload length
  filter-buffer: Add status_changed callback processing
  filter: Add 'status' property for filter object
  rocker: allow user to specify rocker world by property
  rocker: add name field into WorldOps ale let world specify its name
  rocker: return -ENOMEM in case of some world alloc fails
  rocker: forbid to change world type
  net: netmap: probe netmap interface for virtio-net header
  net: simplify net_init_tap_one logic
  MAINTAINERS: Add entries for include/net/ files
  net: filter: correctly remove filter from the list during finalization
  net: ne2000: check ring buffer control registers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-08 10:25:50 +00:00
Gabriel L. Somlo
ce9a2aa372 fw_cfg: expose control register size in fw_cfg.h
Expose the size of the control register (FW_CFG_CTL_SIZE) in fw_cfg.h.
Add comment to fw_cfg_io_realize() pointing out that since the
8-bit data register is always subsumed by the 16-bit control
register in the port I/O case, we use the control register width
as the *total* width of the (classic, non-DMA) port I/O region reserved
for the device.

Cc: Marc Marí <markmb@redhat.com>
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Marc Marí <markmb@redhat.com>
Message-id: 1455906029-25565-2-git-send-email-somlo@cmu.edu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 10:46:30 +01:00
Frediano Ziglio
91ec41dc3f vnc: send cursor when a new client is connecting
If you have hardware cursor and you are reconnecting the VNC client
you need to send the cursor. Failing to do so make the cursor invisible
till is changed.

Signed-off-by: Frediano Ziglio <fziglio@redhat.com>
Message-id: 1456929142-14033-1-git-send-email-fziglio@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-08 10:45:01 +01:00
Prasad J Pandit
362786f14a net: check packet payload length
While computing IP checksum, 'net_checksum_calculate' reads
payload length from the packet. It could exceed the given 'data'
buffer size. Add a check to avoid it.

Reported-by: Liu Ling <liuling-it@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
zhanghailiang
f1b2bc601a filter-buffer: Add status_changed callback processing
While the status of filter-buffer changing from 'on' to 'off',
it need to release all the buffered packets, and delete the related
timer, while switch from 'off' to 'on', it need to resume the release
packets timer.

Here, we extract the process of setup timer into a new helper,
which will be used in the new status_changed callback.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
zhanghailiang
338d3f415e filter: Add 'status' property for filter object
With this property, users can control if this filter is 'on'
or 'off'. The default behavior for filter is 'on'.

For some types of filters, they may need to react to status changing,
So here, we introduced status changing callback/notifier for filter class.

We will skip the disabled ('off') filter when delivering packets in net layer.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
9fe7101f1d rocker: allow user to specify rocker world by property
Add property to specify rocker world. All ports will be assigned to this
world.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
031143c8d5 rocker: add name field into WorldOps ale let world specify its name
Also use this in world_name getter function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
39e0c4f47d rocker: return -ENOMEM in case of some world alloc fails
Until now, 0 is returned in this error case. Fix it ro return -ENOMEM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Jiri Pirko
0ab9cd9a4b rocker: forbid to change world type
Port to world assignment should be permitted only by qemu user. Driver
should not be able to do it, so forbid that possibility.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Vincenzo Maffione
9fbad2ca36 net: netmap: probe netmap interface for virtio-net header
Previous implementation of has_ufo, has_vnet_hdr, has_vnet_hdr_len, etc.
did not really probe for virtio-net header support for the netmap
interface attached to the backend. These callbacks were correct for
VALE ports, but incorrect for hardware NICs, pipes, monitors, etc.

This patch fixes the implementation to work properly with all kinds
of netmap ports.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:18 +08:00
Paolo Bonzini
3a2d44f6dd net: simplify net_init_tap_one logic
net_init_tap_one receives in vhostfdname a fd name from vhostfd= or
vhostfds=, or NULL if there is no vhostfd=/vhostfds=.  It is simpler
to just check vhostfdname, than it is to check for vhostfd= or
vhostfds=.  This also calms down Coverity, which otherwise thinks
that monitor_fd_param could dereference a NULL vhostfdname.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:09 +08:00
Thomas Huth
d24b2b1ccc MAINTAINERS: Add entries for include/net/ files
The include/net/ files correspond to the files in the net/ directory,
thus there should be corresponding entries in the MAINTAINERS file.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:09 +08:00
Jason Wang
5dd2d45e34 net: filter: correctly remove filter from the list during finalization
Qemu may crash when we want to add two filters on the same netdev but
the initialization of second fails (e.g missing parameters):

./qemu-system-x86_64 -netdev user,id=un0 \
 -object filter-buffer,id=f0,netdev=un0,interval=10 \
 -object filter-buffer,id=f1,netdev=un0
Segmentation fault (core dumped)

This is because we don't check whether or not the filter was in the
list of netdev. This patch fixes this.

Cc: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:09 +08:00
Prasad J Pandit
415ab35a44 net: ne2000: check ring buffer control registers
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152)
bytes to process network packets. Registers PSTART & PSTOP
define ring buffer size & location. Setting these registers
to invalid values could lead to infinite loop or OOB r/w
access issues. Add check to avoid it.

Reported-by: Yang Hongke <yanghongke@huawei.com>
Tested-by: Yang Hongke <yanghongke@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08 15:34:09 +08:00
Ladi Prosek
443590c204 rng: switch request queue to QSIMPLEQ
QSIMPLEQ supports appending to tail in O(1) and is intrusive so
it doesn't require extra memory allocations for the bookkeeping
data.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1457010971-24771-1-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-08 12:54:14 +05:30
Peter Maydell
97556fe80e Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* RAMBlock vs. MemoryRegion cleanups from Fam
* mru_section optimization from Fam
* memory.txt improvements from Peter and Xiaoqiang
* i8257 fix from Hervé
* -daemonize fix
* Cleanups and small fixes from Alex, Praneith, Wei

# gpg: Signature made Mon 07 Mar 2016 17:08:59 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"

* remotes/bonzini/tags/for-upstream:
  scsi-bus: Remove tape command from scsi_req_xfer
  kvm/irqchip: use bitmap utility for gsi tracking
  MAINTAINERS: Add entry for include/sysemu/kvm*.h
  doc/memory.txt: correct description of MemoryRegionOps fields
  doc/memory.txt: correct a logic error
  icount: possible options for sleep are on or off
  exec: Introduce AddressSpaceDispatch.mru_section
  exec: Factor out section_covers_addr
  exec: Pass RAMBlock pointer to qemu_ram_free
  memory: Drop MemoryRegion.ram_addr
  memory: Implement memory_region_get_ram_addr with mr->ram_block
  memory: Move assignment to ram_block to memory_region_init_*
  exec: Return RAMBlock pointer from allocating functions
  i8257: fix Terminal Count status
  log: do not log if QEMU is daemonized but without -D

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-08 04:53:37 +00:00
Alex Pyrgiotis
4792b7e9d5 scsi-bus: Remove tape command from scsi_req_xfer
Remove the RECOVER_BUFFERED_DATA command from the list of commands that
are handled by scsi_req_xfer(). Given that this command is
tape-specific, it should be handled only by scsi_stream_req_xfer().

Signed-off-by: Alex Pyrgiotis <apyrgio@arrikto.com>

Message-Id: <1457365822-22435-1-git-send-email-apyrgio@arrikto.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 17:56:23 +01:00
Wei Yang
8269fb7082 kvm/irqchip: use bitmap utility for gsi tracking
By using utilities in bitops and bitmap, this patch tries to make it more
friendly to audience. No functional change.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Message-Id: <1457229445-25954-1-git-send-email-richard.weiyang@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 15:18:22 +01:00
Thomas Huth
a95e9a485b MAINTAINERS: Add entry for include/sysemu/kvm*.h
The include/sysemu/kvm*.h header files should be part of
the overall KVM section.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1456403605-26587-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:38 +01:00
Peter Maydell
ef00bdaf8c doc/memory.txt: correct description of MemoryRegionOps fields
Probably what happened was that when the API was being designed it
started off with an 'aligned' field, and then later the field name
and semantics were changed but the docs weren't updated to match.

Similarly, cpu_register_io_memory() does not exist anymore, so
clarify the documentation for .old_mmio.

Reported-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:38 +01:00
xiaoqiang zhao
8210f5f6f5 doc/memory.txt: correct a logic error
In the regions overlap example, region B has a higher priority thus
should has a larger priority number than C.

Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1456476051-15121-1-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:38 +01:00
Pranith Kumar
778d9f9b25 icount: possible options for sleep are on or off
icount sleep takes on or off as options. A few places mention sleep=no
which is not accepted. This patch corrects them.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <1456499811-16819-1-git-send-email-bobby.prani@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:38 +01:00
Fam Zheng
729633c2bc exec: Introduce AddressSpaceDispatch.mru_section
Under heavy workloads the lookup will likely end up with the same
MemoryRegionSection from last time. Using a pointer to cache the result,
like ram_list.mru_block, significantly reduces cost of
address_space_translate.

During address space topology update, as->dispatch will be reallocated
so the pointer is invalidated automatically.

Perf reports a visible drop on the cpu usage, because phys_page_find is
not called.  Before:

   2.35%  qemu-system-x86_64       [.] phys_page_find
   0.97%  qemu-system-x86_64       [.] address_space_translate_internal
   0.95%  qemu-system-x86_64       [.] address_space_translate
   0.55%  qemu-system-x86_64       [.] address_space_lookup_region

After:

   0.97%  qemu-system-x86_64       [.] address_space_translate_internal
   0.97%  qemu-system-x86_64       [.] address_space_lookup_region
   0.84%  qemu-system-x86_64       [.] address_space_translate

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-8-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
29cb533d8c exec: Factor out section_covers_addr
This will be shared by the next patch.

Also add a comment explaining the unobvious condition on "size.hi".

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-7-git-send-email-famz@redhat.com>
[Small change to the comment. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
f1060c55bf exec: Pass RAMBlock pointer to qemu_ram_free
The only caller now knows exactly which RAMBlock to free, so it's not
necessary to do the lookup.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-6-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:37 +01:00
Fam Zheng
8e41fb63c5 memory: Drop MemoryRegion.ram_addr
All references to mr->ram_addr are replaced by
memory_region_get_ram_addr(mr) (except for a few assertions that are
replaced with mr->ram_block).

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-5-git-send-email-famz@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:26:29 +01:00
Fam Zheng
7ebb2745ac memory: Implement memory_region_get_ram_addr with mr->ram_block
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-4-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Fam Zheng
0a75601853 memory: Move assignment to ram_block to memory_region_init_*
We don't force "const" qualifiers with pointers in QEMU, but it's still
good to keep a clean function interface. Assigning to mr->ram_block is
in this sense ugly - one initializer mutating its owning object's state.

Move it to memory_region_init_*, where mr->ram_addr is assigned.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Fam Zheng
528f46af6e exec: Return RAMBlock pointer from allocating functions
Previously we return RAMBlock.offset; now return the pointer to the
whole structure.

ram_block_add returns void now, error is completely passed with errp.

Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1456813104-25902-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Hervé Poussineau
bb8f32c031 i8257: fix Terminal Count status
When a DMA transfer is done (ie all bytes have been transfered), the corresponding
Terminal Count bit must be set in the status register.
This bit is already cleared in i8257_read_cont and i8257_write_cont when required.

This fixes (at least) floppy transfer in IBM 40p firmware, which checks in DMA
controller if everything went fine.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Message-Id: <1456404332-31556-1-git-send-email-hpoussin@reactos.org>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Paolo Bonzini
c586eac336 log: do not log if QEMU is daemonized but without -D
Commit 96c33a4 ("log: Redirect stderr to logfile if deamonized",
2016-02-22) wanted to move stderr of a daemonized QEMU to the file
specified with -D.

However, if -D was not passed, the patch had the side effect of not
redirecting stderr to /dev/null.  This happened because qemu_logfile
was set to stderr rather than the expected value of NULL.  The fix
is simply in the "if" condition of do_qemu_set_log; the "if" for
closing the file is also changed to match.

Reported-by: Jan Tomko <jtomko@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07 13:18:28 +01:00
Peter Maydell
1464ad45cd Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-03-04' into staging
QAPI patches for 2016-03-04

# gpg: Signature made Sat 05 Mar 2016 09:47:19 GMT using RSA key ID EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"

* remotes/armbru/tags/pull-qapi-2016-03-04:
  qapi: Drop useless 'data' member of unions
  chardev: Drop useless ChardevDummy type
  qapi: Avoid use of 'data' member of QAPI unions
  ui: Shorten references into InputEvent
  util: Shorten references into SocketAddress
  chardev: Shorten references into ChardevBackend
  qapi: Update docs to match recent generator changes
  qapi-visit: Expose visit_type_FOO_members()
  qapi: Rename 'fields' to 'members' in generated C code
  qapi: Rename 'fields' to 'members' in generator
  qapi-dealloc: Reduce use outside of generated code
  qmp-shell: fix pretty printing of JSON responses

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-06 11:53:27 +00:00
Eric Blake
48eb62a74f qapi: Drop useless 'data' member of unions
We started moving away from the use of the 'void *data' member
in the C union corresponding to a QAPI union back in commit
544a373; recent commits have gotten rid of other uses.  Now
that it is completely unused, we can remove the member itself
as well as the FIXME comment.  Update the testsuite to drop the
negative test union-clash-data.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1457021813-10704-11-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:42:06 +01:00
Eric Blake
b1918fbb1c chardev: Drop useless ChardevDummy type
Commit d0d7708b made ChardevDummy be an empty wrapper type around
ChardevCommon.  But there is no technical reason for this indirection,
so simplify the code by directly using the base type.

Also change the fallback assignment to assign u.null rather than
u.data, since a future patch will remove the data member of the C
struct generated for QAPI unions.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1457106160-23614-1-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:42:03 +01:00
Eric Blake
10f759079e qapi: Avoid use of 'data' member of QAPI unions
QAPI code generators currently create a 'void *data' member as
part of the anonymous union embedded in the C struct corresponding
to a QAPI union.  However, directly assigning to this member of
the union feels a bit fishy, when we can assign to another member
of the struct instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1457021813-10704-9-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:58 +01:00
Eric Blake
b5a1b44318 ui: Shorten references into InputEvent
An upcoming patch will alter how simple unions, like InputEvent, are
laid out, which will impact all lines of the form 'evt->u.XXX'
(expanding it to the longer 'evt->u.XXX.data').  For better
legibility in that patch, and less need for line wrapping, it's better
to use a temporary variable to reduce the effect of a layout change to
just the variable initializations, rather than every reference within
an InputEvent.

There was one instance in hid.c:hid_pointer_event() where the code
was referring to evt->u.rel inside the case label where evt->u.abs
is the correct name; thankfully, both members of the union have the
same type, so it happened to work, but it is now cleaner.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-8-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:55 +01:00
Eric Blake
0399293e5b util: Shorten references into SocketAddress
An upcoming patch will alter how simple unions, like SocketAddress,
are laid out, which will impact all lines of the form 'addr->u.XXX'
(expanding it to the longer 'addr->u.XXX.data').  For better
legibility in that patch, and less need for line wrapping, it's better
to use a temporary variable to reduce the effect of a layout change to
just the variable initializations, rather than every reference within
a SocketAddress.  Also, take advantage of some C99 initialization where
it makes sense (simplifying g_new0() to g_new()).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-7-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:52 +01:00
Eric Blake
f194a1ae53 chardev: Shorten references into ChardevBackend
An upcoming patch will alter how simple unions, like ChardevBackend,
are laid out, which will impact all lines of the form 'backend->u.XXX'
(expanding it to the longer 'backend->u.XXX.data').  For better
legibility in that patch, and less need for line wrapping, it's better
to use a temporary variable to reduce the effect of a layout change to
just the variable initializations, rather than every reference within
a ChardevBackend.  It doesn't hurt that this also makes the code more
consistent: some clients touched here already had a temporary variable
but weren't using it.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-By: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1457021813-10704-6-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:47 +01:00
Eric Blake
9ee86b8526 qapi: Update docs to match recent generator changes
Several commits have been changing the generator, but not updating
the docs to match:
- The implicit tag member is named "type", not "kind".  Screwed up in
commit 39a1815.
- Commit 9f08c8ec made list types lazy, and thereby dropped
UserDefOneList if nothing explicitly uses the list type.
- Commit 51e72bc1 switched the parameter order with 'name' occurring
earlier.
- Commit e65d89bf changed the layout of UserDefOneList.
- Prefer the term 'member' over 'field'.
- We now expose visit_type_FOO_members() for objects.
- etc.

Rework the examples to show slightly more output (we don't want to
show too much; that's what the testsuite is for), and regenerate the
output to match all recent changes.  Also, rearrange output to show
.h files before .c (understanding the interface first often makes
the implementation easier to follow).

Reported-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1457021813-10704-5-git-send-email-eblake@redhat.com>
2016-03-05 10:41:16 +01:00
Eric Blake
4d91e9115c qapi-visit: Expose visit_type_FOO_members()
Dan Berrange reported a case where he needs to work with a
QCryptoBlockOptions union type using the OptsVisitor, but only
visit one of the branches of that type (the discriminator is not
visited directly, but learned externally).  When things were
boxed, it was easy: just visit the variant directly, which took
care of both allocating the variant and visiting its members, then
store that pointer in the union type.  But now that things are
unboxed, we need a way to visit the members without allocation,
done by exposing visit_type_FOO_members() to the user.

Before the patch, we had quite a bit of code associated with
object_members_seen to make sure that a declaration of the helper
was in scope before any use of the function.  But now that the
helper is public and declared in the header, the .c file no
longer needs to worry about topological sorting (the helper is
always in scope), which leads to some nice cleanups.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-4-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:13 +01:00
Eric Blake
c81200b014 qapi: Rename 'fields' to 'members' in generated C code
C types and JSON objects don't have fields, but members.  We
shouldn't gratuitously invent terminology.  This patch is a
strict renaming of static genarated functions, plus the naming
of the dummy filler member for empty structs, before the next
patch exposes some of that naming to the rest of the code base.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-3-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:41:09 +01:00
Eric Blake
14f00c6c49 qapi: Rename 'fields' to 'members' in generator
C types and JSON objects don't have fields, but members.  We
shouldn't gratuitously invent terminology.  This patch is a
strict renaming of generator code internals (including testsuite
comments), before later patches rename C interfaces.

No change to generated code with this patch.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1457021813-10704-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-05 10:40:52 +01:00
Eric Blake
96a1616c85 qapi-dealloc: Reduce use outside of generated code
No need to roll our own use of the dealloc visitors when we can
just directly use the qapi_free_FOO() functions that do what we
want in one line.

In net.c, inline net_visit() into its remaining lone caller.

After this patch, test-visitor-serialization.c is the only
non-generated file that needs to use a dealloc visitor, because
it is testing low level aspects of the visitor interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1456262075-3311-2-git-send-email-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-04 17:16:32 +01:00
Daniel P. Berrange
e55250c6cb qmp-shell: fix pretty printing of JSON responses
Pretty printing of JSON responses is important to be able to understand
large responses from query commands in particular. Unfortunately this
was broken during the addition of the verbose flag in

  commit 1ceca07e48
  Author: John Snow <jsnow@redhat.com>
  Date:   Wed Apr 29 15:14:04 2015 -0400

    scripts: qmp-shell: Add verbose flag

This is because that change turned the python data structure into a
formatted JSON string before the pretty print was given it. So we're
just pretty printing a string, which is a no-op.

The original pretty printer would output python objects.

(QEMU) query-chardev
{   u'return': [   {   u'filename': u'vc',
                       u'frontend-open': False,
                       u'label': u'parallel0'},
                   {   u'filename': u'vc',
                       u'frontend-open': True,
                       u'label': u'serial0'},
                   {   u'filename': u'unix:/tmp/qemp,server',
                       u'frontend-open': True,
                       u'label': u'compat_monitor0'}]}

This fixes the problem by switching to outputting pretty formatted JSON
text instead. This has the added benefit that the pretty printed output
is now valid JSON text. Due to the way the verbose flag was handled, the
pretty printing now applies to the command sent, as well as its response:

(QEMU) query-chardev
{
    "execute": "query-chardev",
    "arguments": {}
}
{
    "return": [
        {
            "frontend-open": false,
            "label": "parallel0",
            "filename": "vc"
        },
        {
            "frontend-open": true,
            "label": "serial0",
            "filename": "vc"
        },
        {
            "frontend-open": true,
            "label": "compat_monitor0",
            "filename": "unix:/tmp/qmp,server"
        }
    ]
}

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <1456224706-1591-1-git-send-email-berrange@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
[Bonus fix: multiple -p now work]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-03-04 17:16:32 +01:00
Peter Maydell
3c0f12df65 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160304' into staging
target-arm queue:
 * Correct handling of writes to CPSR from gdbstub in user mode
 * virt: lift maximum RAM limit to 255GB
 * sdhci: implement reset
 * virt: if booting in Secure mode, provide secure-only RAM, make first
   flash device secure-only, and assume the EL3 boot rom will handle PSCI
 * bcm2835: use explicit endianness accessors rather than ldl/stl_phys
 * support big-endian in system mode for ARM
 * implement SETEND instruction
 * arm_gic: implement the GICv2 GICC_DIR register
 * fix SRS bug: only trap from S-EL1 to EL3 if specified mode is Mon

# gpg: Signature made Fri 04 Mar 2016 11:38:53 GMT using RSA key ID 14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"

* remotes/pmaydell/tags/pull-target-arm-20160304: (30 commits)
  target-arm: Only trap SRS from S-EL1 if specified mode is MON
  hw/intc/arm_gic.c: Implement GICv2 GICC_DIR
  arm: boot: Support big-endian elfs
  loader: Add data swap option to load-elf
  loader: load_elf(): Add doc comment
  loader: add API to load elf header
  target-arm: implement BE32 mode in system emulation
  target-arm: implement setend
  target-arm: introduce tbflag for endianness
  target-arm: a64: Add endianness support
  target-arm: introduce disas flag for endianness
  target-arm: pass DisasContext to gen_aa32_ld*/st*
  target-arm: implement SCTLR.EE
  linux-user: arm: handle CPSR.E correctly in strex emulation
  linux-user: arm: set CPSR.E/SCTLR.E0E correctly for BE mode
  arm: cpu: handle BE32 user-mode as BE
  target-arm: cpu: Move cpu_is_big_endian to header
  target-arm: implement SCTLR.B, drop bswap_code
  linux-user: arm: pass env to get_user_code_*
  linux-user: arm: fix coding style for some linux-user signal functions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:46:32 +00:00
Ralf-Philipp Weinmann
ba63cf47a9 target-arm: Only trap SRS from S-EL1 if specified mode is MON
Commit cbc0326b6f caused SRS instructions executed from Secure
EL1 to trap to EL3 even if the specified mode was not monitor mode.

According to the ARMv8 Architecture reference manual [F6.1.203], ALL
of the following conditions need to be met for SRS to trap to EL3:
* It is executed at Secure PL1.
* The specified mode is monitor mode.
* EL3 is using AArch64.

Correct the condition governing the trap to EL3 to check the
specified mode.

Signed-off-by: Ralf-Philipp Weinmann <ralf+devel@comsecuris.com>
Message-id: 20160222224251.GA11654@beta.comsecuris.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweaked comment text to read 'specified mode'; edited
 commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:22 +00:00
Peter Maydell
a55c910e0b hw/intc/arm_gic.c: Implement GICv2 GICC_DIR
The GICv2 introduces a new CPU interface register GICC_DIR, which
allows an OS to split the "priority drop" and "deactivate interrupt"
parts of interrupt completion. Implement this register.
(Note that the register is at offset 0x1000 in the CPU interface,
which means it is on a different 4K page from all the other registers.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1456854176-7813-1-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:22 +00:00
Peter Crosthwaite
9776f63645 arm: boot: Support big-endian elfs
Support ARM big-endian ELF files in system-mode emulation. When loading
an elf, determine the endianness mode expected by the elf, and set the
relevant CPU state accordingly.

With this, big-endian modes are now fully supported via system-mode LE,
so there is no need to restrict the elf loading to the TARGET
endianness so the ifdeffery on TARGET_WORDS_BIGENDIAN goes away.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: fix typo in comments]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Peter Crosthwaite
7ef295ea5b loader: Add data swap option to load-elf
Some CPUs are of an opposite data-endianness to other components in the
system. Sometimes elfs have the data sections layed out with this CPU
data-endianness accounting for when loaded via the CPU, so byte swaps
(relative to other system components) will occur.

The leading example, is ARM's BE32 mode, which is is basically LE with
address manipulation on half-word and byte accesses to access the
hw/byte reversed address. This means that word data is invariant
across LE and BE32. This also means that instructions are still LE.
The expectation is that the elf will be loaded via the CPU in this
endianness scheme, which means the data in the elf is reversed at
compile time.

As QEMU loads via the system memory directly, rather than the CPU, we
need a mechanism to reverse elf data endianness to implement this
possibility.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Peter Crosthwaite
140b7ce5ff loader: load_elf(): Add doc comment
Document the usage of load_elf() for clarity on current features.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Peter Crosthwaite
04ae712a9f loader: add API to load elf header
Add an API to load an elf header header from a file. Populates a
buffer with the header contents, as well as a boolean for whether the
elf is 64b or not. Both arguments are optional.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: Fix typo in comment]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Paolo Bonzini
e334bd3190 target-arm: implement BE32 mode in system emulation
System emulation only has a little-endian target; BE32 mode
is implemented by adjusting the low bits of the address
for every byte and halfword load and store.  64-bit accesses
flip the low and high words.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[PC changes:
  * rebased against master (Jan 2016)
]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Paolo Bonzini
9886ecdf31 target-arm: implement setend
Since this is not a high-performance path, just use a helper to
flip the E bit and force a lookup in the hash table since the
flags have changed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:21 +00:00
Peter Crosthwaite
91cca2cda9 target-arm: introduce tbflag for endianness
Introduce a tbflags for endianness, set based upon the CPUs current
endianness. This in turn propagates through to the disas endianness
flag.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:20 +00:00
Peter Crosthwaite
aa6489da4e target-arm: a64: Add endianness support
Set the dc->mo_endianness flag for AA64 and use it in all ldst ops.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:20 +00:00
Paolo Bonzini
dacf0a2ff7 target-arm: introduce disas flag for endianness
Introduce a disas flag for setting the CPU data endianness. This allows
control of the endianness from the CPU state rather than hard-coding it
to TARGET_WORDS_BIGENDIAN.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ PC changes:
  * Split off as new patch from original:
        "target-arm: introduce tbflag for CPSR.E"
  * Wrote commit message from scratch
]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:20 +00:00
Paolo Bonzini
12dcc3217d target-arm: pass DisasContext to gen_aa32_ld*/st*
We'll need the DisasContext in the next patch to retrieve the
desired endianness, so pass it as a whole to gen_aa32_ld*/st*.

Unfortunately we cannot let those functions call get_mem_index,
because of user-mode load/store instructions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ PC changes:
 * Fix long lines
]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:20 +00:00
Peter Crosthwaite
73462dddf6 target-arm: implement SCTLR.EE
Implement SCTLR.EE bit which controls data endianess for exceptions
and page table translations. SCTLR.EE is mirrored to the CPSR.E bit
on exception entry.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:20 +00:00
Paolo Bonzini
c3ae85fc8f linux-user: arm: handle CPSR.E correctly in strex emulation
Now that CPSR.E is set correctly, prepare for when setend will be able
to change it; bswap data in and out of strex manually by comparing
SCTLR.B, CPSR.E and TARGET_WORDS_BIGENDIAN (we do not have the luxury
of using TCGMemOps).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[ PC changes:
  * Moved SCTLR/CPSR logic to arm_cpu_data_is_big_endian
]
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:19 +00:00
Peter Crosthwaite
9c5a746038 linux-user: arm: set CPSR.E/SCTLR.E0E correctly for BE mode
If doing big-endian linux-user mode, set both the CPSR.E and SCTLR.E0E
bits. This sets big-endian mode for data accesses.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:19 +00:00
Peter Crosthwaite
b2e62d9a7b arm: cpu: handle BE32 user-mode as BE
endian with address manipulations on subword accesses (to give the
illusion of BE). But user-mode cannot tell the difference and is
already implemented as straight BE. So handle the difference in the
endianess query, where USER mode is BE and system is not.

Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:19 +00:00
Peter Crosthwaite
ed50ff7875 target-arm: cpu: Move cpu_is_big_endian to header
There is a CPU data endianness test that is used to drive the
virtio_big_endian test.

Move this up to the header so it can be more generally used for endian
tests. The KVM specific cpu_syncronize_state call is left behind in the
virtio specific function.

Rename it arm_cpu-data_is_big_endian() to more accurately capture that
this is for data accesses only.

Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:19 +00:00
Paolo Bonzini
f9fd40ebe4 target-arm: implement SCTLR.B, drop bswap_code
bswap_code is a CPU property of sorts ("is the iside endianness the
opposite way round to TARGET_WORDS_BIGENDIAN?") but it is not the
actual CPU state involved here which is SCTLR.B (set for BE32
binaries, clear for BE8).

Replace bswap_code with SCTLR.B, and pass that to arm_ld*_code.
The next patches will make data fetches honor both SCTLR.B and
CPSR.E appropriately.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[PC changes:
 * rebased on master (Jan 2016)
 * s/TARGET_USER_ONLY/CONFIG_USER_ONLY
 * Use bswap_code() for disas_set_info() instead of raw sctlr_b
]
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:19 +00:00
Paolo Bonzini
49017bd8b4 linux-user: arm: pass env to get_user_code_*
This matches the idiom used by get_user_data_* later in the series,
and will help when bswap_code will be replaced by SCTLR.B.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:18 +00:00
Paolo Bonzini
a0e1e6d705 linux-user: arm: fix coding style for some linux-user signal functions
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:18 +00:00
Andrew Baumann
eab713941a bcm2835_mbox/property: replace ldl_phys/stl_phys with endian-specific accesses
PMM pointed out that ldl_phys and stl_phys are dependent on the CPU's
endianness, whereas device model code should be independent of
it. This changes the relevant Raspberry Pi devices to explicitly call
the little-endian variants.

Signed-off-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1456880233-22568-1-git-send-email-Andrew.Baumann@microsoft.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-04 11:30:18 +00:00
Peter Maydell
4824a61a6d hw/arm/virt: Assume EL3 boot rom will handle PSCI if one is provided
If the user passes us an EL3 boot rom, then it is going to want to
implement the PSCI interface itself. In this case, disable QEMU's
internal PSCI implementation so it does not get in the way, and
instead start all CPUs in an SMP configuration at once (the boot
rom will catch them all and pen up the secondaries until needed).
The boot rom code is also responsible for editing the device tree
to include any necessary information about its own PSCI implementation
before eventually passing it to a NonSecure guest.

(This "start all CPUs at once" approach is what both ARM Trusted
Firmware and UEFI expect, since it is what the ARM Foundation Model
does; the other approach would be to provide some emulated hardware
for "start the secondaries" but this is simplest.)

This is a compatibility break, but I don't believe that anybody
was using a secure boot ROM with an SMP configuration. Such a setup
would be somewhat broken since there was nothing preventing nonsecure
guest code from calling the QEMU PSCI function to start up a secondary
core in a way that completely bypassed the secure world.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 1456853976-7592-1-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:18 +00:00
Peter Maydell
738a5d9fbb hw/arm/virt: Make first flash device Secure-only if booting secure
If the virt board is started with the 'secure' property set to
request a Secure setup, then make the first flash device be
visible only to the Secure world.

This is a breaking change, but I don't expect it to be noticed
by anybody, because running TZ-aware guests isn't common and
those guests are generally going to be booting from the flash
and implicitly expecting their Non-secure guests to not touch it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1455288361-30117-5-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:18 +00:00
Peter Maydell
16f4a8dc5c hw/arm/virt: Load bios image to MemoryRegion, not physaddr
If we're loading a BIOS image into the first flash device,
load it into the flash's memory region specifically, not
into the physical address where the flash resides. This will
make a difference when the flash might be in the Secure
address space rather than the Nonsecure one.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1455288361-30117-4-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:17 +00:00
Peter Maydell
76151cacfe loader: Add load_image_mr() to load ROM image to a MemoryRegion
Add a new function load_image_mr(), which behaves like
load_image_targphys() except that it loads the ROM image to
a specified MemoryRegion rather than to a specified physical
address. This is useful when a ROM blob needs to be loaded
to a particular flash or ROM device but the address of that
device in the machine's address space is not known. (For
instance, ROMs in devices, or ROMs which might exist in
a different address space to the system address space.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1455288361-30117-3-git-send-email-peter.maydell@linaro.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-04 11:30:17 +00:00
Peter Maydell
83ec1923cd hw/arm/virt: Provide a secure-only RAM if booting in Secure mode
If we're booting in Secure mode, provide a secure-only RAM
(just 16MB) so that secure firmware has somewhere to run
from that won't be accessible to the Non-secure guest.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1455288361-30117-2-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:17 +00:00
Peter Maydell
8b41c30525 sdhci: Implement DeviceClass reset
The sdhci device was missing a DeviceClass reset method;
implement it. Poweron reset looks the same as reset commanded
by the guest via the device registers, apart from modelling of
the rpi 'pending insert interrupt on powerup' quirk.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Baumann <Andrew.Baumann@microsoft.com>
Message-id: 1456493044-10025-3-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:17 +00:00
Peter Maydell
0719e71e52 sd.c: Handle NULL block backend in sd_get_inserted()
The sd.c SD card emulation code can be in a state where the
SDState BlockBackend pointer is NULL; this is treated as
"card not present". Add a missing check to sd_get_inserted()
so that we don't segfault in this situation.

(This could be provoked by the guest writing to the SDHCI
register to do a reset on a xilinx-zynq-a9 board; it will
also happen at startup when sdhci implements its DeviceClass
reset method.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1456493044-10025-2-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:17 +00:00
Peter Maydell
71c2768433 virt: Lift the maximum RAM limit from 30GB to 255GB
The virt board restricts guests to only 30GB of RAM. This is a
hangover from the vexpress-a15 board, and there's no inherent reason
for it. 30GB is smaller than you might reasonably want to provision
a VM for on a beefy server machine. Raise the limit to 255GB.

We choose 255GB because the available space we currently have
below the 1TB boundary is up to the 512GB mark, but we don't
want to paint ourselves into a corner by assigning it all to
RAM. So we make half of it available for RAM, with the 256GB..512GB
range available for future non-RAM expansion purposes.

If we need to provide more RAM to VMs in the future then we need to:
 * allocate a second bank of RAM starting at 2TB and working up
 * fix the DT and ACPI table generation code in QEMU to correctly
   report two split lumps of RAM to the guest
 * fix KVM in the host kernel to allow guests with >40 bit address spaces

The last of these is obviously the trickiest, but it seems
reasonable to assume that anybody configuring a VM with a quarter
of a terabyte of RAM will be doing it on a host with more than a
terabyte of physical address space.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Wei Huang <wei@redhat.com>
Message-id: 1456402182-11651-1-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:16 +00:00
Peter Maydell
8c4f0eb94c target-arm: Correct handling of writes to CPSR mode bits from gdb in usermode
In helper.c the expression
  (env->uncached_cpsr & CPSR_M) != CPSR_USER
is always true; the right hand side was supposed to be ARM_CPU_MODE_USR
(an error in commit cb01d391).

Since the incorrect expression was always true, this just meant that
commit cb01d391 had no effect.

However simply changing the RHS here would reveal a logic error: if
the mode is USR we wish to completely ignore the attempt to set the
mode bits, which means that we must clear the CPSR_M bits from mask
to avoid the uncached_cpsr bits being updated at the end of the
function.

Move the condition into the correct place in the code, fix its RHS
constant, and add a comment about the fact that we must be doing a
gdbstub write if we're in user mode.

Fixes: https://bugs.launchpad.net/qemu/+bug/1550503
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-id: 1456764438-30015-1-git-send-email-peter.maydell@linaro.org
2016-03-04 11:30:16 +00:00
Peter Maydell
2d3b7c0164 Merge remote-tracking branch 'remotes/amit-virtio-rng/tags/rng-for-2.6-1' into staging
rng:
- implement a request queue for rng-random so multiple guest requests
  don't result in vq buffers getting forgotten
- remove unused request cancellation code
- a VM with multiple vq buffers, when migrated, could get in a situation
  where not all buffers are handed back to the guest.  This is now
  fixed.

# gpg: Signature made Thu 03 Mar 2016 12:18:54 GMT using RSA key ID 854083B6
# gpg: Good signature from "Amit Shah <amit@amitshah.net>"
# gpg:                 aka "Amit Shah <amit@kernel.org>"
# gpg:                 aka "Amit Shah <amitshah@gmx.net>"

* remotes/amit-virtio-rng/tags/rng-for-2.6-1:
  virtio-rng: ask for more data if queue is not fully drained
  rng: add request queue support to rng-random
  rng: move request queue cleanup from RngEgd to RngBackend
  rng: move request queue from RngEgd to RngBackend
  rng: remove the unused request cancellation code
  MAINTAINERS: Add an entry for the include/sysemu/rng*.h files

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-03 13:13:36 +00:00
Ladi Prosek
f8693c2cd0 virtio-rng: ask for more data if queue is not fully drained
This commit effectively reverts:

  commit 4621c1768e
  Author: Amit Shah <amit.shah@redhat.com>
  Date:   Wed Nov 21 11:21:19 2012 +0530

  virtio-rng: remove extra request for entropy

but instead of calling virtio_rng_process unconditionally, it
first checks to see if the queue is empty as a little bit of
optimization.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456998514-19271-1-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:26 +05:30
Ladi Prosek
60253ed1e6 rng: add request queue support to rng-random
Requests are now created in the RngBackend parent class and the
code path is shared by both rng-egd and rng-random.

This commit fixes the rng-random implementation which processed
only one request at a time and simply discarded all but the most
recent one. In the guest this manifested as delayed completion
of reads from virtio-rng, i.e. a read was completed only after
another read was issued.

By switching rng-random to use the same request queue as rng-egd,
the unsafe stack-based allocation of the entropy buffer is
eliminated and replaced with g_malloc.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456994238-9585-5-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:26 +05:30
Ladi Prosek
9f14b0add1 rng: move request queue cleanup from RngEgd to RngBackend
RngBackend is now in charge of cleaning up the linked list on
instance finalization. It also exposes a function to finalize
individual RngRequest instances, called by its child classes.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456994238-9585-4-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:26 +05:30
Ladi Prosek
74074e8a7c rng: move request queue from RngEgd to RngBackend
The 'requests' field now lives in the RngBackend parent class.
There are no functional changes in this commit.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456994238-9585-3-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:26 +05:30
Ladi Prosek
3c52ddcdc5 rng: remove the unused request cancellation code
rng_backend_cancel_requests had no callers and none of the code
deleted in this commit ever ran.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456994238-9585-2-git-send-email-lprosek@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:26 +05:30
Thomas Huth
750cf86932 MAINTAINERS: Add an entry for the include/sysemu/rng*.h files
These headers are used by the virtio-rng and rng backends code,
so they should be listed in the same section in MAINTAINERS, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Message-Id: <1456404260-26928-1-git-send-email-thuth@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2016-03-03 17:42:23 +05:30
Peter Maydell
ed6128ebbd Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging
# gpg: Signature made Tue 01 Mar 2016 15:48:04 GMT using RSA key ID 81AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"

* remotes/stefanha/tags/tracing-pull-request:
  trace: Add a proper API to manage auto-generated events from the 'tcg' property
  trace: Add 'vcpu' event property to trace guest vCPU
  typedefs: Add CPUState
  trace: Add helper function to cast event arguments
  tcg: Move definition of type TCGv
  tcg: Add type for vCPU pointers
  trace: Remove unnecessary intermediate event copies
  trace: Extend API to manage event arguments
  vl: fix tracing initialization
  trace: use addresses instead of offsets in memory tracepoints
  trace: split subpage MMIOs into their own trace events.
  trace: docs: "simple" backend does support strings
  trace: drop trailing empty strings

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-01 15:54:03 +00:00
Lluís Vilanova
4ade0541de trace: Add a proper API to manage auto-generated events from the 'tcg' property
Formalizes the existence of the 'event_trans' and 'event_exec' event
attributes, which until now were monkey-patched only when necessary.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 145640558759.20978.6374959404425591089.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:34:38 +00:00
Lluís Vilanova
3d211d9f4d trace: Add 'vcpu' event property to trace guest vCPU
This property identifies events that trace vCPU-specific information.

It adds a "CPUState*" argument to events with the property, identifying
the vCPU raising the event. TCG translation events also have a
"TCGv_env" implicit argument that is later used as the "CPUState*"
argument at execution time.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 145641861797.30295.6991314023181842105.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:10 +00:00
Lluís Vilanova
b23197f9cf typedefs: Add CPUState
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 145641861239.30295.8564457138934628740.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:09 +00:00
Lluís Vilanova
bc9beb47c7 trace: Add helper function to cast event arguments
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 145641860680.30295.1873612736245870753.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:09 +00:00
Lluís Vilanova
5d4e1a1081 tcg: Move definition of type TCGv
The target-dependant type TCGv must be defined in "tcg/tcg.h" before
including the tracing helper wrappers in "tcg/tcg-op.h".

It also makes more sense to define it here, where other TCG types are
defined too.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 145641860129.30295.17554707227384022653.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:09 +00:00
Lluís Vilanova
1bcea73e13 tcg: Add type for vCPU pointers
Adds the 'TCGv_env' type for pointers to 'CPUArchState' objects. The
tracing infrastructure later needs to differentiate between regular
pointers and pointers to vCPUs.

Also changes all targets to use the new 'TCGv_env' type instead of the
generic 'TCGv_ptr'. As of now, the change is merely cosmetic ('TCGv_env'
translates into 'TCGv_ptr'), but that could change in the future to
enforce the difference.

Note that a 'TCGv_env' type (for 'CPUState') is not added, since all
helpers currently receive the architecture-specific
pointer ('CPUArchState').

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Acked-by: Richard Henderson <rth@twiddle.net>
Message-id: 145641859552.30295.7821536833590725201.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:09 +00:00
Lluís Vilanova
56797b1fbc trace: Remove unnecessary intermediate event copies
The current code forces the use of a chain of ".original" dereferences,
which looks odd.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 145641858988.30295.7223459456488075843.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:09 +00:00
Lluís Vilanova
3596f524d4 trace: Extend API to manage event arguments
Lets the user manage event arguments as a list, and simplifies argument
concatenation.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 145641858432.30295.3069911069472672646.stgit@localhost
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:27:09 +00:00
Denis V. Lunev
62cb4145bb vl: fix tracing initialization
we should call trace_init_backends() before trace_init_file() for
CONFIG_TRACE_SIMPLE There is no difference for other cases.

This problem was introduced by the commit
    commit 41fc57e44e
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Thu Jan 7 16:55:24 2016 +0300

    trace: split trace_init_file out of trace_init_backends

'make check' was failed as a result if configured with
  --enable-trace-backends=simple

Spotted by Alex Bennée.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1455036545-14870-1-git-send-email-den@openvz.org
CC: Alex Bennée <alex.bennee@linaro.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Hollis Blanchard
4779dc1d19 trace: use addresses instead of offsets in memory tracepoints
When memory_region_ops tracepoints are enabled, calculate and record the
absolute address being accessed. Otherwise, we only get offsets into the
memory region instead of addresses.

[Fixed "offset" -> "addr" in trace event format strings.
--Stefan]

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1454976185-30095-3-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Hollis Blanchard
23d92d68e7 trace: split subpage MMIOs into their own trace events.
Previously, a single MMIO could trigger the memory_region_ops tracepoint twice:
once on its way into subpage ops, then later on its way into the model's ops.

Also, the fields previously called "addr" are actually offsets into the memory
region. Rename them to "offset" while we're editing the tracepoint definitions.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1454976185-30095-2-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Hollis Blanchard
2c140f5f2c trace: docs: "simple" backend does support strings
The simple tracing backend has supported strings for more than three years
(62bab73213).

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Message-id: 1454976185-30095-1-git-send-email-hollis_blanchard@mentor.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Greg Kurz
6411dd1334 trace: drop trailing empty strings
Also fix a typo in the virtio_balloon_handle_output() trace while here.

[The double-quoting was a limitation of the old tracetool.sh script.
The modern tracetool.py script does not require double-quotes at the end
of the line.  See commit cf85cf8e97
("trace: Format strings must begin/end with double quotes").
--Stefan]

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20160111173036.24764.59878.stgit@bahia.huguette.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-03-01 13:20:15 +00:00
Peter Maydell
9c279bec75 Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20160301' into staging
Assorted fixes, cleanups and enhancements.

# gpg: Signature made Tue 01 Mar 2016 11:45:12 GMT using RSA key ID C6F02FAF
# gpg: Good signature from "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"

* remotes/cohuck/tags/s390x-20160301:
  s390x/css: only suspend when enabled by orb
  MAINTAINERS: Remove entry for hw/s390x/s390-virtio-bus.[ch]
  MAINTAINERS: Remove the old s390-virtio machine
  s390x/pci: use PCI_MSIX_FLAGS on retrieving the MSIX entries
  s390x/css: Use static initialization for channel_subsys fields
  s390x/css: Allocate channel_subsys statically
  s390x/pci: fix reg/dereg irq functions
  s390x/css: introduce indicator refcounting interfaces
  s390x/virtio: old machine leftovers
  watchdog/diag288: avoid race condition on expired watchdog
  s390x: remove {kvm_}s390_virtio_irq()
  s390x: fix debug statement in trigger_page_fault()
  s390x/kvm: sync fprs via kvm_run
  linux-headers: update against kvm/next

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-01 13:09:55 +00:00
Peter Maydell
646fd16865 Merge remote-tracking branch 'remotes/kraxel/tags/pull-seabios-20160301-1' into staging
seabios: update to 1.9.1 stable release

# gpg: Signature made Tue 01 Mar 2016 08:39:53 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-seabios-20160301-1:
  seabios: update to 1.9.1 stable release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-01 12:18:23 +00:00
Cornelia Huck
ce350f32e4 s390x/css: only suspend when enabled by orb
We must not allow a channel program to suspend if the suspend
control bit in the orb had not been specified.

Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Thomas Huth
d90527178c MAINTAINERS: Remove entry for hw/s390x/s390-virtio-bus.[ch]
The files have been deleted recently, no need to keep these entries
anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1456397100-22746-1-git-send-email-thuth@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Thomas Huth
6aaa681c9b MAINTAINERS: Remove the old s390-virtio machine
The old s390-virtio machine has been removed last year, so we don't
need the corresponding section in the MAINTAINERS file anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1456394274-21082-1-git-send-email-thuth@redhat.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Wei Yang
ce1307e180 s390x/pci: use PCI_MSIX_FLAGS on retrieving the MSIX entries
Even PCI_CAP_FLAGS has the same value as PCI_MSIX_FLAGS, the later one is
the more proper on retrieving MSIX entries.

This patch uses PCI_MSIX_FLAGS to retrieve the MSIX entries.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1455895091-7589-3-git-send-email-richard.weiyang@gmail.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Eduardo Habkost
bc994b74ea s390x/css: Use static initialization for channel_subsys fields
machine_init() will be gone, but we don't need it if we just
initialize the channel_subsys fields statically.

Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1455656347-29033-4-git-send-email-ehabkost@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
[adapted on top of indicator changes]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Eduardo Habkost
562f5e0b97 s390x/css: Allocate channel_subsys statically
There's no need to use g_malloc0() to allocate the channel_subsys
struct, just use a static variable.

Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <1455656347-29033-3-git-send-email-ehabkost@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
[adapted on top of indicator changes]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Yi Min Zhao
8581c115d2 s390x/pci: fix reg/dereg irq functions
Indicator refcounting interfaces are introduced. This patch fixes
introducing unneeded indicator mappings and failure to release
AISB mappings on deregistration.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:29 +01:00
Yi Min Zhao
a28d8391e3 s390x/css: introduce indicator refcounting interfaces
Currently, virtio-ccw uses its own interfaces to keep indicators mapped
just once even if the same address has been registered multiple times.
These interfaces fit the PCI use case as well. Therefore, move them to
css and make them generic interfaces.

Signed-off-by: Yi Min Zhao <zyimin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
Cornelia Huck
99abd0d6f7 s390x/virtio: old machine leftovers
Remove some now unused #defines.

Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
Sascha Silbe
fe345a3d5d watchdog/diag288: avoid race condition on expired watchdog
When configured to inject an NMI, watchdog_perform_action() may cause
the BQL to be temporarily relinquished (inject_nmi() → ... →
s390_nmi() → s390_cpu_restart() → run_on_cpu()). When the guest issues
diag 288 again in response to the NMI, the diag 288 operation will
race against wdt_diag288_reset(). Depending on scheduler behaviour,
wdt_diag288_reset() may be run after the guest issued a diag 288
Init. As a result, we will cancel the timer the guest just set up. The
effect observed by the guest is that a second expiry does not trigger
the watchdog action and diag 288 Change operations fail.

Fix this by resetting the timer _before_ invoking the action.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
Cornelia Huck
8777f6abdb s390x: remove {kvm_}s390_virtio_irq()
This interface was only used by the old virtio machine and therefore
is not needed anymore.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
David Hildenbrand
c5b2ee4c7a s390x: fix debug statement in trigger_page_fault()
When mmu_translate debugging output is enabled, code won't compile.
Let's just use the same statement as in trigger_prot_fault().

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
David Hildenbrand
5ab0e547bf s390x/kvm: sync fprs via kvm_run
We can now also sync the fprs via kvm_run, avoiding one ioctl.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
Cornelia Huck
66fb2d5467 linux-headers: update against kvm/next
Update against commit efef127c, but keep userfaultd.h.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-03-01 12:15:28 +01:00
Peter Maydell
0b85d73583 Merge remote-tracking branch 'remotes/kraxel/tags/pull-input-20160301-1' into staging
qapi: fix input-send-event and promote to stable

# gpg: Signature made Tue 01 Mar 2016 08:19:52 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-input-20160301-1:
  qapi: promote input-send-event to stable
  qapi: rename InputAxis values.
  qapi: rename input buttons
  qapi: switch x-input-send-event from console to device+head
  console: add & use qemu_console_lookup_by_device_name

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-01 11:15:00 +00:00
Peter Maydell
d9c7737e57 Merge remote-tracking branch 'remotes/kraxel/tags/pull-vga-20160301-1' into staging
vga: minor cirrus/qxl bugfixes.

# gpg: Signature made Tue 01 Mar 2016 07:16:22 GMT using RSA key ID D3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"

* remotes/kraxel/tags/pull-vga-20160301-1:
  qxl: lock current_async update in qxl_soft_reset
  cirrus_vga: fix off-by-one in blit_region_is_unsafe

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-01 10:34:19 +00:00
Peter Maydell
9c74a85304 Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging
# gpg: Signature made Mon 29 Feb 2016 20:08:16 GMT using RSA key ID C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"

* remotes/cody/tags/block-pull-request:
  iotests/124: Add cluster_size mismatch test
  block/backup: avoid copying less than full target clusters
  block/backup: make backup cluster size configurable
  mirror: Add mirror_wait_for_io
  mirror: Rewrite mirror_iteration
  vhdx: Simplify vhdx_set_shift_bits()
  vhdx: DIV_ROUND_UP() in vhdx_calc_bat_entries()
  iscsi: add support for getting CHAP password via QCryptoSecret API
  curl: add support for HTTP authentication parameters
  rbd: add support for getting password from QCryptoSecret object
  sheepdog: allow to delete snapshot
  block/nfs: add support for setting debug level

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-01 09:54:53 +00:00
Gerd Hoffmann
fee5b753ff seabios: update to 1.9.1 stable release
git shortlog rel-1.9.0..rel-1.9.1
=================================

Cole Robinson (1):
      biostables: Support SMBIOS 2.6+ UUID format

Kevin O'Connor (7):
      xhci: Check for device disconnects during USB2 reset polling
      xhci: Wait for port enable even for USB3 devices
      sdcard: Only enable error_irq_enable for bits defined in SDHCI v1 spec
      sdcard: fix typo causing 32bit write to 16bit block_size field
      nmi: Don't try to switch onto extra stack in NMI handler
      scsi: Do not call printf() from scsi_is_ready()
      coreboot: Check for unaligned cbfs header

Marcel Apfelbaum (1):
      fw/pci: do not automatically allocate IO region for PCIe bridges

Roger Pau Monne (1):
      build: fix typo in buildversion.py

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-01 09:37:07 +01:00
Gerd Hoffmann
05fa1c742f qxl: lock current_async update in qxl_soft_reset
This should fix a defect report from Coverity.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-01 07:51:32 +01:00
Paolo Bonzini
d2ba7ecb34 cirrus_vga: fix off-by-one in blit_region_is_unsafe
The "max" value is being compared with >=, but addr + width points to
the first byte that will _not_ be copied.  Laszlo suggested using a
"greater than" comparison, instead of subtracting one like it is
already done above for the height, so that max remains always positive.

The mistake is "safe"---it will reject some blits, but will never cause
out-of-bounds writes.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 1455121059-18280-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2016-03-01 07:51:32 +01:00
John Snow
cc199b16cf iotests/124: Add cluster_size mismatch test
If a backing file isn't specified in the target image and the
cluster_size is larger than the bitmap granularity, we run the risk of
creating bitmaps with allocated clusters but empty/no data which will
prevent the proper reading of the backup in the future.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-4-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
John Snow
4c9bca7e39 block/backup: avoid copying less than full target clusters
During incremental backups, if the target has a cluster size that is
larger than the backup cluster size and we are backing up to a target
that cannot (for whichever reason) pull clusters up from a backing image,
we may inadvertantly create unusable incremental backup images.

For example:

If the bitmap tracks changes at a 64KB granularity and we transmit 64KB
of data at a time but the target uses a 128KB cluster size, it is
possible that only half of a target cluster will be recognized as dirty
by the backup block job. When the cluster is allocated on the target
image but only half populated with data, we lose the ability to
distinguish between zero padding and uninitialized data.

This does not happen if the target image has a backing file that points
to the last known good backup.

Even if we have a backing file, though, it's likely going to be faster
to just buffer the redundant data ourselves from the live image than
fetching it from the backing file, so let's just always round up to the
target granularity.

The same logic applies to backup modes top, none, and full. Copying
fractional clusters without the guarantee of COW is dangerous, but even
if we can rely on COW, it's likely better to just re-copy the data.

Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-3-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
John Snow
16096a4d47 block/backup: make backup cluster size configurable
64K might not always be appropriate, make this a runtime value.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1456433911-24718-2-git-send-email-jsnow@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:55:14 -05:00
Fam Zheng
21cd917ff5 mirror: Add mirror_wait_for_io
The three lines are duplicated a number of times now, refactor a
function.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454637630-10585-3-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Fam Zheng
e5b43573e2 mirror: Rewrite mirror_iteration
The "pnum < nb_sectors" condition in deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessarily for bdrv_aio_write_zeroes and bdrv_aio_discard.

Rewrite mirror_iteration to fix both flaws.

The output of iotests 109 is updated because we now report the offset
and len slightly differently in mirroring progress.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454637630-10585-2-git-send-email-famz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz
04a3615860 vhdx: Simplify vhdx_set_shift_bits()
For values which are powers of two (and we do assume all of these to
be), sizeof(x) * 8 - 1 - clz(x) == ctz(x). Therefore, use ctz().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-3-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Max Reitz
939901dcd2 vhdx: DIV_ROUND_UP() in vhdx_calc_bat_entries()
We have DIV_ROUND_UP(), so we can use it to produce more easily readable
code. It may be slower than the bit shifting currently performed
(because it actually performs a division), but since
vhdx_calc_bat_entries() is never used in a hot path, this is completely
fine.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450451066-13335-2-git-send-email-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange
b189346eb1 iscsi: add support for getting CHAP password via QCryptoSecret API
The iSCSI driver currently accepts the CHAP password in plain text
as a block driver property. This change adds a new "password-secret"
property that accepts the ID of a QCryptoSecret instance.

  $QEMU \
     -object secret,id=sec0,filename=/home/berrange/example.pw \
     -drive driver=iscsi,url=iscsi://example.com/target-foo/lun1,\
            user=dan,password-secret=sec0

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-4-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange
1bff960642 curl: add support for HTTP authentication parameters
If connecting to a web server which has authentication
turned on, QEMU gets a 401 as curl has not been configured
with any authentication credentials.

This adds 4 new parameters to the curl block driver
options 'username', 'password-secret', 'proxy-username'
and 'proxy-password-secret'. Passwords are provided using
the recently added 'secret' object type

 $QEMU \
     -object secret,id=sec0,filename=/home/berrange/example.pw \
     -object secret,id=sec1,filename=/home/berrange/proxy.pw \
     -drive driver=http,url=http://example.com/some.img,\
            username=dan,password-secret=sec0,\
            proxy-username=dan,proxy-password-secret=sec1

Of course it is possible to use the same secret for both the
proxy & server passwords if desired, or omit the proxy auth
details, or the server auth details as required.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-3-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:31 -05:00
Daniel P. Berrange
60390a2192 rbd: add support for getting password from QCryptoSecret object
Currently RBD passwords must be provided on the command line
via

  $QEMU -drive file=rbd:pool/image:id=myname:\
               key=QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=:\
               auth_supported=cephx

This is insecure because the key is visible in the OS process
listing.

This adds support for an 'password-secret' parameter in the RBD
parameters that can be used with the QCryptoSecret object to
provide the password via a file:

  echo "QVFDVm41aE82SHpGQWhBQXEwTkN2OGp0SmNJY0UrSE9CbE1RMUE=" > poolkey.b64
  $QEMU -object secret,id=secret0,file=poolkey.b64,format=base64 \
        -drive driver=rbd,filename=rbd:pool/image:id=myname:\
               auth_supported=cephx,password-secret=secret0

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1453385961-10718-2-git-send-email-berrange@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Vasiliy Tolstov
eab8eb8db3 sheepdog: allow to delete snapshot
This patch implements a blockdriver function bdrv_snapshot_delete() in
the sheepdog driver. With the new function, snapshots of sheepdog can
be deleted from libvirt.

Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Vasiliy Tolstov <v.tolstov@selfip.ru>
Message-id: 1450873346-22334-1-git-send-email-mitake.hitoshi@lab.ntt.co.jp
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
Peter Lieven
7725b8bf12 block/nfs: add support for setting debug level
recent libnfs versions support logging debug messages. Add
support for it in qemu through an URL parameter.

Example:
 qemu -cdrom nfs://127.0.0.1/iso/my.iso?debug=2

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1447052973-14513-1-git-send-email-pl@kamp.de
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-02-29 14:54:30 -05:00
1645 changed files with 68674 additions and 25025 deletions

1
.gitignore vendored
View File

@@ -108,4 +108,5 @@
cscope.*
tags
TAGS
docker-src.*
*~

View File

@@ -42,83 +42,49 @@ notifications:
env:
global:
- TEST_CMD="make check"
- EXTRA_CONFIG=""
matrix:
# Group major targets together with their linux-user counterparts
- TARGETS=alpha-softmmu,alpha-linux-user,cris-softmmu,cris-linux-user,m68k-softmmu,m68k-linux-user,microblaze-softmmu,microblazeel-softmmu,microblaze-linux-user,microblazeel-linux-user
- TARGETS=arm-softmmu,arm-linux-user,armeb-linux-user,aarch64-softmmu,aarch64-linux-user
- TARGETS=i386-softmmu,i386-linux-user,x86_64-softmmu,x86_64-linux-user
- TARGETS=mips-softmmu,mips64-softmmu,mips64el-softmmu,mipsel-softmmu,mips-linux-user,mips64-linux-user,mips64el-linux-user,mipsel-linux-user,mipsn32-linux-user,mipsn32el-linux-user
- TARGETS=or32-softmmu,or32-linux-user,ppc-softmmu,ppc64-softmmu,ppcemb-softmmu,ppc-linux-user,ppc64-linux-user,ppc64abi32-linux-user,ppc64le-linux-user
- TARGETS=s390x-softmmu,s390x-linux-user,sh4-softmmu,sh4eb-softmmu,sh4-linux-user,sh4eb-linux-user,sparc-softmmu,sparc64-softmmu,sparc-linux-user,sparc32plus-linux-user,sparc64-linux-user,unicore32-softmmu,unicore32-linux-user
# Group remaining softmmu only targets into one build
- TARGETS=lm32-softmmu,moxie-softmmu,tricore-softmmu,xtensa-softmmu,xtensaeb-softmmu
- CONFIG=""
- CONFIG="--enable-debug --enable-debug-tcg --enable-trace-backends=log"
- CONFIG="--disable-linux-aio --disable-cap-ng --disable-attr --disable-brlapi --disable-uuid --disable-libusb"
- CONFIG="--enable-modules"
- CONFIG="--with-coroutine=ucontext"
- CONFIG="--with-coroutine=sigaltstack"
git:
# we want to do this ourselves
submodules: false
before_install:
- if [ "$TRAVIS_OS_NAME" == "osx" ]; then brew update ; fi
- if [ "$TRAVIS_OS_NAME" == "osx" ]; then brew install libffi gettext glib pixman ; fi
- wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ
- git submodule update --init --recursive
before_script:
- ./configure --target-list=${TARGETS} --enable-debug-tcg ${EXTRA_CONFIG}
- ./configure ${CONFIG}
script:
- make -j2 && ${TEST_CMD}
- make -j3 && ${TEST_CMD}
matrix:
# We manually include a number of additional build for non-standard bits
include:
# Debug related options
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--enable-debug"
# Sparse is GCC only
- env: CONFIG="--enable-sparse"
compiler: gcc
# We currently disable "make check"
- env: TARGETS=alpha-softmmu
EXTRA_CONFIG="--enable-debug --enable-tcg-interpreter"
# gprof/gcov are GCC features
- env: CONFIG="--enable-gprof --enable-gcov --disable-pie"
compiler: gcc
# We manually include builds which we disable "make check" for
- env: CONFIG="--enable-debug --enable-tcg-interpreter"
TEST_CMD=""
compiler: gcc
# Disable a few of the optional features
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--disable-linux-aio --disable-cap-ng --disable-attr --disable-brlapi --disable-uuid --disable-libusb"
compiler: gcc
# Currently configure doesn't force --disable-pie
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--enable-gprof --enable-gcov --disable-pie"
compiler: gcc
# Sparse
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--enable-sparse"
compiler: gcc
# Modules
- env: TARGETS=arm-softmmu,x86_64-softmmu
EXTRA_CONFIG="--enable-modules"
compiler: gcc
# All the trace backends (apart from dtrace)
- env: TARGETS=i386-softmmu
EXTRA_CONFIG="--enable-trace-backends=log"
compiler: gcc
# We currently disable "make check" (until 41fc57e44ed regression fixed)
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--enable-trace-backends=simple"
- env: CONFIG="--enable-trace-backends=simple"
TEST_CMD=""
compiler: gcc
# We currently disable "make check"
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--enable-trace-backends=ftrace"
- env: CONFIG="--enable-trace-backends=ftrace"
TEST_CMD=""
compiler: gcc
# We currently disable "make check"
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--enable-trace-backends=ust"
- env: CONFIG="--enable-trace-backends=ust"
TEST_CMD=""
compiler: gcc
# All the co-routine backends (apart from windows)
# We currently disable "make check"
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--with-coroutine=gthread"
- env: CONFIG="--with-coroutine=gthread"
TEST_CMD=""
compiler: gcc
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--with-coroutine=ucontext"
compiler: gcc
- env: TARGETS=x86_64-softmmu
EXTRA_CONFIG="--with-coroutine=sigaltstack"
compiler: gcc
- env: CONFIG=""
os: osx
compiler: clang

View File

@@ -165,6 +165,7 @@ F: hw/openrisc/
F: tests/tcg/openrisc/
PowerPC
M: David Gibson <david@gibson.dropbear.id.au>
M: Alexander Graf <agraf@suse.de>
L: qemu-ppc@nongnu.org
S: Maintained
@@ -234,6 +235,7 @@ L: kvm@vger.kernel.org
S: Supported
F: kvm-*
F: */kvm.*
F: include/sysemu/kvm*.h
ARM
M: Peter Maydell <peter.maydell@linaro.org>
@@ -277,7 +279,8 @@ Guest CPU Cores (Xen):
----------------------
X86
M: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
M: Stefano Stabellini <sstabellini@kernel.org>
M: Anthony Perard <anthony.perard@citrix.com>
L: xen-devel@lists.xensource.com
S: Supported
F: xen-*
@@ -356,10 +359,7 @@ F: include/hw/timer/a9gtimer.h
F: include/hw/timer/arm_mptimer.h
Exynos
M: Evgeny Voevodin <e.voevodin@samsung.com>
M: Maksim Kozlov <m.kozlov@samsung.com>
M: Igor Mitsyanko <i.mitsyanko@gmail.com>
M: Dmitry Solodkiy <d.solodkiy@samsung.com>
L: qemu-arm@nongnu.org
S: Maintained
F: hw/*/exynos*
@@ -598,7 +598,7 @@ F: hw/pci-host/grackle.c
F: hw/misc/macio/
PReP
M: Andreas Färber <andreas.faerber@web.de>
L: qemu-devel@nongnu.org
L: qemu-ppc@nongnu.org
S: Odd Fixes
F: hw/ppc/prep.c
@@ -656,12 +656,6 @@ F: hw/*/grlib*
S390 Machines
-------------
S390 Virtio
M: Alexander Graf <agraf@suse.de>
S: Maintained
F: hw/s390x/s390-*.c
X: hw/s390x/*pci*.[hc]
S390 Virtio-ccw
M: Cornelia Huck <cornelia.huck@de.ibm.com>
M: Christian Borntraeger <borntraeger@de.ibm.com>
@@ -669,7 +663,6 @@ M: Alexander Graf <agraf@suse.de>
S: Supported
F: hw/char/sclp*.[hc]
F: hw/s390x/
X: hw/s390x/s390-virtio-bus.[ch]
F: include/hw/s390x/
F: pc-bios/s390-ccw/
F: hw/watchdog/wdt_diag288.c
@@ -723,6 +716,12 @@ F: hw/timer/hpet*
F: hw/timer/i8254*
F: hw/timer/mc146818rtc*
Machine core
M: Eduardo Habkost <ehabkost@redhat.com>
M: Marcel Apfelbaum <marcel@redhat.com>
S: Supported
F: hw/core/machine.c
F: include/hw/boards.h
Xtensa Machines
---------------
@@ -872,6 +871,7 @@ VFIO
M: Alex Williamson <alex.williamson@redhat.com>
S: Supported
F: hw/vfio/*
F: include/hw/vfio/
vhost
M: Michael S. Tsirkin <mst@redhat.com>
@@ -883,6 +883,7 @@ M: Michael S. Tsirkin <mst@redhat.com>
S: Supported
F: hw/*/virtio*
F: net/vhost-user.c
F: include/hw/virtio/
virtio-9p
M: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
@@ -928,6 +929,7 @@ M: Amit Shah <amit.shah@redhat.com>
S: Supported
F: hw/virtio/virtio-rng.c
F: include/hw/virtio/virtio-rng.h
F: include/sysemu/rng*.h
F: backends/rng*.c
nvme
@@ -952,6 +954,14 @@ S: Maintained
F: hw/*/xilinx_*
F: include/hw/xilinx.h
Network packet abstractions
M: Dmitry Fleytman <dmitry@daynix.com>
S: Maintained
F: include/net/eth.h
F: net/eth.c
F: hw/net/net_rx_pkt*
F: hw/net/net_tx_pkt*
Vmware
M: Dmitry Fleytman <dmitry@daynix.com>
S: Maintained
@@ -971,6 +981,16 @@ F: hw/acpi/nvdimm.c
F: hw/mem/nvdimm.c
F: include/hw/mem/nvdimm.h
e1000x
M: Dmitry Fleytman <dmitry@daynix.com>
S: Maintained
F: hw/net/e1000x*
e1000e
M: Dmitry Fleytman <dmitry@daynix.com>
S: Maintained
F: hw/net/e1000e*
Subsystems
----------
Audio
@@ -984,6 +1004,7 @@ F: tests/intel-hda-test.c
Block layer core
M: Kevin Wolf <kwolf@redhat.com>
M: Max Reitz <mreitz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block*
@@ -997,6 +1018,7 @@ T: git git://repo.or.cz/qemu/kevin.git block
Block I/O path
M: Stefan Hajnoczi <stefanha@redhat.com>
M: Fam Zheng <famz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: async.c
@@ -1013,7 +1035,7 @@ F: blockjob.c
F: include/block/blockjob.h
F: block/backup.c
F: block/commit.c
F: block/stream.h
F: block/stream.c
F: block/mirror.c
T: git git://github.com/codyprime/qemu-kvm-jtc.git block
@@ -1043,11 +1065,10 @@ S: Supported
F: scripts/coverity-model.c
CPU
M: Andreas Färber <afaerber@suse.de>
L: qemu-devel@nongnu.org
S: Supported
F: qom/cpu.c
F: include/qom/cpu.h
F: target-i386/cpu.c
ICC Bus
M: Igor Mammedov <imammedo@redhat.com>
@@ -1103,7 +1124,6 @@ F: ui/
F: include/ui/
Cocoa graphics
M: Andreas Färber <andreas.faerber@web.de>
M: Peter Maydell <peter.maydell@linaro.org>
S: Odd Fixes
F: ui/cocoa.m
@@ -1128,6 +1148,7 @@ Network device backends
M: Jason Wang <jasowang@redhat.com>
S: Maintained
F: net/
F: include/net/
T: git git://github.com/jasowang/qemu.git net
Netmap network backend
@@ -1152,8 +1173,6 @@ M: Eduardo Habkost <ehabkost@redhat.com>
S: Maintained
F: numa.c
F: include/sysemu/numa.h
K: numa|NUMA
K: srat|SRAT
T: git git://github.com/ehabkost/qemu.git numa
QAPI
@@ -1223,10 +1242,12 @@ F: scripts/qmp/
T: git git://repo.or.cz/qemu/armbru.git qapi-next
SLIRP
M: Samuel Thibault <samuel.thibault@ens-lyon.org>
M: Jan Kiszka <jan.kiszka@siemens.com>
S: Maintained
F: slirp/
F: net/slirp.c
F: include/net/slirp.h
T: git git://git.kiszka.org/qemu.git queues/slirp
Tracing
@@ -1397,9 +1418,8 @@ S: Orphan
Stable 0.15
L: qemu-stable@nongnu.org
M: Andreas Färber <afaerber@suse.de>
T: git git://git.qemu-project.org/qemu-stable-0.15.git
S: Supported
S: Orphan
Stable 0.14
L: qemu-stable@nongnu.org
@@ -1564,6 +1584,7 @@ F: block/win32-aio.c
qcow2
M: Kevin Wolf <kwolf@redhat.com>
M: Max Reitz <mreitz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/qcow2*
@@ -1576,6 +1597,7 @@ F: block/qcow.c
blkdebug
M: Kevin Wolf <kwolf@redhat.com>
M: Max Reitz <mreitz@redhat.com>
L: qemu-block@nongnu.org
S: Supported
F: block/blkdebug.c
@@ -1611,3 +1633,10 @@ Build system architecture
M: Daniel P. Berrange <berrange@redhat.com>
S: Odd Fixes
F: docs/build-system.txt
Docker testing
--------------
Docker based testing framework and cases
M: Fam Zheng <famz@redhat.com>
S: Maintained
F: tests/docker/

View File

@@ -6,7 +6,7 @@ BUILD_DIR=$(CURDIR)
# Before including a proper config-host.mak, assume we are in the source tree
SRC_PATH=.
UNCHECKED_GOALS := %clean TAGS cscope ctags
UNCHECKED_GOALS := %clean TAGS cscope ctags docker docker-%
# All following code might depend on configuration variables
ifneq ($(wildcard config-host.mak),)
@@ -30,7 +30,6 @@ CONFIG_ALL=y
-include config-all-devices.mak
-include config-all-disas.mak
include $(SRC_PATH)/rules.mak
config-host.mak: $(SRC_PATH)/configure
@echo $@ is out-of-date, running configure
@# TODO: The next lines include code which supports a smooth
@@ -49,6 +48,8 @@ ifneq ($(filter-out $(UNCHECKED_GOALS),$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fa
endif
endif
include $(SRC_PATH)/rules.mak
GENERATED_HEADERS = config-host.h qemu-options.def
GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h qapi-event.h
GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c qapi-event.c
@@ -92,9 +93,6 @@ HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
ifdef BUILD_DOCS
DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8
DOCS+=qmp-commands.txt
ifdef CONFIG_LINUX
DOCS+=kvm_stat.1
endif
ifdef CONFIG_VIRTFS
DOCS+=fsdev/virtfs-proxy-helper.1
endif
@@ -238,7 +236,7 @@ qemu-img$(EXESUF): qemu-img.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-o
qemu-nbd$(EXESUF): qemu-nbd.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) libqemuutil.a libqemustub.a
qemu-io$(EXESUF): qemu-io.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) libqemuutil.a libqemustub.a
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o libqemuutil.a libqemustub.a
fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o fsdev/9p-marshal.o fsdev/9p-iov-marshal.o libqemuutil.a libqemustub.a
fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
@@ -329,7 +327,7 @@ ifneq ($(EXESUF),)
qemu-ga: qemu-ga$(EXESUF) $(QGA_VSS_PROVIDER) $(QEMU_GA_MSI)
endif
ivshmem-client$(EXESUF): $(ivshmem-client-obj-y)
ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^)
@@ -356,6 +354,7 @@ clean:
if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
rm -f $$d/qemu-options.def; \
done
rm -f $(SUBDIR_DEVICES_MAK) config-all-devices.mak
VERSION ?= $(shell cat VERSION)
@@ -570,12 +569,6 @@ qemu-ga.8: qemu-ga.texi
$(POD2MAN) --section=8 --center=" " --release=" " qemu-ga.pod > $@, \
" GEN $@")
kvm_stat.1: scripts/kvm/kvm_stat.texi
$(call quiet-command, \
perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< kvm_stat.pod && \
$(POD2MAN) --section=1 --center=" " --release=" " kvm_stat.pod > $@, \
" GEN $@")
dvi: qemu-doc.dvi qemu-tech.dvi
html: qemu-doc.html qemu-tech.html
info: qemu-doc.info qemu-tech.info
@@ -651,3 +644,5 @@ endif
# Include automatically generated dependency files
# Dependencies in Makefile.objs files come from our recursive subdir rules
-include $(wildcard *.d tests/*.d)
include $(SRC_PATH)/tests/docker/Makefile.include

View File

@@ -1,6 +1,6 @@
#######################################################################
# Common libraries for tools and emulators
stub-obj-y = stubs/
stub-obj-y = stubs/ crypto/
util-obj-y = util/ qobject/ qapi/
util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o
@@ -52,7 +52,6 @@ common-obj-$(CONFIG_LINUX) += fsdev/
common-obj-y += migration/
common-obj-y += qemu-char.o #aio.o
common-obj-y += page_cache.o
common-obj-y += qjson.o
common-obj-$(CONFIG_SPICE) += spice-qemu-char.o

View File

@@ -108,7 +108,12 @@ obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/dpd/decimal128.o
ifdef CONFIG_LINUX_USER
QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
# Note that we only add linux-user/host/$ARCH if it exists, and
# that it must come before linux-user/host/generic in the search path.
QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) \
$(patsubst %,-I%,$(wildcard $(SRC_PATH)/linux-user/host/$(ARCH))) \
-I$(SRC_PATH)/linux-user/host/generic \
-I$(SRC_PATH)/linux-user
obj-y += linux-user/
obj-y += gdbstub.o thunk.o user-exec.o

View File

@@ -1 +1 @@
2.5.50
2.6.50

View File

@@ -77,7 +77,7 @@ static int accel_init_machine(AccelClass *acc, MachineState *ms)
return ret;
}
int configure_accelerator(MachineState *ms)
void configure_accelerator(MachineState *ms)
{
const char *p;
char buf[10];
@@ -128,8 +128,6 @@ int configure_accelerator(MachineState *ms)
if (init_failed) {
fprintf(stderr, "Back to %s accelerator.\n", acc->name);
}
return !accel_initialised;
}

View File

@@ -18,7 +18,7 @@
#include "block/block.h"
#include "qemu/queue.h"
#include "qemu/sockets.h"
#ifdef CONFIG_EPOLL
#ifdef CONFIG_EPOLL_CREATE1
#include <sys/epoll.h>
#endif
@@ -33,7 +33,7 @@ struct AioHandler
QLIST_ENTRY(AioHandler) node;
};
#ifdef CONFIG_EPOLL
#ifdef CONFIG_EPOLL_CREATE1
/* The fd number threashold to switch to epoll */
#define EPOLL_ENABLE_THRESHOLD 64
@@ -282,10 +282,12 @@ bool aio_pending(AioContext *ctx)
int revents;
revents = node->pfd.revents & node->pfd.events;
if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) {
if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
aio_node_check(ctx, node->is_external)) {
return true;
}
if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) {
if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write &&
aio_node_check(ctx, node->is_external)) {
return true;
}
}
@@ -323,6 +325,7 @@ bool aio_dispatch(AioContext *ctx)
if (!node->deleted &&
(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
aio_node_check(ctx, node->is_external) &&
node->io_read) {
node->io_read(node->opaque);
@@ -333,6 +336,7 @@ bool aio_dispatch(AioContext *ctx)
}
if (!node->deleted &&
(revents & (G_IO_OUT | G_IO_ERR)) &&
aio_node_check(ctx, node->is_external) &&
node->io_write) {
node->io_write(node->opaque);
progress = true;
@@ -483,7 +487,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
void aio_context_setup(AioContext *ctx, Error **errp)
{
#ifdef CONFIG_EPOLL
#ifdef CONFIG_EPOLL_CREATE1
assert(!ctx->epollfd);
ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
if (ctx->epollfd == -1) {

View File

@@ -22,6 +22,8 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "cpu.h"
#include "sysemu/sysemu.h"
#include "sysemu/arch_init.h"
#include "hw/pci/pci.h"
@@ -31,6 +33,7 @@
#include "qemu/error-report.h"
#include "qmp-commands.h"
#include "hw/acpi/acpi.h"
#include "qemu/help_option.h"
#ifdef TARGET_SPARC
int graphic_width = 1024;
@@ -271,13 +274,6 @@ void do_smbios_option(QemuOpts *opts)
#endif
}
void cpudef_init(void)
{
#if defined(cpudef_setup)
cpudef_setup(); /* parse cpu definitions in target config file */
#endif
}
int kvm_available(void)
{
#ifdef CONFIG_KVM

View File

@@ -23,6 +23,7 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/aio.h"
#include "block/thread-pool.h"

View File

@@ -27,6 +27,7 @@
#include "monitor/monitor.h"
#include "qemu/timer.h"
#include "sysemu/sysemu.h"
#include "qemu/cutils.h"
#define AUDIO_CAP "audio"
#include "audio_int.h"
@@ -1869,8 +1870,7 @@ static void audio_init (void)
}
conf.period.ticks = 1;
} else {
conf.period.ticks =
muldiv64 (1, get_ticks_per_sec (), conf.period.hertz);
conf.period.ticks = NANOSECONDS_PER_SECOND / conf.period.hertz;
}
e = qemu_add_vm_change_state_handler (audio_vm_change_state_handler, s);

View File

@@ -24,6 +24,7 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qemu/bswap.h"
#include "audio.h"
#define AUDIO_CAP "mixeng"
@@ -270,7 +271,7 @@ f_sample *mixeng_clip[2][2][2][3] = {
* August 21, 1998
* Copyright 1998 Fabrice Bellard.
*
* [Rewrote completly the code of Lance Norskog And Sundry
* [Rewrote completely the code of Lance Norskog And Sundry
* Contributors with a more efficient algorithm.]
*
* This source code is freely redistributable and may be used for

View File

@@ -23,6 +23,7 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qemu/host-utils.h"
#include "audio.h"
#include "qemu/timer.h"
@@ -49,8 +50,8 @@ static int no_run_out (HWVoiceOut *hw, int live)
now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
ticks = now - no->old_ticks;
bytes = muldiv64 (ticks, hw->info.bytes_per_second, get_ticks_per_sec ());
bytes = audio_MIN (bytes, INT_MAX);
bytes = muldiv64(ticks, hw->info.bytes_per_second, NANOSECONDS_PER_SECOND);
bytes = audio_MIN(bytes, INT_MAX);
samples = bytes >> hw->info.shift;
no->old_ticks = now;
@@ -61,7 +62,7 @@ static int no_run_out (HWVoiceOut *hw, int live)
static int no_write (SWVoiceOut *sw, void *buf, int len)
{
return audio_pcm_sw_write (sw, buf, len);
return audio_pcm_sw_write(sw, buf, len);
}
static int no_init_out(HWVoiceOut *hw, struct audsettings *as, void *drv_opaque)
@@ -106,7 +107,7 @@ static int no_run_in (HWVoiceIn *hw)
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
int64_t ticks = now - no->old_ticks;
int64_t bytes =
muldiv64 (ticks, hw->info.bytes_per_second, get_ticks_per_sec ());
muldiv64(ticks, hw->info.bytes_per_second, NANOSECONDS_PER_SECOND);
no->old_ticks = now;
bytes = audio_MIN (bytes, INT_MAX);

View File

@@ -898,7 +898,7 @@ static struct audio_option oss_options[] = {
.name = "EXCLUSIVE",
.tag = AUD_OPT_BOOL,
.valp = &glob_conf.exclusive,
.descr = "Open device in exclusive mode (vmix wont work)"
.descr = "Open device in exclusive mode (vmix won't work)"
},
#ifdef USE_DSP_POLICY
{

View File

@@ -781,23 +781,22 @@ static int qpa_ctl_in (HWVoiceIn *hw, int cmd, ...)
pa_threaded_mainloop_lock (g->mainloop);
/* FIXME: use the upcoming "set_source_output_{volume,mute}" */
op = pa_context_set_source_volume_by_index (g->context,
pa_stream_get_device_index (pa->stream),
op = pa_context_set_source_output_volume (g->context,
pa_stream_get_index (pa->stream),
&v, NULL, NULL);
if (!op) {
qpa_logerr (pa_context_errno (g->context),
"set_source_volume() failed\n");
"set_source_output_volume() failed\n");
} else {
pa_operation_unref(op);
}
op = pa_context_set_source_mute_by_index (g->context,
op = pa_context_set_source_output_mute (g->context,
pa_stream_get_index (pa->stream),
sw->vol.mute, NULL, NULL);
if (!op) {
qpa_logerr (pa_context_errno (g->context),
"set_source_mute() failed\n");
"set_source_output_mute() failed\n");
} else {
pa_operation_unref (op);
}

View File

@@ -19,6 +19,7 @@
#include "qemu/osdep.h"
#include "hw/hw.h"
#include "qemu/host-utils.h"
#include "qemu/error-report.h"
#include "qemu/timer.h"
#include "ui/qemu-spice.h"
@@ -104,11 +105,11 @@ static int rate_get_samples (struct audio_pcm_info *info, SpiceRateCtl *rate)
now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
ticks = now - rate->start_ticks;
bytes = muldiv64 (ticks, info->bytes_per_second, get_ticks_per_sec ());
bytes = muldiv64(ticks, info->bytes_per_second, NANOSECONDS_PER_SECOND);
samples = (bytes - rate->bytes_sent) >> info->shift;
if (samples < 0 || samples > 65536) {
error_report("Resetting rate control (%" PRId64 " samples)", samples);
rate_start (rate);
rate_start(rate);
samples = 0;
}
rate->bytes_sent += samples << info->shift;

View File

@@ -22,7 +22,7 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "hw/hw.h"
#include "qemu/host-utils.h"
#include "qemu/timer.h"
#include "audio.h"
@@ -51,7 +51,7 @@ static int wav_run_out (HWVoiceOut *hw, int live)
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
int64_t ticks = now - wav->old_ticks;
int64_t bytes =
muldiv64 (ticks, hw->info.bytes_per_second, get_ticks_per_sec ());
muldiv64(ticks, hw->info.bytes_per_second, NANOSECONDS_PER_SECOND);
if (bytes > INT_MAX) {
samples = INT_MAX >> hw->info.shift;

View File

@@ -22,6 +22,7 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "sysemu/char.h"
#include "qemu/timer.h"
@@ -336,7 +337,7 @@ static int baum_eat_packet(BaumDriverState *baum, const uint8_t *buf, int len)
/* Allow 100ms to complete the DisplayData packet */
timer_mod(baum->cellCount_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() / 10);
NANOSECONDS_PER_SECOND / 10);
for (i = 0; i < baum->x * baum->y ; i++) {
EAT(c);
cells[i] = c;
@@ -567,7 +568,7 @@ static CharDriverState *chr_baum_init(const char *id,
ChardevReturn *ret,
Error **errp)
{
ChardevCommon *common = qapi_ChardevDummy_base(backend->u.braille);
ChardevCommon *common = backend->u.braille.data;
BaumDriverState *baum;
CharDriverState *chr;
brlapi_handle_t *handle;

View File

@@ -10,6 +10,7 @@
* See the COPYING file in the top-level directory.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "sysemu/hostmem.h"
#include "sysemu/sysemu.h"
@@ -51,11 +52,14 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
error_setg(errp, "-mem-path not supported on this host");
#else
if (!memory_region_size(&backend->mr)) {
gchar *path;
backend->force_prealloc = mem_prealloc;
path = object_get_canonical_path(OBJECT(backend));
memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
object_get_canonical_path(OBJECT(backend)),
path,
backend->size, fb->share,
fb->mem_path, errp);
g_free(path);
}
#endif
}
@@ -117,11 +121,19 @@ file_backend_instance_init(Object *o)
set_mem_path, NULL);
}
static void file_backend_instance_finalize(Object *o)
{
HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
g_free(fb->mem_path);
}
static const TypeInfo file_backend_info = {
.name = TYPE_MEMORY_BACKEND_FILE,
.parent = TYPE_MEMORY_BACKEND,
.class_init = file_backend_class_init,
.instance_init = file_backend_instance_init,
.instance_finalize = file_backend_instance_finalize,
.instance_size = sizeof(HostMemoryBackendFile),
};

View File

@@ -11,6 +11,7 @@
*/
#include "qemu/osdep.h"
#include "sysemu/hostmem.h"
#include "qapi/error.h"
#include "qom/object_interfaces.h"
#define TYPE_MEMORY_BACKEND_RAM "memory-backend-ram"

View File

@@ -12,6 +12,7 @@
#include "qemu/osdep.h"
#include "sysemu/hostmem.h"
#include "hw/boards.h"
#include "qapi/error.h"
#include "qapi/visitor.h"
#include "qapi-types.h"
#include "qapi-visit.h"

View File

@@ -68,7 +68,7 @@ static CharDriverState *qemu_chr_open_msmouse(const char *id,
ChardevReturn *ret,
Error **errp)
{
ChardevCommon *common = qapi_ChardevDummy_base(backend->u.msmouse);
ChardevCommon *common = backend->u.msmouse.data;
CharDriverState *chr;
chr = qemu_chr_alloc(common, errp);

View File

@@ -13,6 +13,7 @@
#include "qemu/osdep.h"
#include "sysemu/rng.h"
#include "sysemu/char.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "hw/qdev.h" /* just for DEFINE_PROP_CHR */
@@ -25,33 +26,12 @@ typedef struct RngEgd
CharDriverState *chr;
char *chr_name;
GSList *requests;
} RngEgd;
typedef struct RngRequest
{
EntropyReceiveFunc *receive_entropy;
uint8_t *data;
void *opaque;
size_t offset;
size_t size;
} RngRequest;
static void rng_egd_request_entropy(RngBackend *b, size_t size,
EntropyReceiveFunc *receive_entropy,
void *opaque)
static void rng_egd_request_entropy(RngBackend *b, RngRequest *req)
{
RngEgd *s = RNG_EGD(b);
RngRequest *req;
req = g_malloc(sizeof(*req));
req->offset = 0;
req->size = size;
req->receive_entropy = receive_entropy;
req->opaque = opaque;
req->data = g_malloc(req->size);
size_t size = req->size;
while (size > 0) {
uint8_t header[2];
@@ -65,24 +45,15 @@ static void rng_egd_request_entropy(RngBackend *b, size_t size,
size -= len;
}
s->requests = g_slist_append(s->requests, req);
}
static void rng_egd_free_request(RngRequest *req)
{
g_free(req->data);
g_free(req);
}
static int rng_egd_chr_can_read(void *opaque)
{
RngEgd *s = RNG_EGD(opaque);
GSList *i;
RngRequest *req;
int size = 0;
for (i = s->requests; i; i = i->next) {
RngRequest *req = i->data;
QSIMPLEQ_FOREACH(req, &s->parent.requests, next) {
size += req->size - req->offset;
}
@@ -94,8 +65,8 @@ static void rng_egd_chr_read(void *opaque, const uint8_t *buf, int size)
RngEgd *s = RNG_EGD(opaque);
size_t buf_offset = 0;
while (size > 0 && s->requests) {
RngRequest *req = s->requests->data;
while (size > 0 && !QSIMPLEQ_EMPTY(&s->parent.requests)) {
RngRequest *req = QSIMPLEQ_FIRST(&s->parent.requests);
int len = MIN(size, req->size - req->offset);
memcpy(req->data + req->offset, buf + buf_offset, len);
@@ -104,38 +75,13 @@ static void rng_egd_chr_read(void *opaque, const uint8_t *buf, int size)
size -= len;
if (req->offset == req->size) {
s->requests = g_slist_remove_link(s->requests, s->requests);
req->receive_entropy(req->opaque, req->data, req->size);
rng_egd_free_request(req);
rng_backend_finalize_request(&s->parent, req);
}
}
}
static void rng_egd_free_requests(RngEgd *s)
{
GSList *i;
for (i = s->requests; i; i = i->next) {
rng_egd_free_request(i->data);
}
g_slist_free(s->requests);
s->requests = NULL;
}
static void rng_egd_cancel_requests(RngBackend *b)
{
RngEgd *s = RNG_EGD(b);
/* We simply delete the list of pending requests. If there is data in the
* queue waiting to be read, this is okay, because there will always be
* more data than we requested originally
*/
rng_egd_free_requests(s);
}
static void rng_egd_opened(RngBackend *b, Error **errp)
{
RngEgd *s = RNG_EGD(b);
@@ -204,8 +150,6 @@ static void rng_egd_finalize(Object *obj)
}
g_free(s->chr_name);
rng_egd_free_requests(s);
}
static void rng_egd_class_init(ObjectClass *klass, void *data)
@@ -213,7 +157,6 @@ static void rng_egd_class_init(ObjectClass *klass, void *data)
RngBackendClass *rbc = RNG_BACKEND_CLASS(klass);
rbc->request_entropy = rng_egd_request_entropy;
rbc->cancel_requests = rng_egd_cancel_requests;
rbc->opened = rng_egd_opened;
}

View File

@@ -13,19 +13,16 @@
#include "qemu/osdep.h"
#include "sysemu/rng-random.h"
#include "sysemu/rng.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/main-loop.h"
struct RndRandom
struct RngRandom
{
RngBackend parent;
int fd;
char *filename;
EntropyReceiveFunc *receive_func;
void *opaque;
size_t size;
};
/**
@@ -37,42 +34,41 @@ struct RndRandom
static void entropy_available(void *opaque)
{
RndRandom *s = RNG_RANDOM(opaque);
uint8_t buffer[s->size];
ssize_t len;
RngRandom *s = RNG_RANDOM(opaque);
len = read(s->fd, buffer, s->size);
if (len < 0 && errno == EAGAIN) {
return;
while (!QSIMPLEQ_EMPTY(&s->parent.requests)) {
RngRequest *req = QSIMPLEQ_FIRST(&s->parent.requests);
ssize_t len;
len = read(s->fd, req->data, req->size);
if (len < 0 && errno == EAGAIN) {
return;
}
g_assert(len != -1);
req->receive_entropy(req->opaque, req->data, len);
rng_backend_finalize_request(&s->parent, req);
}
g_assert(len != -1);
s->receive_func(s->opaque, buffer, len);
s->receive_func = NULL;
/* We've drained all requests, the fd handler can be reset. */
qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
}
static void rng_random_request_entropy(RngBackend *b, size_t size,
EntropyReceiveFunc *receive_entropy,
void *opaque)
static void rng_random_request_entropy(RngBackend *b, RngRequest *req)
{
RndRandom *s = RNG_RANDOM(b);
RngRandom *s = RNG_RANDOM(b);
if (s->receive_func) {
s->receive_func(s->opaque, NULL, 0);
if (QSIMPLEQ_EMPTY(&s->parent.requests)) {
/* If there are no pending requests yet, we need to
* install our fd handler. */
qemu_set_fd_handler(s->fd, entropy_available, NULL, s);
}
s->receive_func = receive_entropy;
s->opaque = opaque;
s->size = size;
qemu_set_fd_handler(s->fd, entropy_available, NULL, s);
}
static void rng_random_opened(RngBackend *b, Error **errp)
{
RndRandom *s = RNG_RANDOM(b);
RngRandom *s = RNG_RANDOM(b);
if (s->filename == NULL) {
error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
@@ -87,7 +83,7 @@ static void rng_random_opened(RngBackend *b, Error **errp)
static char *rng_random_get_filename(Object *obj, Error **errp)
{
RndRandom *s = RNG_RANDOM(obj);
RngRandom *s = RNG_RANDOM(obj);
return g_strdup(s->filename);
}
@@ -96,7 +92,7 @@ static void rng_random_set_filename(Object *obj, const char *filename,
Error **errp)
{
RngBackend *b = RNG_BACKEND(obj);
RndRandom *s = RNG_RANDOM(obj);
RngRandom *s = RNG_RANDOM(obj);
if (b->opened) {
error_setg(errp, QERR_PERMISSION_DENIED);
@@ -109,7 +105,7 @@ static void rng_random_set_filename(Object *obj, const char *filename,
static void rng_random_init(Object *obj)
{
RndRandom *s = RNG_RANDOM(obj);
RngRandom *s = RNG_RANDOM(obj);
object_property_add_str(obj, "filename",
rng_random_get_filename,
@@ -122,7 +118,7 @@ static void rng_random_init(Object *obj)
static void rng_random_finalize(Object *obj)
{
RndRandom *s = RNG_RANDOM(obj);
RngRandom *s = RNG_RANDOM(obj);
if (s->fd != -1) {
qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
@@ -143,7 +139,7 @@ static void rng_random_class_init(ObjectClass *klass, void *data)
static const TypeInfo rng_random_info = {
.name = TYPE_RNG_RANDOM,
.parent = TYPE_RNG_BACKEND,
.instance_size = sizeof(RndRandom),
.instance_size = sizeof(RngRandom),
.class_init = rng_random_class_init,
.instance_init = rng_random_init,
.instance_finalize = rng_random_finalize,

View File

@@ -12,6 +12,7 @@
#include "qemu/osdep.h"
#include "sysemu/rng.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qom/object_interfaces.h"
@@ -20,18 +21,20 @@ void rng_backend_request_entropy(RngBackend *s, size_t size,
void *opaque)
{
RngBackendClass *k = RNG_BACKEND_GET_CLASS(s);
RngRequest *req;
if (k->request_entropy) {
k->request_entropy(s, size, receive_entropy, opaque);
}
}
req = g_malloc(sizeof(*req));
void rng_backend_cancel_requests(RngBackend *s)
{
RngBackendClass *k = RNG_BACKEND_GET_CLASS(s);
req->offset = 0;
req->size = size;
req->receive_entropy = receive_entropy;
req->opaque = opaque;
req->data = g_malloc(req->size);
if (k->cancel_requests) {
k->cancel_requests(s);
k->request_entropy(s, req);
QSIMPLEQ_INSERT_TAIL(&s->requests, req, next);
}
}
@@ -73,14 +76,48 @@ static void rng_backend_prop_set_opened(Object *obj, bool value, Error **errp)
s->opened = true;
}
static void rng_backend_free_request(RngRequest *req)
{
g_free(req->data);
g_free(req);
}
static void rng_backend_free_requests(RngBackend *s)
{
RngRequest *req, *next;
QSIMPLEQ_FOREACH_SAFE(req, &s->requests, next, next) {
rng_backend_free_request(req);
}
QSIMPLEQ_INIT(&s->requests);
}
void rng_backend_finalize_request(RngBackend *s, RngRequest *req)
{
QSIMPLEQ_REMOVE(&s->requests, req, RngRequest, next);
rng_backend_free_request(req);
}
static void rng_backend_init(Object *obj)
{
RngBackend *s = RNG_BACKEND(obj);
QSIMPLEQ_INIT(&s->requests);
object_property_add_bool(obj, "opened",
rng_backend_prop_get_opened,
rng_backend_prop_set_opened,
NULL);
}
static void rng_backend_finalize(Object *obj)
{
RngBackend *s = RNG_BACKEND(obj);
rng_backend_free_requests(s);
}
static void rng_backend_class_init(ObjectClass *oc, void *data)
{
UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
@@ -93,6 +130,7 @@ static const TypeInfo rng_backend_info = {
.parent = TYPE_OBJECT,
.instance_size = sizeof(RngBackend),
.instance_init = rng_backend_init,
.instance_finalize = rng_backend_finalize,
.class_size = sizeof(RngBackendClass),
.class_init = rng_backend_class_init,
.abstract = true,

View File

@@ -14,6 +14,7 @@
#include "qemu/osdep.h"
#include "sysemu/tpm_backend.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "sysemu/tpm.h"
#include "qemu/thread.h"

1091
block.c

File diff suppressed because it is too large Load Diff

View File

@@ -4,7 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
block-obj-y += quorum.o
block-obj-y += parallels.o blkdebug.o blkverify.o
block-obj-y += parallels.o blkdebug.o blkverify.o blkreplay.o
block-obj-y += block-backend.o snapshot.o qapi.o
block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
@@ -20,9 +20,11 @@ block-obj-$(CONFIG_RBD) += rbd.o
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
block-obj-$(CONFIG_LIBSSH2) += ssh.o
block-obj-y += accounting.o
block-obj-y += accounting.o dirty-bitmap.o
block-obj-y += write-threshold.o
block-obj-y += crypto.o
common-obj-y += stream.o
common-obj-y += commit.o
common-obj-y += backup.o

View File

@@ -51,7 +51,7 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qemu/cutils.h"
#include "block/block_int.h"
#include "qemu/error-report.h"
#include "qemu/thread.h"

View File

@@ -17,14 +17,14 @@
#include "block/block.h"
#include "block/block_int.h"
#include "block/blockjob.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "qemu/cutils.h"
#include "sysemu/block-backend.h"
#include "qemu/bitmap.h"
#define BACKUP_CLUSTER_BITS 16
#define BACKUP_CLUSTER_SIZE (1 << BACKUP_CLUSTER_BITS)
#define BACKUP_SECTORS_PER_CLUSTER (BACKUP_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
#define SLICE_TIME 100000000ULL /* ns */
typedef struct CowRequest {
@@ -36,7 +36,7 @@ typedef struct CowRequest {
typedef struct BackupBlockJob {
BlockJob common;
BlockDriverState *target;
BlockBackend *target;
/* bitmap for sync=incremental */
BdrvDirtyBitmap *sync_bitmap;
MirrorSyncMode sync_mode;
@@ -45,10 +45,18 @@ typedef struct BackupBlockJob {
BlockdevOnError on_target_error;
CoRwlock flush_rwlock;
uint64_t sectors_read;
HBitmap *bitmap;
unsigned long *done_bitmap;
int64_t cluster_size;
NotifierWithReturn before_write;
QLIST_HEAD(, CowRequest) inflight_reqs;
} BackupBlockJob;
/* Size of a cluster in sectors, instead of bytes. */
static inline int64_t cluster_size_sectors(BackupBlockJob *job)
{
return job->cluster_size / BDRV_SECTOR_SIZE;
}
/* See if in-flight requests overlap and wait for them to complete */
static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
int64_t start,
@@ -86,24 +94,25 @@ static void cow_request_end(CowRequest *req)
qemu_co_queue_restart_all(&req->wait_queue);
}
static int coroutine_fn backup_do_cow(BlockDriverState *bs,
static int coroutine_fn backup_do_cow(BackupBlockJob *job,
int64_t sector_num, int nb_sectors,
bool *error_is_read,
bool is_write_notifier)
{
BackupBlockJob *job = (BackupBlockJob *)bs->job;
BlockBackend *blk = job->common.blk;
CowRequest cow_request;
struct iovec iov;
QEMUIOVector bounce_qiov;
void *bounce_buffer = NULL;
int ret = 0;
int64_t sectors_per_cluster = cluster_size_sectors(job);
int64_t start, end;
int n;
qemu_co_rwlock_rdlock(&job->flush_rwlock);
start = sector_num / BACKUP_SECTORS_PER_CLUSTER;
end = DIV_ROUND_UP(sector_num + nb_sectors, BACKUP_SECTORS_PER_CLUSTER);
start = sector_num / sectors_per_cluster;
end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
@@ -111,32 +120,27 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
cow_request_begin(&cow_request, job, start, end);
for (; start < end; start++) {
if (hbitmap_get(job->bitmap, start)) {
if (test_bit(start, job->done_bitmap)) {
trace_backup_do_cow_skip(job, start);
continue; /* already copied */
}
trace_backup_do_cow_process(job, start);
n = MIN(BACKUP_SECTORS_PER_CLUSTER,
n = MIN(sectors_per_cluster,
job->common.len / BDRV_SECTOR_SIZE -
start * BACKUP_SECTORS_PER_CLUSTER);
start * sectors_per_cluster);
if (!bounce_buffer) {
bounce_buffer = qemu_blockalign(bs, BACKUP_CLUSTER_SIZE);
bounce_buffer = blk_blockalign(blk, job->cluster_size);
}
iov.iov_base = bounce_buffer;
iov.iov_len = n * BDRV_SECTOR_SIZE;
qemu_iovec_init_external(&bounce_qiov, &iov, 1);
if (is_write_notifier) {
ret = bdrv_co_readv_no_serialising(bs,
start * BACKUP_SECTORS_PER_CLUSTER,
n, &bounce_qiov);
} else {
ret = bdrv_co_readv(bs, start * BACKUP_SECTORS_PER_CLUSTER, n,
&bounce_qiov);
}
ret = blk_co_preadv(blk, start * job->cluster_size,
bounce_qiov.size, &bounce_qiov,
is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
if (ret < 0) {
trace_backup_do_cow_read_fail(job, start, ret);
if (error_is_read) {
@@ -146,13 +150,11 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
}
if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
ret = bdrv_co_write_zeroes(job->target,
start * BACKUP_SECTORS_PER_CLUSTER,
n, BDRV_REQ_MAY_UNMAP);
ret = blk_co_pwrite_zeroes(job->target, start * job->cluster_size,
bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
} else {
ret = bdrv_co_writev(job->target,
start * BACKUP_SECTORS_PER_CLUSTER, n,
&bounce_qiov);
ret = blk_co_pwritev(job->target, start * job->cluster_size,
bounce_qiov.size, &bounce_qiov, 0);
}
if (ret < 0) {
trace_backup_do_cow_write_fail(job, start, ret);
@@ -162,7 +164,7 @@ static int coroutine_fn backup_do_cow(BlockDriverState *bs,
goto out;
}
hbitmap_set(job->bitmap, start, 1);
set_bit(start, job->done_bitmap);
/* Publish progress, guest I/O counts as progress too. Note that the
* offset field is an opaque progress value, it is not a disk offset.
@@ -189,14 +191,16 @@ static int coroutine_fn backup_before_write_notify(
NotifierWithReturn *notifier,
void *opaque)
{
BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
BdrvTrackedRequest *req = opaque;
int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
assert(req->bs == blk_bs(job->common.blk));
assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
return backup_do_cow(req->bs, sector_num, nb_sectors, NULL, true);
return backup_do_cow(job, sector_num, nb_sectors, NULL, true);
}
static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
@@ -210,19 +214,10 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
}
static void backup_iostatus_reset(BlockJob *job)
{
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
if (s->target->blk) {
blk_iostatus_reset(s->target->blk);
}
}
static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
{
BdrvDirtyBitmap *bm;
BlockDriverState *bs = job->common.bs;
BlockDriverState *bs = blk_bs(job->common.blk);
if (ret < 0 || block_job_is_cancelled(&job->common)) {
/* Merge the successor back into the parent, delete nothing. */
@@ -255,7 +250,6 @@ static const BlockJobDriver backup_job_driver = {
.instance_size = sizeof(BackupBlockJob),
.job_type = BLOCK_JOB_TYPE_BACKUP,
.set_speed = backup_set_speed,
.iostatus_reset = backup_iostatus_reset,
.commit = backup_commit,
.abort = backup_abort,
};
@@ -264,11 +258,11 @@ static BlockErrorAction backup_error_action(BackupBlockJob *job,
bool read, int error)
{
if (read) {
return block_job_error_action(&job->common, job->common.bs,
job->on_source_error, true, error);
return block_job_error_action(&job->common, job->on_source_error,
true, error);
} else {
return block_job_error_action(&job->common, job->target,
job->on_target_error, false, error);
return block_job_error_action(&job->common, job->on_target_error,
false, error);
}
}
@@ -281,7 +275,7 @@ static void backup_complete(BlockJob *job, void *opaque)
BackupBlockJob *s = container_of(job, BackupBlockJob, common);
BackupCompleteData *data = opaque;
bdrv_unref(s->target);
blk_unref(s->target);
block_job_completed(job, data->ret);
g_free(data);
@@ -322,21 +316,21 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
int64_t cluster;
int64_t end;
int64_t last_cluster = -1;
BlockDriverState *bs = job->common.bs;
int64_t sectors_per_cluster = cluster_size_sectors(job);
HBitmapIter hbi;
granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
clusters_per_iter = MAX((granularity / BACKUP_CLUSTER_SIZE), 1);
clusters_per_iter = MAX((granularity / job->cluster_size), 1);
bdrv_dirty_iter_init(job->sync_bitmap, &hbi);
/* Find the next dirty sector(s) */
while ((sector = hbitmap_iter_next(&hbi)) != -1) {
cluster = sector / BACKUP_SECTORS_PER_CLUSTER;
cluster = sector / sectors_per_cluster;
/* Fake progress updates for any clusters we skipped */
if (cluster != last_cluster + 1) {
job->common.offset += ((cluster - last_cluster - 1) *
BACKUP_CLUSTER_SIZE);
job->cluster_size);
}
for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
@@ -344,8 +338,8 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
if (yield_and_check(job)) {
return ret;
}
ret = backup_do_cow(bs, cluster * BACKUP_SECTORS_PER_CLUSTER,
BACKUP_SECTORS_PER_CLUSTER, &error_is_read,
ret = backup_do_cow(job, cluster * sectors_per_cluster,
sectors_per_cluster, &error_is_read,
false);
if ((ret < 0) &&
backup_error_action(job, error_is_read, -ret) ==
@@ -357,17 +351,17 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
/* If the bitmap granularity is smaller than the backup granularity,
* we need to advance the iterator pointer to the next cluster. */
if (granularity < BACKUP_CLUSTER_SIZE) {
bdrv_set_dirty_iter(&hbi, cluster * BACKUP_SECTORS_PER_CLUSTER);
if (granularity < job->cluster_size) {
bdrv_set_dirty_iter(&hbi, cluster * sectors_per_cluster);
}
last_cluster = cluster - 1;
}
/* Play some final catchup with the progress meter */
end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
end = DIV_ROUND_UP(job->common.len, job->cluster_size);
if (last_cluster + 1 < end) {
job->common.offset += ((end - last_cluster - 1) * BACKUP_CLUSTER_SIZE);
job->common.offset += ((end - last_cluster - 1) * job->cluster_size);
}
return ret;
@@ -377,30 +371,22 @@ static void coroutine_fn backup_run(void *opaque)
{
BackupBlockJob *job = opaque;
BackupCompleteData *data;
BlockDriverState *bs = job->common.bs;
BlockDriverState *target = job->target;
BlockdevOnError on_target_error = job->on_target_error;
NotifierWithReturn before_write = {
.notify = backup_before_write_notify,
};
BlockDriverState *bs = blk_bs(job->common.blk);
BlockBackend *target = job->target;
int64_t start, end;
int64_t sectors_per_cluster = cluster_size_sectors(job);
int ret = 0;
QLIST_INIT(&job->inflight_reqs);
qemu_co_rwlock_init(&job->flush_rwlock);
start = 0;
end = DIV_ROUND_UP(job->common.len, BACKUP_CLUSTER_SIZE);
end = DIV_ROUND_UP(job->common.len, job->cluster_size);
job->bitmap = hbitmap_alloc(end, 0);
job->done_bitmap = bitmap_new(end);
bdrv_set_enable_write_cache(target, true);
if (target->blk) {
blk_set_on_error(target->blk, on_target_error, on_target_error);
blk_iostatus_enable(target->blk);
}
bdrv_add_before_write_notifier(bs, &before_write);
job->before_write.notify = backup_before_write_notify;
bdrv_add_before_write_notifier(bs, &job->before_write);
if (job->sync_mode == MIRROR_SYNC_MODE_NONE) {
while (!block_job_is_cancelled(&job->common)) {
@@ -427,7 +413,7 @@ static void coroutine_fn backup_run(void *opaque)
/* Check to see if these blocks are already in the
* backing file. */
for (i = 0; i < BACKUP_SECTORS_PER_CLUSTER;) {
for (i = 0; i < sectors_per_cluster;) {
/* bdrv_is_allocated() only returns true/false based
* on the first set of sectors it comes across that
* are are all in the same state.
@@ -436,8 +422,8 @@ static void coroutine_fn backup_run(void *opaque)
* needed but at some point that is always the case. */
alloced =
bdrv_is_allocated(bs,
start * BACKUP_SECTORS_PER_CLUSTER + i,
BACKUP_SECTORS_PER_CLUSTER - i, &n);
start * sectors_per_cluster + i,
sectors_per_cluster - i, &n);
i += n;
if (alloced == 1 || n == 0) {
@@ -452,8 +438,8 @@ static void coroutine_fn backup_run(void *opaque)
}
}
/* FULL sync mode we copy the whole drive. */
ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
BACKUP_SECTORS_PER_CLUSTER, &error_is_read, false);
ret = backup_do_cow(job, start * sectors_per_cluster,
sectors_per_cluster, &error_is_read, false);
if (ret < 0) {
/* Depending on error action, fail now or retry cluster */
BlockErrorAction action =
@@ -468,17 +454,14 @@ static void coroutine_fn backup_run(void *opaque)
}
}
notifier_with_return_remove(&before_write);
notifier_with_return_remove(&job->before_write);
/* wait until pending backup_do_cow() calls have completed */
qemu_co_rwlock_wrlock(&job->flush_rwlock);
qemu_co_rwlock_unlock(&job->flush_rwlock);
hbitmap_free(job->bitmap);
g_free(job->done_bitmap);
if (target->blk) {
blk_iostatus_disable(target->blk);
}
bdrv_op_unblock_all(target, job->common.blocker);
bdrv_op_unblock_all(blk_bs(target), job->common.blocker);
data = g_malloc(sizeof(*data));
data->ret = ret;
@@ -494,6 +477,9 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
BlockJobTxn *txn, Error **errp)
{
int64_t len;
BlockDriverInfo bdi;
BackupBlockJob *job = NULL;
int ret;
assert(bs);
assert(target);
@@ -504,13 +490,6 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
return;
}
if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
return;
}
if (!bdrv_is_inserted(bs)) {
error_setg(errp, "Device is not inserted: %s",
bdrv_get_device_name(bs));
@@ -557,20 +536,39 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
goto error;
}
BackupBlockJob *job = block_job_create(&backup_job_driver, bs, speed,
cb, opaque, errp);
job = block_job_create(&backup_job_driver, bs, speed, cb, opaque, errp);
if (!job) {
goto error;
}
bdrv_op_block_all(target, job->common.blocker);
job->target = blk_new();
blk_insert_bs(job->target, target);
job->on_source_error = on_source_error;
job->on_target_error = on_target_error;
job->target = target;
job->sync_mode = sync_mode;
job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
sync_bitmap : NULL;
/* If there is no backing file on the target, we cannot rely on COW if our
* backup cluster size is smaller than the target cluster size. Even for
* targets with a backing file, try to avoid COW if possible. */
ret = bdrv_get_info(target, &bdi);
if (ret < 0 && !target->backing) {
error_setg_errno(errp, -ret,
"Couldn't determine the cluster size of the target image, "
"which has no backing file");
error_append_hint(errp,
"Aborting, since this may create an unusable destination image\n");
goto error;
} else if (ret < 0 && target->backing) {
/* Not fatal; just trudge on ahead. */
job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
} else {
job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
}
bdrv_op_block_all(target, job->common.blocker);
job->common.len = len;
job->common.co = qemu_coroutine_create(backup_run);
block_job_txn_add_job(txn, &job->common);
@@ -581,4 +579,8 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
if (sync_bitmap) {
bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
}
if (job) {
blk_unref(job->target);
block_job_unref(&job->common);
}
}

View File

@@ -23,7 +23,8 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "qemu/cutils.h"
#include "qemu/config-file.h"
#include "block/block_int.h"
#include "qemu/module.h"

160
block/blkreplay.c Executable file
View File

@@ -0,0 +1,160 @@
/*
* Block protocol for record/replay
*
* Copyright (c) 2010-2016 Institute for System Programming
* of the Russian Academy of Sciences.
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
* See the COPYING file in the top-level directory.
*
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "sysemu/replay.h"
#include "qapi/error.h"
typedef struct Request {
Coroutine *co;
QEMUBH *bh;
} Request;
/* Next request id.
This counter is global, because requests from different
block devices should not get overlapping ids. */
static uint64_t request_id;
static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
Error *local_err = NULL;
int ret;
/* Open the image file */
bs->file = bdrv_open_child(NULL, options, "image",
bs, &child_file, false, &local_err);
if (local_err) {
ret = -EINVAL;
error_propagate(errp, local_err);
goto fail;
}
ret = 0;
fail:
if (ret < 0) {
bdrv_unref_child(bs, bs->file);
}
return ret;
}
static void blkreplay_close(BlockDriverState *bs)
{
}
static int64_t blkreplay_getlength(BlockDriverState *bs)
{
return bdrv_getlength(bs->file->bs);
}
/* This bh is used for synchronization of return from coroutines.
It continues yielded coroutine which then finishes its execution.
BH is called adjusted to some replay checkpoint, therefore
record and replay will always finish coroutines deterministically.
*/
static void blkreplay_bh_cb(void *opaque)
{
Request *req = opaque;
qemu_coroutine_enter(req->co, NULL);
qemu_bh_delete(req->bh);
g_free(req);
}
static void block_request_create(uint64_t reqid, BlockDriverState *bs,
Coroutine *co)
{
Request *req = g_new(Request, 1);
*req = (Request) {
.co = co,
.bh = aio_bh_new(bdrv_get_aio_context(bs), blkreplay_bh_cb, req),
};
replay_block_event(req->bh, reqid);
}
static int coroutine_fn blkreplay_co_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
uint64_t reqid = request_id++;
int ret = bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
block_request_create(reqid, bs, qemu_coroutine_self());
qemu_coroutine_yield();
return ret;
}
static int coroutine_fn blkreplay_co_writev(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
{
uint64_t reqid = request_id++;
int ret = bdrv_co_writev(bs->file->bs, sector_num, nb_sectors, qiov);
block_request_create(reqid, bs, qemu_coroutine_self());
qemu_coroutine_yield();
return ret;
}
static int coroutine_fn blkreplay_co_write_zeroes(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
{
uint64_t reqid = request_id++;
int ret = bdrv_co_write_zeroes(bs->file->bs, sector_num, nb_sectors, flags);
block_request_create(reqid, bs, qemu_coroutine_self());
qemu_coroutine_yield();
return ret;
}
static int coroutine_fn blkreplay_co_discard(BlockDriverState *bs,
int64_t sector_num, int nb_sectors)
{
uint64_t reqid = request_id++;
int ret = bdrv_co_discard(bs->file->bs, sector_num, nb_sectors);
block_request_create(reqid, bs, qemu_coroutine_self());
qemu_coroutine_yield();
return ret;
}
static int coroutine_fn blkreplay_co_flush(BlockDriverState *bs)
{
uint64_t reqid = request_id++;
int ret = bdrv_co_flush(bs->file->bs);
block_request_create(reqid, bs, qemu_coroutine_self());
qemu_coroutine_yield();
return ret;
}
static BlockDriver bdrv_blkreplay = {
.format_name = "blkreplay",
.protocol_name = "blkreplay",
.instance_size = 0,
.bdrv_file_open = blkreplay_open,
.bdrv_close = blkreplay_close,
.bdrv_getlength = blkreplay_getlength,
.bdrv_co_readv = blkreplay_co_readv,
.bdrv_co_writev = blkreplay_co_writev,
.bdrv_co_write_zeroes = blkreplay_co_write_zeroes,
.bdrv_co_discard = blkreplay_co_discard,
.bdrv_co_flush = blkreplay_co_flush,
};
static void bdrv_blkreplay_init(void)
{
bdrv_register(&bdrv_blkreplay);
}
block_init(bdrv_blkreplay_init);

View File

@@ -8,10 +8,12 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu/sockets.h" /* for EINPROGRESS on Windows */
#include "block/block_int.h"
#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qstring.h"
#include "qemu/cutils.h"
typedef struct {
BdrvChild *test_file;
@@ -291,22 +293,6 @@ static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
}
/* Propagate AioContext changes to ->test_file */
static void blkverify_detach_aio_context(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
bdrv_detach_aio_context(s->test_file->bs);
}
static void blkverify_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
{
BDRVBlkverifyState *s = bs->opaque;
bdrv_attach_aio_context(s->test_file->bs, new_context);
}
static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options)
{
BDRVBlkverifyState *s = bs->opaque;
@@ -354,9 +340,6 @@ static BlockDriver bdrv_blkverify = {
.bdrv_aio_writev = blkverify_aio_writev,
.bdrv_aio_flush = blkverify_aio_flush,
.bdrv_attach_aio_context = blkverify_attach_aio_context,
.bdrv_detach_aio_context = blkverify_detach_aio_context,
.is_filter = true,
.bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
};

File diff suppressed because it is too large Load Diff

View File

@@ -23,9 +23,11 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
/**************************************************************/
@@ -103,6 +105,7 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
int ret;
bs->read_only = 1; // no write support yet
bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
ret = bdrv_pread(bs->file->bs, 0, &bochs, sizeof(bochs));
if (ret < 0) {
@@ -220,38 +223,52 @@ static int64_t seek_to_sector(BlockDriverState *bs, int64_t sector_num)
return bitmap_offset + (512 * (s->bitmap_blocks + extent_offset));
}
static int bochs_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
static int coroutine_fn
bochs_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVBochsState *s = bs->opaque;
uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
int nb_sectors = bytes >> BDRV_SECTOR_BITS;
uint64_t bytes_done = 0;
QEMUIOVector local_qiov;
int ret;
assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
qemu_iovec_init(&local_qiov, qiov->niov);
qemu_co_mutex_lock(&s->lock);
while (nb_sectors > 0) {
int64_t block_offset = seek_to_sector(bs, sector_num);
if (block_offset < 0) {
return block_offset;
} else if (block_offset > 0) {
ret = bdrv_pread(bs->file->bs, block_offset, buf, 512);
ret = block_offset;
goto fail;
}
qemu_iovec_reset(&local_qiov);
qemu_iovec_concat(&local_qiov, qiov, bytes_done, 512);
if (block_offset > 0) {
ret = bdrv_co_preadv(bs->file->bs, block_offset, 512,
&local_qiov, 0);
if (ret < 0) {
return ret;
goto fail;
}
} else {
memset(buf, 0, 512);
qemu_iovec_memset(&local_qiov, 0, 0, 512);
}
nb_sectors--;
sector_num++;
buf += 512;
bytes_done += 512;
}
return 0;
}
static coroutine_fn int bochs_co_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
{
int ret;
BDRVBochsState *s = bs->opaque;
qemu_co_mutex_lock(&s->lock);
ret = bochs_read(bs, sector_num, buf, nb_sectors);
ret = 0;
fail:
qemu_co_mutex_unlock(&s->lock);
qemu_iovec_destroy(&local_qiov);
return ret;
}
@@ -266,7 +283,7 @@ static BlockDriver bdrv_bochs = {
.instance_size = sizeof(BDRVBochsState),
.bdrv_probe = bochs_probe,
.bdrv_open = bochs_open,
.bdrv_read = bochs_co_read,
.bdrv_co_preadv = bochs_co_preadv,
.bdrv_close = bochs_close,
};

View File

@@ -22,9 +22,11 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include <zlib.h>
/* Maximum compressed block size */
@@ -65,6 +67,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
int ret;
bs->read_only = 1;
bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
/* read header */
ret = bdrv_pread(bs->file->bs, 128, &s->block_size, 4);
@@ -228,33 +231,38 @@ static inline int cloop_read_block(BlockDriverState *bs, int block_num)
return 0;
}
static int cloop_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
static int coroutine_fn
cloop_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVCloopState *s = bs->opaque;
int i;
uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
int nb_sectors = bytes >> BDRV_SECTOR_BITS;
int ret, i;
assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
qemu_co_mutex_lock(&s->lock);
for (i = 0; i < nb_sectors; i++) {
void *data;
uint32_t sector_offset_in_block =
((sector_num + i) % s->sectors_per_block),
block_num = (sector_num + i) / s->sectors_per_block;
if (cloop_read_block(bs, block_num) != 0) {
return -1;
ret = -EIO;
goto fail;
}
memcpy(buf + i * 512,
s->uncompressed_block + sector_offset_in_block * 512, 512);
}
return 0;
}
static coroutine_fn int cloop_co_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
{
int ret;
BDRVCloopState *s = bs->opaque;
qemu_co_mutex_lock(&s->lock);
ret = cloop_read(bs, sector_num, buf, nb_sectors);
data = s->uncompressed_block + sector_offset_in_block * 512;
qemu_iovec_from_buf(qiov, i * 512, data, 512);
}
ret = 0;
fail:
qemu_co_mutex_unlock(&s->lock);
return ret;
}
@@ -272,7 +280,7 @@ static BlockDriver bdrv_cloop = {
.instance_size = sizeof(BDRVCloopState),
.bdrv_probe = cloop_probe,
.bdrv_open = cloop_open,
.bdrv_read = cloop_co_read,
.bdrv_co_preadv = cloop_co_preadv,
.bdrv_close = cloop_close,
};

View File

@@ -16,6 +16,7 @@
#include "trace.h"
#include "block/block_int.h"
#include "block/blockjob.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "sysemu/block-backend.h"
@@ -35,28 +36,36 @@ typedef struct CommitBlockJob {
BlockJob common;
RateLimit limit;
BlockDriverState *active;
BlockDriverState *top;
BlockDriverState *base;
BlockBackend *top;
BlockBackend *base;
BlockdevOnError on_error;
int base_flags;
int orig_overlay_flags;
char *backing_file_str;
} CommitBlockJob;
static int coroutine_fn commit_populate(BlockDriverState *bs,
BlockDriverState *base,
static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
int64_t sector_num, int nb_sectors,
void *buf)
{
int ret = 0;
QEMUIOVector qiov;
struct iovec iov = {
.iov_base = buf,
.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
};
ret = bdrv_read(bs, sector_num, buf, nb_sectors);
if (ret) {
qemu_iovec_init_external(&qiov, &iov, 1);
ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
qiov.size, &qiov, 0);
if (ret < 0) {
return ret;
}
ret = bdrv_write(base, sector_num, buf, nb_sectors);
if (ret) {
ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
qiov.size, &qiov, 0);
if (ret < 0) {
return ret;
}
@@ -72,8 +81,8 @@ static void commit_complete(BlockJob *job, void *opaque)
CommitBlockJob *s = container_of(job, CommitBlockJob, common);
CommitCompleteData *data = opaque;
BlockDriverState *active = s->active;
BlockDriverState *top = s->top;
BlockDriverState *base = s->base;
BlockDriverState *top = blk_bs(s->top);
BlockDriverState *base = blk_bs(s->base);
BlockDriverState *overlay_bs;
int ret = data->ret;
@@ -93,6 +102,8 @@ static void commit_complete(BlockJob *job, void *opaque)
bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
}
g_free(s->backing_file_str);
blk_unref(s->top);
blk_unref(s->base);
block_job_completed(&s->common, ret);
g_free(data);
}
@@ -101,8 +112,6 @@ static void coroutine_fn commit_run(void *opaque)
{
CommitBlockJob *s = opaque;
CommitCompleteData *data;
BlockDriverState *top = s->top;
BlockDriverState *base = s->base;
int64_t sector_num, end;
int ret = 0;
int n = 0;
@@ -110,27 +119,27 @@ static void coroutine_fn commit_run(void *opaque)
int bytes_written = 0;
int64_t base_len;
ret = s->common.len = bdrv_getlength(top);
ret = s->common.len = blk_getlength(s->top);
if (s->common.len < 0) {
goto out;
}
ret = base_len = bdrv_getlength(base);
ret = base_len = blk_getlength(s->base);
if (base_len < 0) {
goto out;
}
if (base_len < s->common.len) {
ret = bdrv_truncate(base, s->common.len);
ret = blk_truncate(s->base, s->common.len);
if (ret) {
goto out;
}
}
end = s->common.len >> BDRV_SECTOR_BITS;
buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);
for (sector_num = 0; sector_num < end; sector_num += n) {
uint64_t delay_ns = 0;
@@ -145,7 +154,8 @@ wait:
break;
}
/* Copy if allocated above the base */
ret = bdrv_is_allocated_above(top, base, sector_num,
ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
sector_num,
COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
&n);
copy = (ret == 1);
@@ -157,7 +167,7 @@ wait:
goto wait;
}
}
ret = commit_populate(top, base, sector_num, n, buf);
ret = commit_populate(s->top, s->base, sector_num, n, buf);
bytes_written += n * BDRV_SECTOR_SIZE;
}
if (ret < 0) {
@@ -213,13 +223,6 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
BlockDriverState *overlay_bs;
Error *local_err = NULL;
if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, "Invalid parameter combination");
return;
}
assert(top != bs);
if (top == base) {
error_setg(errp, "Invalid files for merge: top and base are the same");
@@ -259,8 +262,12 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
return;
}
s->base = base;
s->top = top;
s->base = blk_new();
blk_insert_bs(s->base, base);
s->top = blk_new();
blk_insert_bs(s->top, top);
s->active = bs;
s->base_flags = orig_base_flags;

588
block/crypto.c Normal file
View File

@@ -0,0 +1,588 @@
/*
* QEMU block full disk encryption
*
* Copyright (c) 2015-2016 Red Hat, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
*
*/
#include "qemu/osdep.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "crypto/block.h"
#include "qapi/opts-visitor.h"
#include "qapi-visit.h"
#include "qapi/error.h"
#define BLOCK_CRYPTO_OPT_LUKS_KEY_SECRET "key-secret"
#define BLOCK_CRYPTO_OPT_LUKS_CIPHER_ALG "cipher-alg"
#define BLOCK_CRYPTO_OPT_LUKS_CIPHER_MODE "cipher-mode"
#define BLOCK_CRYPTO_OPT_LUKS_IVGEN_ALG "ivgen-alg"
#define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg"
#define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg"
typedef struct BlockCrypto BlockCrypto;
struct BlockCrypto {
QCryptoBlock *block;
};
static int block_crypto_probe_generic(QCryptoBlockFormat format,
const uint8_t *buf,
int buf_size,
const char *filename)
{
if (qcrypto_block_has_format(format, buf, buf_size)) {
return 100;
} else {
return 0;
}
}
static ssize_t block_crypto_read_func(QCryptoBlock *block,
size_t offset,
uint8_t *buf,
size_t buflen,
Error **errp,
void *opaque)
{
BlockDriverState *bs = opaque;
ssize_t ret;
ret = bdrv_pread(bs->file->bs, offset, buf, buflen);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not read encryption header");
return ret;
}
return ret;
}
struct BlockCryptoCreateData {
const char *filename;
QemuOpts *opts;
BlockBackend *blk;
uint64_t size;
};
static ssize_t block_crypto_write_func(QCryptoBlock *block,
size_t offset,
const uint8_t *buf,
size_t buflen,
Error **errp,
void *opaque)
{
struct BlockCryptoCreateData *data = opaque;
ssize_t ret;
ret = blk_pwrite(data->blk, offset, buf, buflen, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not write encryption header");
return ret;
}
return ret;
}
static ssize_t block_crypto_init_func(QCryptoBlock *block,
size_t headerlen,
Error **errp,
void *opaque)
{
struct BlockCryptoCreateData *data = opaque;
int ret;
/* User provided size should reflect amount of space made
* available to the guest, so we must take account of that
* which will be used by the crypto header
*/
data->size += headerlen;
qemu_opt_set_number(data->opts, BLOCK_OPT_SIZE, data->size, &error_abort);
ret = bdrv_create_file(data->filename, data->opts, errp);
if (ret < 0) {
return -1;
}
data->blk = blk_new_open(data->filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, errp);
if (!data->blk) {
return -1;
}
return 0;
}
static QemuOptsList block_crypto_runtime_opts_luks = {
.name = "crypto",
.head = QTAILQ_HEAD_INITIALIZER(block_crypto_runtime_opts_luks.head),
.desc = {
{
.name = BLOCK_CRYPTO_OPT_LUKS_KEY_SECRET,
.type = QEMU_OPT_STRING,
.help = "ID of the secret that provides the encryption key",
},
{ /* end of list */ }
},
};
static QemuOptsList block_crypto_create_opts_luks = {
.name = "crypto",
.head = QTAILQ_HEAD_INITIALIZER(block_crypto_create_opts_luks.head),
.desc = {
{
.name = BLOCK_OPT_SIZE,
.type = QEMU_OPT_SIZE,
.help = "Virtual disk size"
},
{
.name = BLOCK_CRYPTO_OPT_LUKS_KEY_SECRET,
.type = QEMU_OPT_STRING,
.help = "ID of the secret that provides the encryption key",
},
{
.name = BLOCK_CRYPTO_OPT_LUKS_CIPHER_ALG,
.type = QEMU_OPT_STRING,
.help = "Name of encryption cipher algorithm",
},
{
.name = BLOCK_CRYPTO_OPT_LUKS_CIPHER_MODE,
.type = QEMU_OPT_STRING,
.help = "Name of encryption cipher mode",
},
{
.name = BLOCK_CRYPTO_OPT_LUKS_IVGEN_ALG,
.type = QEMU_OPT_STRING,
.help = "Name of IV generator algorithm",
},
{
.name = BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG,
.type = QEMU_OPT_STRING,
.help = "Name of IV generator hash algorithm",
},
{
.name = BLOCK_CRYPTO_OPT_LUKS_HASH_ALG,
.type = QEMU_OPT_STRING,
.help = "Name of encryption hash algorithm",
},
{ /* end of list */ }
},
};
static QCryptoBlockOpenOptions *
block_crypto_open_opts_init(QCryptoBlockFormat format,
QemuOpts *opts,
Error **errp)
{
OptsVisitor *ov;
QCryptoBlockOpenOptions *ret = NULL;
Error *local_err = NULL;
ret = g_new0(QCryptoBlockOpenOptions, 1);
ret->format = format;
ov = opts_visitor_new(opts);
visit_start_struct(opts_get_visitor(ov),
NULL, NULL, 0, &local_err);
if (local_err) {
goto out;
}
switch (format) {
case Q_CRYPTO_BLOCK_FORMAT_LUKS:
visit_type_QCryptoBlockOptionsLUKS_members(
opts_get_visitor(ov), &ret->u.luks, &local_err);
break;
default:
error_setg(&local_err, "Unsupported block format %d", format);
break;
}
if (!local_err) {
visit_check_struct(opts_get_visitor(ov), &local_err);
}
visit_end_struct(opts_get_visitor(ov));
out:
if (local_err) {
error_propagate(errp, local_err);
qapi_free_QCryptoBlockOpenOptions(ret);
ret = NULL;
}
opts_visitor_cleanup(ov);
return ret;
}
static QCryptoBlockCreateOptions *
block_crypto_create_opts_init(QCryptoBlockFormat format,
QemuOpts *opts,
Error **errp)
{
OptsVisitor *ov;
QCryptoBlockCreateOptions *ret = NULL;
Error *local_err = NULL;
ret = g_new0(QCryptoBlockCreateOptions, 1);
ret->format = format;
ov = opts_visitor_new(opts);
visit_start_struct(opts_get_visitor(ov),
NULL, NULL, 0, &local_err);
if (local_err) {
goto out;
}
switch (format) {
case Q_CRYPTO_BLOCK_FORMAT_LUKS:
visit_type_QCryptoBlockCreateOptionsLUKS_members(
opts_get_visitor(ov), &ret->u.luks, &local_err);
break;
default:
error_setg(&local_err, "Unsupported block format %d", format);
break;
}
if (!local_err) {
visit_check_struct(opts_get_visitor(ov), &local_err);
}
visit_end_struct(opts_get_visitor(ov));
out:
if (local_err) {
error_propagate(errp, local_err);
qapi_free_QCryptoBlockCreateOptions(ret);
ret = NULL;
}
opts_visitor_cleanup(ov);
return ret;
}
static int block_crypto_open_generic(QCryptoBlockFormat format,
QemuOptsList *opts_spec,
BlockDriverState *bs,
QDict *options,
int flags,
Error **errp)
{
BlockCrypto *crypto = bs->opaque;
QemuOpts *opts = NULL;
Error *local_err = NULL;
int ret = -EINVAL;
QCryptoBlockOpenOptions *open_opts = NULL;
unsigned int cflags = 0;
opts = qemu_opts_create(opts_spec, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
error_propagate(errp, local_err);
goto cleanup;
}
open_opts = block_crypto_open_opts_init(format, opts, errp);
if (!open_opts) {
goto cleanup;
}
if (flags & BDRV_O_NO_IO) {
cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
}
crypto->block = qcrypto_block_open(open_opts,
block_crypto_read_func,
bs,
cflags,
errp);
if (!crypto->block) {
ret = -EIO;
goto cleanup;
}
bs->encrypted = 1;
bs->valid_key = 1;
ret = 0;
cleanup:
qapi_free_QCryptoBlockOpenOptions(open_opts);
return ret;
}
static int block_crypto_create_generic(QCryptoBlockFormat format,
const char *filename,
QemuOpts *opts,
Error **errp)
{
int ret = -EINVAL;
QCryptoBlockCreateOptions *create_opts = NULL;
QCryptoBlock *crypto = NULL;
struct BlockCryptoCreateData data = {
.size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
BDRV_SECTOR_SIZE),
.opts = opts,
.filename = filename,
};
create_opts = block_crypto_create_opts_init(format, opts, errp);
if (!create_opts) {
return -1;
}
crypto = qcrypto_block_create(create_opts,
block_crypto_init_func,
block_crypto_write_func,
&data,
errp);
if (!crypto) {
ret = -EIO;
goto cleanup;
}
ret = 0;
cleanup:
qcrypto_block_free(crypto);
blk_unref(data.blk);
qapi_free_QCryptoBlockCreateOptions(create_opts);
return ret;
}
static int block_crypto_truncate(BlockDriverState *bs, int64_t offset)
{
BlockCrypto *crypto = bs->opaque;
size_t payload_offset =
qcrypto_block_get_payload_offset(crypto->block);
offset += payload_offset;
return bdrv_truncate(bs->file->bs, offset);
}
static void block_crypto_close(BlockDriverState *bs)
{
BlockCrypto *crypto = bs->opaque;
qcrypto_block_free(crypto->block);
}
#define BLOCK_CRYPTO_MAX_SECTORS 32
static coroutine_fn int
block_crypto_co_readv(BlockDriverState *bs, int64_t sector_num,
int remaining_sectors, QEMUIOVector *qiov)
{
BlockCrypto *crypto = bs->opaque;
int cur_nr_sectors; /* number of sectors in current iteration */
uint64_t bytes_done = 0;
uint8_t *cipher_data = NULL;
QEMUIOVector hd_qiov;
int ret = 0;
size_t payload_offset =
qcrypto_block_get_payload_offset(crypto->block) / 512;
qemu_iovec_init(&hd_qiov, qiov->niov);
/* Bounce buffer so we have a linear mem region for
* entire sector. XXX optimize so we avoid bounce
* buffer in case that qiov->niov == 1
*/
cipher_data =
qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
qiov->size));
if (cipher_data == NULL) {
ret = -ENOMEM;
goto cleanup;
}
while (remaining_sectors) {
cur_nr_sectors = remaining_sectors;
if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) {
cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS;
}
qemu_iovec_reset(&hd_qiov);
qemu_iovec_add(&hd_qiov, cipher_data, cur_nr_sectors * 512);
ret = bdrv_co_readv(bs->file->bs,
payload_offset + sector_num,
cur_nr_sectors, &hd_qiov);
if (ret < 0) {
goto cleanup;
}
if (qcrypto_block_decrypt(crypto->block,
sector_num,
cipher_data, cur_nr_sectors * 512,
NULL) < 0) {
ret = -EIO;
goto cleanup;
}
qemu_iovec_from_buf(qiov, bytes_done,
cipher_data, cur_nr_sectors * 512);
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
bytes_done += cur_nr_sectors * 512;
}
cleanup:
qemu_iovec_destroy(&hd_qiov);
qemu_vfree(cipher_data);
return ret;
}
static coroutine_fn int
block_crypto_co_writev(BlockDriverState *bs, int64_t sector_num,
int remaining_sectors, QEMUIOVector *qiov)
{
BlockCrypto *crypto = bs->opaque;
int cur_nr_sectors; /* number of sectors in current iteration */
uint64_t bytes_done = 0;
uint8_t *cipher_data = NULL;
QEMUIOVector hd_qiov;
int ret = 0;
size_t payload_offset =
qcrypto_block_get_payload_offset(crypto->block) / 512;
qemu_iovec_init(&hd_qiov, qiov->niov);
/* Bounce buffer so we have a linear mem region for
* entire sector. XXX optimize so we avoid bounce
* buffer in case that qiov->niov == 1
*/
cipher_data =
qemu_try_blockalign(bs->file->bs, MIN(BLOCK_CRYPTO_MAX_SECTORS * 512,
qiov->size));
if (cipher_data == NULL) {
ret = -ENOMEM;
goto cleanup;
}
while (remaining_sectors) {
cur_nr_sectors = remaining_sectors;
if (cur_nr_sectors > BLOCK_CRYPTO_MAX_SECTORS) {
cur_nr_sectors = BLOCK_CRYPTO_MAX_SECTORS;
}
qemu_iovec_to_buf(qiov, bytes_done,
cipher_data, cur_nr_sectors * 512);
if (qcrypto_block_encrypt(crypto->block,
sector_num,
cipher_data, cur_nr_sectors * 512,
NULL) < 0) {
ret = -EIO;
goto cleanup;
}
qemu_iovec_reset(&hd_qiov);
qemu_iovec_add(&hd_qiov, cipher_data, cur_nr_sectors * 512);
ret = bdrv_co_writev(bs->file->bs,
payload_offset + sector_num,
cur_nr_sectors, &hd_qiov);
if (ret < 0) {
goto cleanup;
}
remaining_sectors -= cur_nr_sectors;
sector_num += cur_nr_sectors;
bytes_done += cur_nr_sectors * 512;
}
cleanup:
qemu_iovec_destroy(&hd_qiov);
qemu_vfree(cipher_data);
return ret;
}
static int64_t block_crypto_getlength(BlockDriverState *bs)
{
BlockCrypto *crypto = bs->opaque;
int64_t len = bdrv_getlength(bs->file->bs);
ssize_t offset = qcrypto_block_get_payload_offset(crypto->block);
len -= offset;
return len;
}
static int block_crypto_probe_luks(const uint8_t *buf,
int buf_size,
const char *filename) {
return block_crypto_probe_generic(Q_CRYPTO_BLOCK_FORMAT_LUKS,
buf, buf_size, filename);
}
static int block_crypto_open_luks(BlockDriverState *bs,
QDict *options,
int flags,
Error **errp)
{
return block_crypto_open_generic(Q_CRYPTO_BLOCK_FORMAT_LUKS,
&block_crypto_runtime_opts_luks,
bs, options, flags, errp);
}
static int block_crypto_create_luks(const char *filename,
QemuOpts *opts,
Error **errp)
{
return block_crypto_create_generic(Q_CRYPTO_BLOCK_FORMAT_LUKS,
filename, opts, errp);
}
BlockDriver bdrv_crypto_luks = {
.format_name = "luks",
.instance_size = sizeof(BlockCrypto),
.bdrv_probe = block_crypto_probe_luks,
.bdrv_open = block_crypto_open_luks,
.bdrv_close = block_crypto_close,
.bdrv_create = block_crypto_create_luks,
.bdrv_truncate = block_crypto_truncate,
.create_opts = &block_crypto_create_opts_luks,
.bdrv_co_readv = block_crypto_co_readv,
.bdrv_co_writev = block_crypto_co_writev,
.bdrv_getlength = block_crypto_getlength,
};
static void block_crypto_init(void)
{
bdrv_register(&bdrv_crypto_luks);
}
block_init(block_crypto_init);

View File

@@ -22,21 +22,30 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "qemu/error-report.h"
#include "block/block_int.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qstring.h"
#include "crypto/secret.h"
#include <curl/curl.h>
#include "qemu/cutils.h"
// #define DEBUG_CURL
// #define DEBUG_VERBOSE
#ifdef DEBUG_CURL
#define DPRINTF(fmt, ...) do { printf(fmt, ## __VA_ARGS__); } while (0)
#define DEBUG_CURL_PRINT 1
#else
#define DPRINTF(fmt, ...) do { } while (0)
#define DEBUG_CURL_PRINT 0
#endif
#define DPRINTF(fmt, ...) \
do { \
if (DEBUG_CURL_PRINT) { \
fprintf(stderr, fmt, ## __VA_ARGS__); \
} \
} while (0)
#if LIBCURL_VERSION_NUM >= 0x071000
/* The multi interface timer callback was introduced in 7.16.0 */
@@ -78,6 +87,10 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
#define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
#define CURL_BLOCK_OPT_TIMEOUT "timeout"
#define CURL_BLOCK_OPT_COOKIE "cookie"
#define CURL_BLOCK_OPT_USERNAME "username"
#define CURL_BLOCK_OPT_PASSWORD_SECRET "password-secret"
#define CURL_BLOCK_OPT_PROXY_USERNAME "proxy-username"
#define CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET "proxy-password-secret"
struct BDRVCURLState;
@@ -120,6 +133,10 @@ typedef struct BDRVCURLState {
char *cookie;
bool accept_range;
AioContext *aio_context;
char *username;
char *password;
char *proxyusername;
char *proxypassword;
} BDRVCURLState;
static void curl_clean_state(CURLState *s);
@@ -419,6 +436,21 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
curl_easy_setopt(state->curl, CURLOPT_ERRORBUFFER, state->errmsg);
curl_easy_setopt(state->curl, CURLOPT_FAILONERROR, 1);
if (s->username) {
curl_easy_setopt(state->curl, CURLOPT_USERNAME, s->username);
}
if (s->password) {
curl_easy_setopt(state->curl, CURLOPT_PASSWORD, s->password);
}
if (s->proxyusername) {
curl_easy_setopt(state->curl,
CURLOPT_PROXYUSERNAME, s->proxyusername);
}
if (s->proxypassword) {
curl_easy_setopt(state->curl,
CURLOPT_PROXYPASSWORD, s->proxypassword);
}
/* Restrict supported protocols to avoid security issues in the more
* obscure protocols. For example, do not allow POP3/SMTP/IMAP see
* CVE-2013-0249.
@@ -525,10 +557,31 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "Pass the cookie or list of cookies with each request"
},
{
.name = CURL_BLOCK_OPT_USERNAME,
.type = QEMU_OPT_STRING,
.help = "Username for HTTP auth"
},
{
.name = CURL_BLOCK_OPT_PASSWORD_SECRET,
.type = QEMU_OPT_STRING,
.help = "ID of secret used as password for HTTP auth",
},
{
.name = CURL_BLOCK_OPT_PROXY_USERNAME,
.type = QEMU_OPT_STRING,
.help = "Username for HTTP proxy auth"
},
{
.name = CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET,
.type = QEMU_OPT_STRING,
.help = "ID of secret used as password for HTTP proxy auth",
},
{ /* end of list */ }
},
};
static int curl_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
@@ -539,6 +592,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
const char *file;
const char *cookie;
double d;
const char *secretid;
static int inited = 0;
@@ -580,6 +634,26 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
goto out_noclean;
}
s->username = g_strdup(qemu_opt_get(opts, CURL_BLOCK_OPT_USERNAME));
secretid = qemu_opt_get(opts, CURL_BLOCK_OPT_PASSWORD_SECRET);
if (secretid) {
s->password = qcrypto_secret_lookup_as_utf8(secretid, errp);
if (!s->password) {
goto out_noclean;
}
}
s->proxyusername = g_strdup(
qemu_opt_get(opts, CURL_BLOCK_OPT_PROXY_USERNAME));
secretid = qemu_opt_get(opts, CURL_BLOCK_OPT_PROXY_PASSWORD_SECRET);
if (secretid) {
s->proxypassword = qcrypto_secret_lookup_as_utf8(secretid, errp);
if (!s->proxypassword) {
goto out_noclean;
}
}
if (!inited) {
curl_global_init(CURL_GLOBAL_ALL);
inited = 1;

387
block/dirty-bitmap.c Normal file
View File

@@ -0,0 +1,387 @@
/*
* Block Dirty Bitmap
*
* Copyright (c) 2016 Red Hat. Inc
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "trace.h"
#include "block/block_int.h"
#include "block/blockjob.h"
/**
* A BdrvDirtyBitmap can be in three possible states:
* (1) successor is NULL and disabled is false: full r/w mode
* (2) successor is NULL and disabled is true: read only mode ("disabled")
* (3) successor is set: frozen mode.
* A frozen bitmap cannot be renamed, deleted, anonymized, cleared, set,
* or enabled. A frozen bitmap can only abdicate() or reclaim().
*/
struct BdrvDirtyBitmap {
HBitmap *bitmap; /* Dirty sector bitmap implementation */
BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
char *name; /* Optional non-empty unique ID */
int64_t size; /* Size of the bitmap (Number of sectors) */
bool disabled; /* Bitmap is read-only */
QLIST_ENTRY(BdrvDirtyBitmap) list;
};
BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
{
BdrvDirtyBitmap *bm;
assert(name);
QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
if (bm->name && !strcmp(name, bm->name)) {
return bm;
}
}
return NULL;
}
void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
{
assert(!bdrv_dirty_bitmap_frozen(bitmap));
g_free(bitmap->name);
bitmap->name = NULL;
}
BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
uint32_t granularity,
const char *name,
Error **errp)
{
int64_t bitmap_size;
BdrvDirtyBitmap *bitmap;
uint32_t sector_granularity;
assert((granularity & (granularity - 1)) == 0);
if (name && bdrv_find_dirty_bitmap(bs, name)) {
error_setg(errp, "Bitmap already exists: %s", name);
return NULL;
}
sector_granularity = granularity >> BDRV_SECTOR_BITS;
assert(sector_granularity);
bitmap_size = bdrv_nb_sectors(bs);
if (bitmap_size < 0) {
error_setg_errno(errp, -bitmap_size, "could not get length of device");
errno = -bitmap_size;
return NULL;
}
bitmap = g_new0(BdrvDirtyBitmap, 1);
bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
bitmap->size = bitmap_size;
bitmap->name = g_strdup(name);
bitmap->disabled = false;
QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
return bitmap;
}
bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
{
return bitmap->successor;
}
bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
{
return !(bitmap->disabled || bitmap->successor);
}
DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap)
{
if (bdrv_dirty_bitmap_frozen(bitmap)) {
return DIRTY_BITMAP_STATUS_FROZEN;
} else if (!bdrv_dirty_bitmap_enabled(bitmap)) {
return DIRTY_BITMAP_STATUS_DISABLED;
} else {
return DIRTY_BITMAP_STATUS_ACTIVE;
}
}
/**
* Create a successor bitmap destined to replace this bitmap after an operation.
* Requires that the bitmap is not frozen and has no successor.
*/
int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, Error **errp)
{
uint64_t granularity;
BdrvDirtyBitmap *child;
if (bdrv_dirty_bitmap_frozen(bitmap)) {
error_setg(errp, "Cannot create a successor for a bitmap that is "
"currently frozen");
return -1;
}
assert(!bitmap->successor);
/* Create an anonymous successor */
granularity = bdrv_dirty_bitmap_granularity(bitmap);
child = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
if (!child) {
return -1;
}
/* Successor will be on or off based on our current state. */
child->disabled = bitmap->disabled;
/* Install the successor and freeze the parent */
bitmap->successor = child;
return 0;
}
/**
* For a bitmap with a successor, yield our name to the successor,
* delete the old bitmap, and return a handle to the new bitmap.
*/
BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap,
Error **errp)
{
char *name;
BdrvDirtyBitmap *successor = bitmap->successor;
if (successor == NULL) {
error_setg(errp, "Cannot relinquish control if "
"there's no successor present");
return NULL;
}
name = bitmap->name;
bitmap->name = NULL;
successor->name = name;
bitmap->successor = NULL;
bdrv_release_dirty_bitmap(bs, bitmap);
return successor;
}
/**
* In cases of failure where we can no longer safely delete the parent,
* we may wish to re-join the parent and child/successor.
* The merged parent will be un-frozen, but not explicitly re-enabled.
*/
BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
BdrvDirtyBitmap *parent,
Error **errp)
{
BdrvDirtyBitmap *successor = parent->successor;
if (!successor) {
error_setg(errp, "Cannot reclaim a successor when none is present");
return NULL;
}
if (!hbitmap_merge(parent->bitmap, successor->bitmap)) {
error_setg(errp, "Merging of parent and successor bitmap failed");
return NULL;
}
bdrv_release_dirty_bitmap(bs, successor);
parent->successor = NULL;
return parent;
}
/**
* Truncates _all_ bitmaps attached to a BDS.
*/
void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
{
BdrvDirtyBitmap *bitmap;
uint64_t size = bdrv_nb_sectors(bs);
QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
assert(!bdrv_dirty_bitmap_frozen(bitmap));
hbitmap_truncate(bitmap->bitmap, size);
bitmap->size = size;
}
}
static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap,
bool only_named)
{
BdrvDirtyBitmap *bm, *next;
QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
if ((!bitmap || bm == bitmap) && (!only_named || bm->name)) {
assert(!bdrv_dirty_bitmap_frozen(bm));
QLIST_REMOVE(bm, list);
hbitmap_free(bm->bitmap);
g_free(bm->name);
g_free(bm);
if (bitmap) {
return;
}
}
}
}
void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
{
bdrv_do_release_matching_dirty_bitmap(bs, bitmap, false);
}
/**
* Release all named dirty bitmaps attached to a BDS (for use in bdrv_close()).
* There must not be any frozen bitmaps attached.
*/
void bdrv_release_named_dirty_bitmaps(BlockDriverState *bs)
{
bdrv_do_release_matching_dirty_bitmap(bs, NULL, true);
}
void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
{
assert(!bdrv_dirty_bitmap_frozen(bitmap));
bitmap->disabled = true;
}
void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
{
assert(!bdrv_dirty_bitmap_frozen(bitmap));
bitmap->disabled = false;
}
BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
{
BdrvDirtyBitmap *bm;
BlockDirtyInfoList *list = NULL;
BlockDirtyInfoList **plist = &list;
QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
info->count = bdrv_get_dirty_count(bm);
info->granularity = bdrv_dirty_bitmap_granularity(bm);
info->has_name = !!bm->name;
info->name = g_strdup(bm->name);
info->status = bdrv_dirty_bitmap_status(bm);
entry->value = info;
*plist = entry;
plist = &entry->next;
}
return list;
}
int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
int64_t sector)
{
if (bitmap) {
return hbitmap_get(bitmap->bitmap, sector);
} else {
return 0;
}
}
/**
* Chooses a default granularity based on the existing cluster size,
* but clamped between [4K, 64K]. Defaults to 64K in the case that there
* is no cluster size information available.
*/
uint32_t bdrv_get_default_bitmap_granularity(BlockDriverState *bs)
{
BlockDriverInfo bdi;
uint32_t granularity;
if (bdrv_get_info(bs, &bdi) >= 0 && bdi.cluster_size > 0) {
granularity = MAX(4096, bdi.cluster_size);
granularity = MIN(65536, granularity);
} else {
granularity = 65536;
}
return granularity;
}
uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
{
return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
}
void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
{
hbitmap_iter_init(hbi, bitmap->bitmap, 0);
}
void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int nr_sectors)
{
assert(bdrv_dirty_bitmap_enabled(bitmap));
hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
}
void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int64_t cur_sector, int nr_sectors)
{
assert(bdrv_dirty_bitmap_enabled(bitmap));
hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
}
void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
{
assert(bdrv_dirty_bitmap_enabled(bitmap));
if (!out) {
hbitmap_reset_all(bitmap->bitmap);
} else {
HBitmap *backup = bitmap->bitmap;
bitmap->bitmap = hbitmap_alloc(bitmap->size,
hbitmap_granularity(backup));
*out = backup;
}
}
void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
{
HBitmap *tmp = bitmap->bitmap;
assert(bdrv_dirty_bitmap_enabled(bitmap));
bitmap->bitmap = in;
hbitmap_free(tmp);
}
void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
int nr_sectors)
{
BdrvDirtyBitmap *bitmap;
QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
if (!bdrv_dirty_bitmap_enabled(bitmap)) {
continue;
}
hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
}
}
/**
* Advance an HBitmapIter to an arbitrary offset.
*/
void bdrv_set_dirty_iter(HBitmapIter *hbi, int64_t offset)
{
assert(hbi->hb);
hbitmap_iter_init(hbi, hbi->hb, offset);
}
int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
{
return hbitmap_count(bitmap->bitmap);
}

View File

@@ -22,6 +22,7 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/bswap.h"
@@ -439,6 +440,8 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
int ret;
bs->read_only = 1;
bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
s->n_chunks = 0;
s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
/* used by dmg_read_mish_block to keep track of the current I/O position */
@@ -658,38 +661,42 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
return 0;
}
static int dmg_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
static int coroutine_fn
dmg_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVDMGState *s = bs->opaque;
int i;
uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
int nb_sectors = bytes >> BDRV_SECTOR_BITS;
int ret, i;
assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
qemu_co_mutex_lock(&s->lock);
for (i = 0; i < nb_sectors; i++) {
uint32_t sector_offset_in_chunk;
void *data;
if (dmg_read_chunk(bs, sector_num + i) != 0) {
return -1;
ret = -EIO;
goto fail;
}
/* Special case: current chunk is all zeroes. Do not perform a memcpy as
* s->uncompressed_chunk may be too small to cover the large all-zeroes
* section. dmg_read_chunk is called to find s->current_chunk */
if (s->types[s->current_chunk] == 2) { /* all zeroes block entry */
memset(buf + i * 512, 0, 512);
qemu_iovec_memset(qiov, i * 512, 0, 512);
continue;
}
sector_offset_in_chunk = sector_num + i - s->sectors[s->current_chunk];
memcpy(buf + i * 512,
s->uncompressed_chunk + sector_offset_in_chunk * 512, 512);
data = s->uncompressed_chunk + sector_offset_in_chunk * 512;
qemu_iovec_from_buf(qiov, i * 512, data, 512);
}
return 0;
}
static coroutine_fn int dmg_co_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
{
int ret;
BDRVDMGState *s = bs->opaque;
qemu_co_mutex_lock(&s->lock);
ret = dmg_read(bs, sector_num, buf, nb_sectors);
ret = 0;
fail:
qemu_co_mutex_unlock(&s->lock);
return ret;
}
@@ -714,7 +721,7 @@ static BlockDriver bdrv_dmg = {
.instance_size = sizeof(BDRVDMGState),
.bdrv_probe = dmg_probe,
.bdrv_open = dmg_open,
.bdrv_read = dmg_co_read,
.bdrv_co_preadv = dmg_co_preadv,
.bdrv_close = dmg_close,
};

View File

@@ -10,6 +10,7 @@
#include "qemu/osdep.h"
#include <glusterfs/api/glfs.h>
#include "block/block_int.h"
#include "qapi/error.h"
#include "qemu/uri.h"
typedef struct GlusterAIOCB {
@@ -246,7 +247,7 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
if (!ret || ret == acb->size) {
acb->ret = 0; /* Success */
} else if (ret < 0) {
acb->ret = ret; /* Read/Write failed */
acb->ret = -errno; /* Read/Write failed */
} else {
acb->ret = -EIO; /* Partial read/write - fail it */
}
@@ -313,6 +314,23 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
goto out;
}
#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
/* Without this, if fsync fails for a recoverable reason (for instance,
* ENOSPC), gluster will dump its cache, preventing retries. This means
* almost certain data loss. Not all gluster versions support the
* 'resync-failed-syncs-after-fsync' key value, but there is no way to
* discover during runtime if it is supported (this api returns success for
* unknown key/value pairs) */
ret = glfs_set_xlator_option(s->glfs, "*-write-behind",
"resync-failed-syncs-after-fsync",
"on");
if (ret < 0) {
error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
ret = -errno;
goto out;
}
#endif
qemu_gluster_parse_flags(bdrv_flags, &open_flags);
s->fd = glfs_open(s->glfs, gconf->image, open_flags);
@@ -365,6 +383,16 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
goto exit;
}
#ifdef CONFIG_GLUSTERFS_XLATOR_OPT
ret = glfs_set_xlator_option(reop_s->glfs, "*-write-behind",
"resync-failed-syncs-after-fsync", "on");
if (ret < 0) {
error_setg_errno(errp, errno, "Unable to set xlator key/value pair");
ret = -errno;
goto exit;
}
#endif
reop_s->fd = glfs_open(reop_s->glfs, gconf->image, open_flags);
if (reop_s->fd == NULL) {
/* reops->glfs will be cleaned up in _abort */
@@ -588,6 +616,17 @@ static coroutine_fn int qemu_gluster_co_writev(BlockDriverState *bs,
return qemu_gluster_co_rw(bs, sector_num, nb_sectors, qiov, 1);
}
static void qemu_gluster_close(BlockDriverState *bs)
{
BDRVGlusterState *s = bs->opaque;
if (s->fd) {
glfs_close(s->fd);
s->fd = NULL;
}
glfs_fini(s->glfs);
}
static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
{
int ret;
@@ -601,11 +640,35 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
if (ret < 0) {
return -errno;
ret = -errno;
goto error;
}
qemu_coroutine_yield();
if (acb.ret < 0) {
ret = acb.ret;
goto error;
}
return acb.ret;
error:
/* Some versions of Gluster (3.5.6 -> 3.5.8?) will not retain its cache
* after a fsync failure, so we have no way of allowing the guest to safely
* continue. Gluster versions prior to 3.5.6 don't retain the cache
* either, but will invalidate the fd on error, so this is again our only
* option.
*
* The 'resync-failed-syncs-after-fsync' xlator option for the
* write-behind cache will cause later gluster versions to retain its
* cache after error, so long as the fd remains open. However, we
* currently have no way of knowing if this option is supported.
*
* TODO: Once gluster provides a way for us to determine if the option
* is supported, bypass the closure and setting drv to NULL. */
qemu_gluster_close(bs);
bs->drv = NULL;
return ret;
}
#ifdef CONFIG_GLUSTERFS_DISCARD
@@ -660,17 +723,6 @@ static int64_t qemu_gluster_allocated_file_size(BlockDriverState *bs)
}
}
static void qemu_gluster_close(BlockDriverState *bs)
{
BDRVGlusterState *s = bs->opaque;
if (s->fd) {
glfs_close(s->fd);
s->fd = NULL;
}
glfs_fini(s->glfs);
}
static int qemu_gluster_has_zero_init(BlockDriverState *bs)
{
/* GlusterFS volume could be backed by a block device */

File diff suppressed because it is too large Load Diff

View File

@@ -39,6 +39,7 @@
#include "sysemu/sysemu.h"
#include "qmp-commands.h"
#include "qapi/qmp/qstring.h"
#include "crypto/secret.h"
#include <iscsi/iscsi.h>
#include <iscsi/scsi-lowlevel.h>
@@ -69,7 +70,6 @@ typedef struct IscsiLun {
bool lbprz;
bool dpofua;
bool has_write_same;
bool force_next_flush;
bool request_timed_out;
} IscsiLun;
@@ -83,7 +83,6 @@ typedef struct IscsiTask {
QEMUBH *bh;
IscsiLun *iscsilun;
QEMUTimer retry_timer;
bool force_next_flush;
int err_code;
} IscsiTask;
@@ -281,8 +280,6 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
}
iTask->err_code = iscsi_translate_sense(&task->sense);
error_report("iSCSI Failure: %s", iscsi_get_error(iscsi));
} else {
iTask->iscsilun->force_next_flush |= iTask->force_next_flush;
}
out:
@@ -451,16 +448,19 @@ static void iscsi_allocationmap_clear(IscsiLun *iscsilun, int64_t sector_num,
}
}
static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *iov)
static int coroutine_fn
iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
QEMUIOVector *iov, int flags)
{
IscsiLun *iscsilun = bs->opaque;
struct IscsiTask iTask;
uint64_t lba;
uint32_t num_sectors;
int fua;
bool fua = flags & BDRV_REQ_FUA;
if (fua) {
assert(iscsilun->dpofua);
}
if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
return -EINVAL;
}
@@ -475,8 +475,6 @@ static int coroutine_fn iscsi_co_writev(BlockDriverState *bs,
num_sectors = sector_qemu2lun(nb_sectors, iscsilun);
iscsi_co_init_iscsitask(iscsilun, &iTask);
retry:
fua = iscsilun->dpofua && !bs->enable_write_cache;
iTask.force_next_flush = !fua;
if (iscsilun->use_16_for_rw) {
iTask.task = iscsi_write16_task(iscsilun->iscsi, iscsilun->lun, lba,
NULL, num_sectors * iscsilun->block_size,
@@ -714,11 +712,6 @@ static int coroutine_fn iscsi_co_flush(BlockDriverState *bs)
IscsiLun *iscsilun = bs->opaque;
struct IscsiTask iTask;
if (!iscsilun->force_next_flush) {
return 0;
}
iscsilun->force_next_flush = false;
iscsi_co_init_iscsitask(iscsilun, &iTask);
retry:
if (iscsi_synchronizecache10_task(iscsilun->iscsi, iscsilun->lun, 0, 0, 0,
@@ -768,6 +761,7 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
acb->ioh->driver_status = 0;
acb->ioh->host_status = 0;
acb->ioh->resid = 0;
acb->ioh->status = status;
#define SG_ERR_DRIVER_SENSE 0x08
@@ -839,6 +833,13 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
return &acb->common;
}
if (acb->ioh->cmd_len > SCSI_CDB_MAX_SIZE) {
error_report("iSCSI: ioctl error CDB exceeds max size (%d > %d)",
acb->ioh->cmd_len, SCSI_CDB_MAX_SIZE);
qemu_aio_unref(acb);
return NULL;
}
acb->task = malloc(sizeof(struct scsi_task));
if (acb->task == NULL) {
error_report("iSCSI: Failed to allocate task for scsi command. %s",
@@ -1018,7 +1019,6 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
}
iscsi_co_init_iscsitask(iscsilun, &iTask);
iTask.force_next_flush = true;
retry:
if (use_16_for_ws) {
iTask.task = iscsi_writesame16_task(iscsilun->iscsi, iscsilun->lun, lba,
@@ -1080,6 +1080,8 @@ static void parse_chap(struct iscsi_context *iscsi, const char *target,
QemuOpts *opts;
const char *user = NULL;
const char *password = NULL;
const char *secretid;
char *secret = NULL;
list = qemu_find_opts("iscsi");
if (!list) {
@@ -1099,8 +1101,20 @@ static void parse_chap(struct iscsi_context *iscsi, const char *target,
return;
}
secretid = qemu_opt_get(opts, "password-secret");
password = qemu_opt_get(opts, "password");
if (!password) {
if (secretid && password) {
error_setg(errp, "'password' and 'password-secret' properties are "
"mutually exclusive");
return;
}
if (secretid) {
secret = qcrypto_secret_lookup_as_utf8(secretid, errp);
if (!secret) {
return;
}
password = secret;
} else if (!password) {
error_setg(errp, "CHAP username specified but no password was given");
return;
}
@@ -1108,6 +1122,8 @@ static void parse_chap(struct iscsi_context *iscsi, const char *target,
if (iscsi_set_initiator_username_pwd(iscsi, user, password)) {
error_setg(errp, "Failed to set initiator username and password");
}
g_free(secret);
}
static void parse_header_digest(struct iscsi_context *iscsi, const char *target,
@@ -1542,6 +1558,10 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
task = NULL;
iscsi_modesense_sync(iscsilun);
if (iscsilun->dpofua) {
bs->supported_write_flags = BDRV_REQ_FUA;
}
bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
/* Check the write protect flag of the LUN if we want to write */
if (iscsilun->type == TYPE_DISK && (flags & BDRV_O_RDWR) &&
@@ -1834,7 +1854,7 @@ static BlockDriver bdrv_iscsi = {
.bdrv_co_discard = iscsi_co_discard,
.bdrv_co_write_zeroes = iscsi_co_write_zeroes,
.bdrv_co_readv = iscsi_co_readv,
.bdrv_co_writev = iscsi_co_writev,
.bdrv_co_writev_flags = iscsi_co_writev_flags,
.bdrv_co_flush_to_disk = iscsi_co_flush,
#ifdef __linux__
@@ -1857,6 +1877,11 @@ static QemuOptsList qemu_iscsi_opts = {
.name = "password",
.type = QEMU_OPT_STRING,
.help = "password for CHAP authentication to target",
},{
.name = "password-secret",
.type = QEMU_OPT_STRING,
.help = "ID of the secret providing password for CHAP "
"authentication to target",
},{
.name = "header-digest",
.type = QEMU_OPT_STRING,

View File

@@ -30,7 +30,7 @@
struct qemu_laiocb {
BlockAIOCB common;
struct qemu_laio_state *ctx;
LinuxAioState *ctx;
struct iocb iocb;
ssize_t ret;
size_t nbytes;
@@ -46,7 +46,7 @@ typedef struct {
QSIMPLEQ_HEAD(, qemu_laiocb) pending;
} LaioQueue;
struct qemu_laio_state {
struct LinuxAioState {
io_context_t ctx;
EventNotifier e;
@@ -60,7 +60,7 @@ struct qemu_laio_state {
int event_max;
};
static void ioq_submit(struct qemu_laio_state *s);
static void ioq_submit(LinuxAioState *s);
static inline ssize_t io_event_ret(struct io_event *ev)
{
@@ -70,8 +70,7 @@ static inline ssize_t io_event_ret(struct io_event *ev)
/*
* Completes an AIO request (calls the callback and frees the ACB).
*/
static void qemu_laio_process_completion(struct qemu_laio_state *s,
struct qemu_laiocb *laiocb)
static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
{
int ret;
@@ -99,7 +98,7 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
*
* The function is somewhat tricky because it supports nested event loops, for
* example when a request callback invokes aio_poll(). In order to do this,
* the completion events array and index are kept in qemu_laio_state. The BH
* the completion events array and index are kept in LinuxAioState. The BH
* reschedules itself as long as there are completions pending so it will
* either be called again in a nested event loop or will be called after all
* events have been completed. When there are no events left to complete, the
@@ -107,7 +106,7 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
*/
static void qemu_laio_completion_bh(void *opaque)
{
struct qemu_laio_state *s = opaque;
LinuxAioState *s = opaque;
/* Fetch more completion events when empty */
if (s->event_idx == s->event_max) {
@@ -136,7 +135,7 @@ static void qemu_laio_completion_bh(void *opaque)
laiocb->ret = io_event_ret(&s->events[s->event_idx]);
s->event_idx++;
qemu_laio_process_completion(s, laiocb);
qemu_laio_process_completion(laiocb);
}
if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
@@ -146,7 +145,7 @@ static void qemu_laio_completion_bh(void *opaque)
static void qemu_laio_completion_cb(EventNotifier *e)
{
struct qemu_laio_state *s = container_of(e, struct qemu_laio_state, e);
LinuxAioState *s = container_of(e, LinuxAioState, e);
if (event_notifier_test_and_clear(&s->e)) {
qemu_bh_schedule(s->completion_bh);
@@ -185,7 +184,7 @@ static void ioq_init(LaioQueue *io_q)
io_q->blocked = false;
}
static void ioq_submit(struct qemu_laio_state *s)
static void ioq_submit(LinuxAioState *s)
{
int ret, len;
struct qemu_laiocb *aiocb;
@@ -216,33 +215,25 @@ static void ioq_submit(struct qemu_laio_state *s)
s->io_q.blocked = (s->io_q.n > 0);
}
void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)
{
struct qemu_laio_state *s = aio_ctx;
s->io_q.plugged++;
assert(!s->io_q.plugged);
s->io_q.plugged = 1;
}
void laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug)
void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s)
{
struct qemu_laio_state *s = aio_ctx;
assert(s->io_q.plugged > 0 || !unplug);
if (unplug && --s->io_q.plugged > 0) {
return;
}
assert(s->io_q.plugged);
s->io_q.plugged = 0;
if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
ioq_submit(s);
}
}
BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque, int type)
{
struct qemu_laio_state *s = aio_ctx;
struct qemu_laiocb *laiocb;
struct iocb *iocbs;
off_t offset = sector_num * 512;
@@ -284,26 +275,22 @@ out_free_aiocb:
return NULL;
}
void laio_detach_aio_context(void *s_, AioContext *old_context)
void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
{
struct qemu_laio_state *s = s_;
aio_set_event_notifier(old_context, &s->e, false, NULL);
qemu_bh_delete(s->completion_bh);
}
void laio_attach_aio_context(void *s_, AioContext *new_context)
void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
{
struct qemu_laio_state *s = s_;
s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
aio_set_event_notifier(new_context, &s->e, false,
qemu_laio_completion_cb);
}
void *laio_init(void)
LinuxAioState *laio_init(void)
{
struct qemu_laio_state *s;
LinuxAioState *s;
s = g_malloc0(sizeof(*s));
if (event_notifier_init(&s->e, false) < 0) {
@@ -325,10 +312,8 @@ out_free_state:
return NULL;
}
void laio_cleanup(void *s_)
void laio_cleanup(LinuxAioState *s)
{
struct qemu_laio_state *s = s_;
event_notifier_cleanup(&s->e);
if (io_destroy(s->ctx) != 0) {

View File

@@ -16,10 +16,10 @@
#include "block/blockjob.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "qemu/bitmap.h"
#include "qemu/error-report.h"
#define SLICE_TIME 100000000ULL /* ns */
#define MAX_IN_FLIGHT 16
@@ -35,7 +35,7 @@ typedef struct MirrorBuffer {
typedef struct MirrorBlockJob {
BlockJob common;
RateLimit limit;
BlockDriverState *target;
BlockBackend *target;
BlockDriverState *base;
/* The name of the graph node to replace */
char *replaces;
@@ -47,7 +47,6 @@ typedef struct MirrorBlockJob {
BlockdevOnError on_source_error, on_target_error;
bool synced;
bool should_complete;
int64_t sector_num;
int64_t granularity;
size_t buf_size;
int64_t bdev_length;
@@ -64,6 +63,8 @@ typedef struct MirrorBlockJob {
int ret;
bool unmap;
bool waiting_for_io;
int target_cluster_sectors;
int max_iov;
} MirrorBlockJob;
typedef struct MirrorOp {
@@ -78,11 +79,11 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
{
s->synced = false;
if (read) {
return block_job_error_action(&s->common, s->common.bs,
s->on_source_error, true, error);
return block_job_error_action(&s->common, s->on_source_error,
true, error);
} else {
return block_job_error_action(&s->common, s->target,
s->on_target_error, false, error);
return block_job_error_action(&s->common, s->on_target_error,
false, error);
}
}
@@ -106,7 +107,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
chunk_num = op->sector_num / sectors_per_chunk;
nb_chunks = op->nb_sectors / sectors_per_chunk;
nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
if (ret >= 0) {
if (s->cow_bitmap) {
@@ -155,119 +156,99 @@ static void mirror_read_complete(void *opaque, int ret)
mirror_iteration_done(op, ret);
return;
}
bdrv_aio_writev(s->target, op->sector_num, &op->qiov, op->nb_sectors,
blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
op->nb_sectors * BDRV_SECTOR_SIZE,
mirror_write_complete, op);
}
static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
static inline void mirror_clip_sectors(MirrorBlockJob *s,
int64_t sector_num,
int *nb_sectors)
{
BlockDriverState *source = s->common.bs;
int nb_sectors, sectors_per_chunk, nb_chunks, max_iov;
int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
uint64_t delay_ns = 0;
*nb_sectors = MIN(*nb_sectors,
s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
}
/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
* return the offset of the adjusted tail sector against original. */
static int mirror_cow_align(MirrorBlockJob *s,
int64_t *sector_num,
int *nb_sectors)
{
bool need_cow;
int ret = 0;
int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
int64_t align_sector_num = *sector_num;
int align_nb_sectors = *nb_sectors;
int max_sectors = chunk_sectors * s->max_iov;
need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
s->cow_bitmap);
if (need_cow) {
bdrv_round_to_clusters(blk_bs(s->target), *sector_num, *nb_sectors,
&align_sector_num, &align_nb_sectors);
}
if (align_nb_sectors > max_sectors) {
align_nb_sectors = max_sectors;
if (need_cow) {
align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
s->target_cluster_sectors);
}
}
/* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
* that doesn't matter because it's already the end of source image. */
mirror_clip_sectors(s, align_sector_num, &align_nb_sectors);
ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
*sector_num = align_sector_num;
*nb_sectors = align_nb_sectors;
assert(ret >= 0);
return ret;
}
static inline void mirror_wait_for_io(MirrorBlockJob *s)
{
assert(!s->waiting_for_io);
s->waiting_for_io = true;
qemu_coroutine_yield();
s->waiting_for_io = false;
}
/* Submit async read while handling COW.
* Returns: nb_sectors if no alignment is necessary, or
* (new_end - sector_num) if tail is rounded up or down due to
* alignment or buffer limit.
*/
static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
int nb_sectors)
{
BlockBackend *source = s->common.blk;
int sectors_per_chunk, nb_chunks;
int ret = nb_sectors;
MirrorOp *op;
int pnum;
int64_t ret;
BlockDriverState *file;
max_iov = MIN(source->bl.max_iov, s->target->bl.max_iov);
s->sector_num = hbitmap_iter_next(&s->hbi);
if (s->sector_num < 0) {
bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
s->sector_num = hbitmap_iter_next(&s->hbi);
trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
assert(s->sector_num >= 0);
}
hbitmap_next_sector = s->sector_num;
sector_num = s->sector_num;
sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
end = s->bdev_length / BDRV_SECTOR_SIZE;
/* Extend the QEMUIOVector to include all adjacent blocks that will
* be copied in this operation.
*
* We have to do this if we have no backing file yet in the destination,
* and the cluster size is very large. Then we need to do COW ourselves.
* The first time a cluster is copied, copy it entirely. Note that,
* because both the granularity and the cluster size are powers of two,
* the number of sectors to copy cannot exceed one cluster.
*
* We also want to extend the QEMUIOVector to include more adjacent
* dirty blocks if possible, to limit the number of I/O operations and
* run efficiently even with a small granularity.
*/
nb_chunks = 0;
nb_sectors = 0;
next_sector = sector_num;
next_chunk = sector_num / sectors_per_chunk;
/* We can only handle as much as buf_size at a time. */
nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
assert(nb_sectors);
/* Wait for I/O to this cluster (from a previous iteration) to be done. */
while (test_bit(next_chunk, s->in_flight_bitmap)) {
trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
s->waiting_for_io = true;
qemu_coroutine_yield();
s->waiting_for_io = false;
if (s->cow_bitmap) {
ret += mirror_cow_align(s, &sector_num, &nb_sectors);
}
assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
/* The sector range must meet granularity because:
* 1) Caller passes in aligned values;
* 2) mirror_cow_align is used only when target cluster is larger. */
assert(!(sector_num % sectors_per_chunk));
nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
do {
int added_sectors, added_chunks;
if (!bdrv_get_dirty(source, s->dirty_bitmap, next_sector) ||
test_bit(next_chunk, s->in_flight_bitmap)) {
assert(nb_sectors > 0);
break;
}
added_sectors = sectors_per_chunk;
if (s->cow_bitmap && !test_bit(next_chunk, s->cow_bitmap)) {
bdrv_round_to_clusters(s->target,
next_sector, added_sectors,
&next_sector, &added_sectors);
/* On the first iteration, the rounding may make us copy
* sectors before the first dirty one.
*/
if (next_sector < sector_num) {
assert(nb_sectors == 0);
sector_num = next_sector;
next_chunk = next_sector / sectors_per_chunk;
}
}
added_sectors = MIN(added_sectors, end - (sector_num + nb_sectors));
added_chunks = (added_sectors + sectors_per_chunk - 1) / sectors_per_chunk;
/* When doing COW, it may happen that there is not enough space for
* a full cluster. Wait if that is the case.
*/
while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
s->waiting_for_io = true;
qemu_coroutine_yield();
s->waiting_for_io = false;
}
if (s->buf_free_count < nb_chunks + added_chunks) {
trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
break;
}
if (max_iov < nb_chunks + added_chunks) {
trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
break;
}
/* We have enough free space to copy these sectors. */
bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
nb_sectors += added_sectors;
nb_chunks += added_chunks;
next_sector += added_sectors;
next_chunk += added_chunks;
if (!s->synced && s->common.speed) {
delay_ns = ratelimit_calculate_delay(&s->limit, added_sectors);
}
} while (delay_ns == 0 && next_sector < end);
while (s->buf_free_count < nb_chunks) {
trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
mirror_wait_for_io(s);
}
/* Allocate a MirrorOp that is used as an AIO callback. */
op = g_new(MirrorOp, 1);
@@ -279,47 +260,160 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
* from s->buf_free.
*/
qemu_iovec_init(&op->qiov, nb_chunks);
next_sector = sector_num;
while (nb_chunks-- > 0) {
MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
size_t remaining = (nb_sectors * BDRV_SECTOR_SIZE) - op->qiov.size;
size_t remaining = nb_sectors * BDRV_SECTOR_SIZE - op->qiov.size;
QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
s->buf_free_count--;
qemu_iovec_add(&op->qiov, buf, MIN(s->granularity, remaining));
/* Advance the HBitmapIter in parallel, so that we do not examine
* the same sector twice.
*/
if (next_sector > hbitmap_next_sector
&& bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
}
next_sector += sectors_per_chunk;
}
bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
/* Copy the dirty cluster. */
s->in_flight++;
s->sectors_in_flight += nb_sectors;
trace_mirror_one_iteration(s, sector_num, nb_sectors);
ret = bdrv_get_block_status_above(source, NULL, sector_num,
nb_sectors, &pnum, &file);
if (ret < 0 || pnum < nb_sectors ||
(ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
mirror_read_complete, op);
} else if (ret & BDRV_BLOCK_ZERO) {
bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
blk_aio_preadv(source, sector_num * BDRV_SECTOR_SIZE, &op->qiov,
nb_sectors * BDRV_SECTOR_SIZE,
mirror_read_complete, op);
return ret;
}
static void mirror_do_zero_or_discard(MirrorBlockJob *s,
int64_t sector_num,
int nb_sectors,
bool is_discard)
{
MirrorOp *op;
/* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
* so the freeing in mirror_iteration_done is nop. */
op = g_new0(MirrorOp, 1);
op->s = s;
op->sector_num = sector_num;
op->nb_sectors = nb_sectors;
s->in_flight++;
s->sectors_in_flight += nb_sectors;
if (is_discard) {
blk_aio_discard(s->target, sector_num, op->nb_sectors,
mirror_write_complete, op);
} else {
blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
op->nb_sectors * BDRV_SECTOR_SIZE,
s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
mirror_write_complete, op);
} else {
assert(!(ret & BDRV_BLOCK_DATA));
bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
mirror_write_complete, op);
}
}
static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
{
BlockDriverState *source = blk_bs(s->common.blk);
int64_t sector_num, first_chunk;
uint64_t delay_ns = 0;
/* At least the first dirty chunk is mirrored in one iteration. */
int nb_chunks = 1;
int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
sector_num = hbitmap_iter_next(&s->hbi);
if (sector_num < 0) {
bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
sector_num = hbitmap_iter_next(&s->hbi);
trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
assert(sector_num >= 0);
}
first_chunk = sector_num / sectors_per_chunk;
while (test_bit(first_chunk, s->in_flight_bitmap)) {
trace_mirror_yield_in_flight(s, first_chunk, s->in_flight);
mirror_wait_for_io(s);
}
/* Find the number of consective dirty chunks following the first dirty
* one, and wait for in flight requests in them. */
while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
int64_t hbitmap_next;
int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
int64_t next_chunk = next_sector / sectors_per_chunk;
if (next_sector >= end ||
!bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
break;
}
if (test_bit(next_chunk, s->in_flight_bitmap)) {
break;
}
hbitmap_next = hbitmap_iter_next(&s->hbi);
if (hbitmap_next > next_sector || hbitmap_next < 0) {
/* The bitmap iterator's cache is stale, refresh it */
bdrv_set_dirty_iter(&s->hbi, next_sector);
hbitmap_next = hbitmap_iter_next(&s->hbi);
}
assert(hbitmap_next == next_sector);
nb_chunks++;
}
/* Clear dirty bits before querying the block status, because
* calling bdrv_get_block_status_above could yield - if some blocks are
* marked dirty in this window, we need to know.
*/
bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num,
nb_chunks * sectors_per_chunk);
bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
while (nb_chunks > 0 && sector_num < end) {
int ret;
int io_sectors;
BlockDriverState *file;
enum MirrorMethod {
MIRROR_METHOD_COPY,
MIRROR_METHOD_ZERO,
MIRROR_METHOD_DISCARD
} mirror_method = MIRROR_METHOD_COPY;
assert(!(sector_num % sectors_per_chunk));
ret = bdrv_get_block_status_above(source, NULL, sector_num,
nb_chunks * sectors_per_chunk,
&io_sectors, &file);
if (ret < 0) {
io_sectors = nb_chunks * sectors_per_chunk;
}
io_sectors -= io_sectors % sectors_per_chunk;
if (io_sectors < sectors_per_chunk) {
io_sectors = sectors_per_chunk;
} else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {
int64_t target_sector_num;
int target_nb_sectors;
bdrv_round_to_clusters(blk_bs(s->target), sector_num, io_sectors,
&target_sector_num, &target_nb_sectors);
if (target_sector_num == sector_num &&
target_nb_sectors == io_sectors) {
mirror_method = ret & BDRV_BLOCK_ZERO ?
MIRROR_METHOD_ZERO :
MIRROR_METHOD_DISCARD;
}
}
mirror_clip_sectors(s, sector_num, &io_sectors);
switch (mirror_method) {
case MIRROR_METHOD_COPY:
io_sectors = mirror_do_read(s, sector_num, io_sectors);
break;
case MIRROR_METHOD_ZERO:
mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
break;
case MIRROR_METHOD_DISCARD:
mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
break;
default:
abort();
}
assert(io_sectors);
sector_num += io_sectors;
nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
}
return delay_ns;
}
@@ -344,9 +438,7 @@ static void mirror_free_init(MirrorBlockJob *s)
static void mirror_drain(MirrorBlockJob *s)
{
while (s->in_flight > 0) {
s->waiting_for_io = true;
qemu_coroutine_yield();
s->waiting_for_io = false;
mirror_wait_for_io(s);
}
}
@@ -359,7 +451,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
MirrorExitData *data = opaque;
AioContext *replace_aio_context = NULL;
BlockDriverState *src = s->common.bs;
BlockDriverState *src = blk_bs(s->common.blk);
BlockDriverState *target_bs = blk_bs(s->target);
/* Make sure that the source BDS doesn't go away before we called
* block_job_completed(). */
@@ -371,26 +464,25 @@ static void mirror_exit(BlockJob *job, void *opaque)
}
if (s->should_complete && data->ret == 0) {
BlockDriverState *to_replace = s->common.bs;
BlockDriverState *to_replace = src;
if (s->to_replace) {
to_replace = s->to_replace;
}
/* This was checked in mirror_start_job(), but meanwhile one of the
* nodes could have been newly attached to a BlockBackend. */
if (to_replace->blk && s->target->blk) {
error_report("block job: Can't create node with two BlockBackends");
data->ret = -EINVAL;
goto out;
if (bdrv_get_flags(target_bs) != bdrv_get_flags(to_replace)) {
bdrv_reopen(target_bs, bdrv_get_flags(to_replace), NULL);
}
if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
}
bdrv_replace_in_backing_chain(to_replace, s->target);
/* The mirror job has no requests in flight any more, but we need to
* drain potential other users of the BDS before changing the graph. */
bdrv_drained_begin(target_bs);
bdrv_replace_in_backing_chain(to_replace, target_bs);
bdrv_drained_end(target_bs);
/* We just changed the BDS the job BB refers to */
blk_remove_bs(job->blk);
blk_insert_bs(job->blk, src);
}
out:
if (s->to_replace) {
bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
error_free(s->replace_blocker);
@@ -400,11 +492,14 @@ out:
aio_context_release(replace_aio_context);
}
g_free(s->replaces);
bdrv_op_unblock_all(s->target, s->common.blocker);
bdrv_unref(s->target);
bdrv_op_unblock_all(target_bs, s->common.blocker);
blk_unref(s->target);
block_job_completed(&s->common, data->ret);
g_free(data);
bdrv_drained_end(src);
if (qemu_get_aio_context() == bdrv_get_aio_context(src)) {
aio_enable_external(iohandler_get_aio_context());
}
bdrv_unref(src);
}
@@ -412,7 +507,8 @@ static void coroutine_fn mirror_run(void *opaque)
{
MirrorBlockJob *s = opaque;
MirrorExitData *data;
BlockDriverState *bs = s->common.bs;
BlockDriverState *bs = blk_bs(s->common.blk);
BlockDriverState *target_bs = blk_bs(s->target);
int64_t sector_num, end, length;
uint64_t last_pause_ns;
BlockDriverInfo bdi;
@@ -420,6 +516,7 @@ static void coroutine_fn mirror_run(void *opaque)
checking for a NULL string */
int ret = 0;
int n;
int target_cluster_size = BDRV_SECTOR_SIZE;
if (block_job_is_cancelled(&s->common)) {
goto immediate_exit;
@@ -447,18 +544,18 @@ static void coroutine_fn mirror_run(void *opaque)
* the destination do COW. Instead, we copy sectors around the
* dirty data if needed. We need a bitmap to do that.
*/
bdrv_get_backing_filename(s->target, backing_filename,
bdrv_get_backing_filename(target_bs, backing_filename,
sizeof(backing_filename));
if (backing_filename[0] && !s->target->backing) {
ret = bdrv_get_info(s->target, &bdi);
if (ret < 0) {
goto immediate_exit;
}
if (s->granularity < bdi.cluster_size) {
s->buf_size = MAX(s->buf_size, bdi.cluster_size);
s->cow_bitmap = bitmap_new(length);
}
if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
target_cluster_size = bdi.cluster_size;
}
if (backing_filename[0] && !target_bs->backing
&& s->granularity < target_cluster_size) {
s->buf_size = MAX(s->buf_size, target_cluster_size);
s->cow_bitmap = bitmap_new(length);
}
s->target_cluster_sectors = target_cluster_size >> BDRV_SECTOR_BITS;
s->max_iov = MIN(bs->bl.max_iov, target_bs->bl.max_iov);
end = s->bdev_length / BDRV_SECTOR_SIZE;
s->buf = qemu_try_blockalign(bs, s->buf_size);
@@ -473,7 +570,7 @@ static void coroutine_fn mirror_run(void *opaque)
if (!s->is_none_mode) {
/* First part, loop on the sectors and initialize the dirty bitmap. */
BlockDriverState *base = s->base;
bool mark_all_dirty = s->base == NULL && !bdrv_has_zero_init(s->target);
bool mark_all_dirty = s->base == NULL && !bdrv_has_zero_init(target_bs);
for (sector_num = 0; sector_num < end; ) {
/* Just to make sure we are not exceeding int limit. */
@@ -533,9 +630,7 @@ static void coroutine_fn mirror_run(void *opaque)
if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
(cnt == 0 && s->in_flight > 0)) {
trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
s->waiting_for_io = true;
qemu_coroutine_yield();
s->waiting_for_io = false;
mirror_wait_for_io(s);
continue;
} else if (cnt != 0) {
delay_ns = mirror_iteration(s);
@@ -545,7 +640,7 @@ static void coroutine_fn mirror_run(void *opaque)
should_complete = false;
if (s->in_flight == 0 && cnt == 0) {
trace_mirror_before_flush(s);
ret = bdrv_flush(s->target);
ret = blk_flush(s->target);
if (ret < 0) {
if (mirror_error_action(s, false, -ret) ==
BLOCK_ERROR_ACTION_REPORT) {
@@ -578,7 +673,7 @@ static void coroutine_fn mirror_run(void *opaque)
* mirror_populate runs.
*/
trace_mirror_before_drain(s, cnt);
bdrv_drain(bs);
bdrv_co_drain(bs);
cnt = bdrv_get_dirty_count(s->dirty_bitmap);
}
@@ -618,15 +713,18 @@ immediate_exit:
g_free(s->cow_bitmap);
g_free(s->in_flight_bitmap);
bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
if (s->target->blk) {
blk_iostatus_disable(s->target->blk);
}
data = g_malloc(sizeof(*data));
data->ret = ret;
/* Before we switch to target in mirror_exit, make sure data doesn't
* change. */
bdrv_drained_begin(s->common.bs);
bdrv_drained_begin(bs);
if (qemu_get_aio_context() == bdrv_get_aio_context(bs)) {
/* FIXME: virtio host notifiers run on iohandler_ctx, therefore the
* above bdrv_drained_end isn't enough to quiesce it. This is ugly, we
* need a block layer API change to achieve this. */
aio_disable_external(iohandler_get_aio_context());
}
block_job_defer_to_main_loop(&s->common, mirror_exit, data);
}
@@ -641,22 +739,14 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
}
static void mirror_iostatus_reset(BlockJob *job)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
if (s->target->blk) {
blk_iostatus_reset(s->target->blk);
}
}
static void mirror_complete(BlockJob *job, Error **errp)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
Error *local_err = NULL;
int ret;
ret = bdrv_open_backing_file(s->target, NULL, "backing", &local_err);
ret = bdrv_open_backing_file(blk_bs(s->target), NULL, "backing",
&local_err);
if (ret < 0) {
error_propagate(errp, local_err);
return;
@@ -695,7 +785,6 @@ static const BlockJobDriver mirror_job_driver = {
.instance_size = sizeof(MirrorBlockJob),
.job_type = BLOCK_JOB_TYPE_MIRROR,
.set_speed = mirror_set_speed,
.iostatus_reset= mirror_iostatus_reset,
.complete = mirror_complete,
};
@@ -703,8 +792,6 @@ static const BlockJobDriver commit_active_job_driver = {
.instance_size = sizeof(MirrorBlockJob),
.job_type = BLOCK_JOB_TYPE_COMMIT,
.set_speed = mirror_set_speed,
.iostatus_reset
= mirror_iostatus_reset,
.complete = mirror_complete,
};
@@ -721,7 +808,6 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
bool is_none_mode, BlockDriverState *base)
{
MirrorBlockJob *s;
BlockDriverState *replaced_bs;
if (granularity == 0) {
granularity = bdrv_get_default_bitmap_granularity(target);
@@ -729,13 +815,6 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
assert ((granularity & (granularity - 1)) == 0);
if ((on_source_error == BLOCKDEV_ON_ERROR_STOP ||
on_source_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, QERR_INVALID_PARAMETER, "on-source-error");
return;
}
if (buf_size < 0) {
error_setg(errp, "Invalid parameter 'buf-size'");
return;
@@ -745,30 +824,17 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
buf_size = DEFAULT_MIRROR_BUF_SIZE;
}
/* We can't support this case as long as the block layer can't handle
* multiple BlockBackends per BlockDriverState. */
if (replaces) {
replaced_bs = bdrv_lookup_bs(replaces, replaces, errp);
if (replaced_bs == NULL) {
return;
}
} else {
replaced_bs = bs;
}
if (replaced_bs->blk && target->blk) {
error_setg(errp, "Can't create node with two BlockBackends");
return;
}
s = block_job_create(driver, bs, speed, cb, opaque, errp);
if (!s) {
return;
}
s->target = blk_new();
blk_insert_bs(s->target, target);
s->replaces = g_strdup(replaces);
s->on_source_error = on_source_error;
s->on_target_error = on_target_error;
s->target = target;
s->is_none_mode = is_none_mode;
s->base = base;
s->granularity = granularity;
@@ -778,17 +844,13 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
if (!s->dirty_bitmap) {
g_free(s->replaces);
blk_unref(s->target);
block_job_unref(&s->common);
return;
}
bdrv_op_block_all(s->target, s->common.blocker);
bdrv_op_block_all(target, s->common.blocker);
bdrv_set_enable_write_cache(s->target, true);
if (s->target->blk) {
blk_set_on_error(s->target->blk, on_target_error, on_target_error);
blk_iostatus_enable(s->target->blk);
}
s->common.co = qemu_coroutine_create(mirror_run);
trace_mirror_start(bs, s, s->common.co, opaque);
qemu_coroutine_enter(s->common.co, s);
@@ -860,7 +922,6 @@ void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
}
}
bdrv_ref(base);
mirror_start_job(bs, base, NULL, speed, 0, 0,
on_error, on_error, false, cb, opaque, &local_err,
&commit_active_job_driver, false, base);

View File

@@ -243,15 +243,15 @@ static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov,
int offset)
int offset, int flags)
{
NbdClientSession *client = nbd_get_client_session(bs);
struct nbd_request request = { .type = NBD_CMD_WRITE };
struct nbd_reply reply;
ssize_t ret;
if (!bdrv_enable_write_cache(bs) &&
(client->nbdflags & NBD_FLAG_SEND_FUA)) {
if (flags & BDRV_REQ_FUA) {
assert(client->nbdflags & NBD_FLAG_SEND_FUA);
request.type |= NBD_CMD_FLAG_FUA;
}
@@ -291,12 +291,13 @@ int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
}
int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov)
int nb_sectors, QEMUIOVector *qiov, int flags)
{
int offset = 0;
int ret;
while (nb_sectors > NBD_MAX_SECTORS) {
ret = nbd_co_writev_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
ret = nbd_co_writev_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset,
flags);
if (ret < 0) {
return ret;
}
@@ -304,7 +305,7 @@ int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
sector_num += NBD_MAX_SECTORS;
nb_sectors -= NBD_MAX_SECTORS;
}
return nbd_co_writev_1(bs, sector_num, nb_sectors, qiov, offset);
return nbd_co_writev_1(bs, sector_num, nb_sectors, qiov, offset, flags);
}
int nbd_client_co_flush(BlockDriverState *bs)
@@ -318,10 +319,6 @@ int nbd_client_co_flush(BlockDriverState *bs)
return 0;
}
if (client->nbdflags & NBD_FLAG_SEND_FUA) {
request.type |= NBD_CMD_FLAG_FUA;
}
request.from = 0;
request.len = 0;
@@ -417,6 +414,9 @@ int nbd_client_init(BlockDriverState *bs,
logout("Failed to negotiate with the NBD server\n");
return ret;
}
if (client->nbdflags & NBD_FLAG_SEND_FUA) {
bs->supported_write_flags = BDRV_REQ_FUA;
}
qemu_co_mutex_init(&client->send_mutex);
qemu_co_mutex_init(&client->free_sema);

View File

@@ -48,7 +48,7 @@ int nbd_client_co_discard(BlockDriverState *bs, int64_t sector_num,
int nb_sectors);
int nbd_client_co_flush(BlockDriverState *bs);
int nbd_client_co_writev(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov);
int nb_sectors, QEMUIOVector *qiov, int flags);
int nbd_client_co_readv(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov);

View File

@@ -28,6 +28,7 @@
#include "qemu/osdep.h"
#include "block/nbd-client.h"
#include "qapi/error.h"
#include "qemu/uri.h"
#include "block/block_int.h"
#include "qemu/module.h"
@@ -35,7 +36,7 @@
#include "qapi/qmp/qjson.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qstring.h"
#include "qemu/cutils.h"
#define EN_OPTSTR ":exportname="
@@ -204,18 +205,20 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options, char **export,
saddr = g_new0(SocketAddress, 1);
if (qdict_haskey(options, "path")) {
UnixSocketAddress *q_unix;
saddr->type = SOCKET_ADDRESS_KIND_UNIX;
saddr->u.q_unix = g_new0(UnixSocketAddress, 1);
saddr->u.q_unix->path = g_strdup(qdict_get_str(options, "path"));
q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
q_unix->path = g_strdup(qdict_get_str(options, "path"));
qdict_del(options, "path");
} else {
InetSocketAddress *inet;
saddr->type = SOCKET_ADDRESS_KIND_INET;
saddr->u.inet = g_new0(InetSocketAddress, 1);
saddr->u.inet->host = g_strdup(qdict_get_str(options, "host"));
inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1);
inet->host = g_strdup(qdict_get_str(options, "host"));
if (!qdict_get_try_str(options, "port")) {
saddr->u.inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
} else {
saddr->u.inet->port = g_strdup(qdict_get_str(options, "port"));
inet->port = g_strdup(qdict_get_str(options, "port"));
}
qdict_del(options, "host");
qdict_del(options, "port");
@@ -319,7 +322,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
error_setg(errp, "TLS only supported over IP sockets");
goto error;
}
hostname = saddr->u.inet->host;
hostname = saddr->u.inet.data->host;
}
/* establish TCP connection, return error if it fails
@@ -352,12 +355,6 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
return nbd_client_co_readv(bs, sector_num, nb_sectors, qiov);
}
static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov)
{
return nbd_client_co_writev(bs, sector_num, nb_sectors, qiov);
}
static int nbd_co_flush(BlockDriverState *bs)
{
return nbd_client_co_flush(bs);
@@ -454,7 +451,7 @@ static BlockDriver bdrv_nbd = {
.bdrv_parse_filename = nbd_parse_filename,
.bdrv_file_open = nbd_open,
.bdrv_co_readv = nbd_co_readv,
.bdrv_co_writev = nbd_co_writev,
.bdrv_co_writev_flags = nbd_client_co_writev,
.bdrv_close = nbd_close,
.bdrv_co_flush_to_os = nbd_co_flush,
.bdrv_co_discard = nbd_co_discard,
@@ -472,7 +469,7 @@ static BlockDriver bdrv_nbd_tcp = {
.bdrv_parse_filename = nbd_parse_filename,
.bdrv_file_open = nbd_open,
.bdrv_co_readv = nbd_co_readv,
.bdrv_co_writev = nbd_co_writev,
.bdrv_co_writev_flags = nbd_client_co_writev,
.bdrv_close = nbd_close,
.bdrv_co_flush_to_os = nbd_co_flush,
.bdrv_co_discard = nbd_co_discard,
@@ -490,7 +487,7 @@ static BlockDriver bdrv_nbd_unix = {
.bdrv_parse_filename = nbd_parse_filename,
.bdrv_file_open = nbd_open,
.bdrv_co_readv = nbd_co_readv,
.bdrv_co_writev = nbd_co_writev,
.bdrv_co_writev_flags = nbd_client_co_writev,
.bdrv_close = nbd_close,
.bdrv_co_flush_to_os = nbd_co_flush,
.bdrv_co_discard = nbd_co_discard,

View File

@@ -28,14 +28,17 @@
#include "qemu-common.h"
#include "qemu/config-file.h"
#include "qemu/error-report.h"
#include "qapi/error.h"
#include "block/block_int.h"
#include "trace.h"
#include "qemu/iov.h"
#include "qemu/uri.h"
#include "qemu/cutils.h"
#include "sysemu/sysemu.h"
#include <nfsc/libnfs.h>
#define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
#define QEMU_NFS_MAX_DEBUG_LEVEL 2
typedef struct NFSClient {
struct nfs_context *context;
@@ -333,6 +336,17 @@ static int64_t nfs_client_open(NFSClient *client, const char *filename,
val = QEMU_NFS_MAX_READAHEAD_SIZE;
}
nfs_set_readahead(client->context, val);
#endif
#ifdef LIBNFS_FEATURE_DEBUG
} else if (!strcmp(qp->p[i].name, "debug")) {
/* limit the maximum debug level to avoid potential flooding
* of our log files. */
if (val > QEMU_NFS_MAX_DEBUG_LEVEL) {
error_report("NFS Warning: Limiting NFS debug level"
" to %d", QEMU_NFS_MAX_DEBUG_LEVEL);
val = QEMU_NFS_MAX_DEBUG_LEVEL;
}
nfs_set_debug(client->context, val);
#endif
} else {
error_setg(errp, "Unknown NFS parameter name: %s",

View File

@@ -11,13 +11,16 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "block/block_int.h"
#define NULL_OPT_LATENCY "latency-ns"
#define NULL_OPT_ZEROES "read-zeroes"
typedef struct {
int64_t length;
int64_t latency_ns;
bool read_zeroes;
} BDRVNullState;
static QemuOptsList runtime_opts = {
@@ -40,6 +43,11 @@ static QemuOptsList runtime_opts = {
.help = "nanoseconds (approximated) to wait "
"before completing request",
},
{
.name = NULL_OPT_ZEROES,
.type = QEMU_OPT_BOOL,
.help = "return zeroes when read",
},
{ /* end of list */ }
},
};
@@ -61,6 +69,7 @@ static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
error_setg(errp, "latency-ns is invalid");
ret = -EINVAL;
}
s->read_zeroes = qemu_opt_get_bool(opts, NULL_OPT_ZEROES, false);
qemu_opts_del(opts);
return ret;
}
@@ -90,6 +99,12 @@ static coroutine_fn int null_co_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *qiov)
{
BDRVNullState *s = bs->opaque;
if (s->read_zeroes) {
qemu_iovec_memset(qiov, 0, 0, nb_sectors * BDRV_SECTOR_SIZE);
}
return null_co_common(bs);
}
@@ -159,6 +174,12 @@ static BlockAIOCB *null_aio_readv(BlockDriverState *bs,
BlockCompletionFunc *cb,
void *opaque)
{
BDRVNullState *s = bs->opaque;
if (s->read_zeroes) {
qemu_iovec_memset(qiov, 0, 0, nb_sectors * BDRV_SECTOR_SIZE);
}
return null_aio_common(bs, cb, opaque);
}
@@ -184,6 +205,24 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
return 0;
}
static int64_t coroutine_fn null_co_get_block_status(BlockDriverState *bs,
int64_t sector_num,
int nb_sectors, int *pnum,
BlockDriverState **file)
{
BDRVNullState *s = bs->opaque;
off_t start = sector_num * BDRV_SECTOR_SIZE;
*pnum = nb_sectors;
*file = bs;
if (s->read_zeroes) {
return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
} else {
return BDRV_BLOCK_OFFSET_VALID | start;
}
}
static BlockDriver bdrv_null_co = {
.format_name = "null-co",
.protocol_name = "null-co",
@@ -197,6 +236,8 @@ static BlockDriver bdrv_null_co = {
.bdrv_co_writev = null_co_writev,
.bdrv_co_flush_to_disk = null_co_flush,
.bdrv_reopen_prepare = null_reopen_prepare,
.bdrv_co_get_block_status = null_co_get_block_status,
};
static BlockDriver bdrv_null_aio = {
@@ -212,6 +253,8 @@ static BlockDriver bdrv_null_aio = {
.bdrv_aio_writev = null_aio_writev,
.bdrv_aio_flush = null_aio_flush,
.bdrv_reopen_prepare = null_reopen_prepare,
.bdrv_co_get_block_status = null_co_get_block_status,
};
static void bdrv_null_init(void)

View File

@@ -28,9 +28,12 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "qemu/bitmap.h"
#include "qapi/util.h"
@@ -461,7 +464,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
int64_t total_size, cl_size;
uint8_t tmp[BDRV_SECTOR_SIZE];
Error *local_err = NULL;
BlockDriverState *file;
BlockBackend *file;
uint32_t bat_entries, bat_sectors;
ParallelsHeader header;
int ret;
@@ -477,14 +480,16 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
return ret;
}
file = NULL;
ret = bdrv_open(&file, filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (ret < 0) {
file = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (file == NULL) {
error_propagate(errp, local_err);
return ret;
return -EIO;
}
ret = bdrv_truncate(file, 0);
blk_set_allow_write_beyond_eof(file, true);
ret = blk_truncate(file, 0);
if (ret < 0) {
goto exit;
}
@@ -508,18 +513,19 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
memset(tmp, 0, sizeof(tmp));
memcpy(tmp, &header, sizeof(header));
ret = bdrv_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE, 0);
if (ret < 0) {
goto exit;
}
ret = bdrv_write_zeroes(file, 1, bat_sectors - 1, 0);
ret = blk_pwrite_zeroes(file, BDRV_SECTOR_SIZE,
(bat_sectors - 1) << BDRV_SECTOR_BITS, 0);
if (ret < 0) {
goto exit;
}
ret = 0;
done:
bdrv_unref(file);
blk_unref(file);
return ret;
exit:

View File

@@ -32,8 +32,10 @@
#include "qapi/qmp-output-visitor.h"
#include "qapi/qmp/types.h"
#include "sysemu/block-backend.h"
#include "qemu/cutils.h"
BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
BlockDriverState *bs, Error **errp)
{
ImageInfo **p_image_info;
BlockDriverState *bs0;
@@ -47,7 +49,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
info->cache = g_new(BlockdevCacheInfo, 1);
*info->cache = (BlockdevCacheInfo) {
.writeback = bdrv_enable_write_cache(bs),
.writeback = blk ? blk_enable_write_cache(blk) : true,
.direct = !!(bs->open_flags & BDRV_O_NOCACHE),
.no_flush = !!(bs->open_flags & BDRV_O_NO_FLUSH),
};
@@ -65,10 +67,10 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
info->backing_file_depth = bdrv_get_backing_file_depth(bs);
info->detect_zeroes = bs->detect_zeroes;
if (bs->throttle_state) {
if (blk && blk_get_public(blk)->throttle_state) {
ThrottleConfig cfg;
throttle_group_get_config(bs, &cfg);
throttle_group_get_config(blk, &cfg);
info->bps = cfg.buckets[THROTTLE_BPS_TOTAL].avg;
info->bps_rd = cfg.buckets[THROTTLE_BPS_READ].avg;
@@ -116,7 +118,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockDriverState *bs, Error **errp)
info->iops_size = cfg.op_size;
info->has_group = true;
info->group = g_strdup(throttle_group_get_name(bs));
info->group = g_strdup(throttle_group_get_name(blk));
}
info->write_threshold = bdrv_write_threshold_get(bs);
@@ -342,7 +344,7 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
if (bs && bs->drv) {
info->has_inserted = true;
info->inserted = bdrv_block_device_info(bs, errp);
info->inserted = bdrv_block_device_info(blk, bs, errp);
if (info->inserted == NULL) {
goto err;
}
@@ -355,100 +357,115 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
qapi_free_BlockInfo(info);
}
static BlockStats *bdrv_query_stats(const BlockDriverState *bs,
bool query_backing)
static BlockStats *bdrv_query_stats(BlockBackend *blk,
const BlockDriverState *bs,
bool query_backing);
static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
{
BlockStats *s;
BlockAcctStats *stats = blk_get_stats(blk);
BlockAcctTimedStats *ts = NULL;
s = g_malloc0(sizeof(*s));
ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
if (bdrv_get_device_name(bs)[0]) {
s->has_device = true;
s->device = g_strdup(bdrv_get_device_name(bs));
ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
ds->invalid_flush_operations =
stats->invalid_ops[BLOCK_ACCT_FLUSH];
ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
ds->has_idle_time_ns = stats->last_access_time_ns > 0;
if (ds->has_idle_time_ns) {
ds->idle_time_ns = block_acct_idle_time_ns(stats);
}
ds->account_invalid = stats->account_invalid;
ds->account_failed = stats->account_failed;
while ((ts = block_acct_interval_next(stats, ts))) {
BlockDeviceTimedStatsList *timed_stats =
g_malloc0(sizeof(*timed_stats));
BlockDeviceTimedStats *dev_stats = g_malloc0(sizeof(*dev_stats));
timed_stats->next = ds->timed_stats;
timed_stats->value = dev_stats;
ds->timed_stats = timed_stats;
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
dev_stats->interval_length = ts->interval_length;
dev_stats->min_rd_latency_ns = timed_average_min(rd);
dev_stats->max_rd_latency_ns = timed_average_max(rd);
dev_stats->avg_rd_latency_ns = timed_average_avg(rd);
dev_stats->min_wr_latency_ns = timed_average_min(wr);
dev_stats->max_wr_latency_ns = timed_average_max(wr);
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
dev_stats->min_flush_latency_ns = timed_average_min(fl);
dev_stats->max_flush_latency_ns = timed_average_max(fl);
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
dev_stats->avg_rd_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
dev_stats->avg_wr_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
}
}
static void bdrv_query_bds_stats(BlockStats *s, const BlockDriverState *bs,
bool query_backing)
{
if (bdrv_get_node_name(bs)[0]) {
s->has_node_name = true;
s->node_name = g_strdup(bdrv_get_node_name(bs));
}
s->stats = g_malloc0(sizeof(*s->stats));
if (bs->blk) {
BlockAcctStats *stats = blk_get_stats(bs->blk);
BlockAcctTimedStats *ts = NULL;
s->stats->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
s->stats->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
s->stats->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
s->stats->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
s->stats->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
s->stats->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
s->stats->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
s->stats->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
s->stats->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
s->stats->invalid_flush_operations =
stats->invalid_ops[BLOCK_ACCT_FLUSH];
s->stats->rd_merged = stats->merged[BLOCK_ACCT_READ];
s->stats->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
s->stats->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
s->stats->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
s->stats->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
s->stats->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
s->stats->has_idle_time_ns = stats->last_access_time_ns > 0;
if (s->stats->has_idle_time_ns) {
s->stats->idle_time_ns = block_acct_idle_time_ns(stats);
}
s->stats->account_invalid = stats->account_invalid;
s->stats->account_failed = stats->account_failed;
while ((ts = block_acct_interval_next(stats, ts))) {
BlockDeviceTimedStatsList *timed_stats =
g_malloc0(sizeof(*timed_stats));
BlockDeviceTimedStats *dev_stats = g_malloc0(sizeof(*dev_stats));
timed_stats->next = s->stats->timed_stats;
timed_stats->value = dev_stats;
s->stats->timed_stats = timed_stats;
TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
dev_stats->interval_length = ts->interval_length;
dev_stats->min_rd_latency_ns = timed_average_min(rd);
dev_stats->max_rd_latency_ns = timed_average_max(rd);
dev_stats->avg_rd_latency_ns = timed_average_avg(rd);
dev_stats->min_wr_latency_ns = timed_average_min(wr);
dev_stats->max_wr_latency_ns = timed_average_max(wr);
dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
dev_stats->min_flush_latency_ns = timed_average_min(fl);
dev_stats->max_flush_latency_ns = timed_average_max(fl);
dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
dev_stats->avg_rd_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_READ);
dev_stats->avg_wr_queue_depth =
block_acct_queue_depth(ts, BLOCK_ACCT_WRITE);
}
}
s->stats->wr_highest_offset = bs->wr_highest_offset;
if (bs->file) {
s->has_parent = true;
s->parent = bdrv_query_stats(bs->file->bs, query_backing);
s->parent = bdrv_query_stats(NULL, bs->file->bs, query_backing);
}
if (query_backing && bs->backing) {
s->has_backing = true;
s->backing = bdrv_query_stats(bs->backing->bs, query_backing);
s->backing = bdrv_query_stats(NULL, bs->backing->bs, query_backing);
}
}
static BlockStats *bdrv_query_stats(BlockBackend *blk,
const BlockDriverState *bs,
bool query_backing)
{
BlockStats *s;
s = g_malloc0(sizeof(*s));
s->stats = g_malloc0(sizeof(*s->stats));
if (blk) {
s->has_device = true;
s->device = g_strdup(blk_name(blk));
bdrv_query_blk_stats(s->stats, blk);
}
if (bs) {
bdrv_query_bds_stats(s, bs, query_backing);
}
return s;
@@ -477,22 +494,38 @@ BlockInfoList *qmp_query_block(Error **errp)
return head;
}
static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
bool query_nodes)
{
if (query_nodes) {
*bs = bdrv_next_node(*bs);
return !!*bs;
}
*blk = blk_next(*blk);
*bs = *blk ? blk_bs(*blk) : NULL;
return !!*blk;
}
BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
bool query_nodes,
Error **errp)
{
BlockStatsList *head = NULL, **p_next = &head;
BlockBackend *blk = NULL;
BlockDriverState *bs = NULL;
/* Just to be safe if query_nodes is not always initialized */
query_nodes = has_query_nodes && query_nodes;
while ((bs = query_nodes ? bdrv_next_node(bs) : bdrv_next(bs))) {
while (next_query_bds(&blk, &bs, query_nodes)) {
BlockStatsList *info = g_malloc0(sizeof(*info));
AioContext *ctx = bdrv_get_aio_context(bs);
AioContext *ctx = blk ? blk_get_aio_context(blk)
: bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
info->value = bdrv_query_stats(bs, !query_nodes);
info->value = bdrv_query_stats(blk, bs, !query_nodes);
aio_context_release(ctx);
*p_next = info;
@@ -619,9 +652,8 @@ static void dump_qlist(fprintf_function func_fprintf, void *f, int indentation,
for (entry = qlist_first(list); entry; entry = qlist_next(entry), i++) {
QType type = qobject_type(entry->value);
bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
const char *format = composite ? "%*s[%i]:\n" : "%*s[%i]: ";
func_fprintf(f, format, indentation * 4, "", i);
func_fprintf(f, "%*s[%i]:%c", indentation * 4, "", i,
composite ? '\n' : ' ');
dump_qobject(func_fprintf, f, indentation + 1, entry->value);
if (!composite) {
func_fprintf(f, "\n");
@@ -637,8 +669,7 @@ static void dump_qdict(fprintf_function func_fprintf, void *f, int indentation,
for (entry = qdict_first(dict); entry; entry = qdict_next(dict, entry)) {
QType type = qobject_type(entry->value);
bool composite = (type == QTYPE_QDICT || type == QTYPE_QLIST);
const char *format = composite ? "%*s%s:\n" : "%*s%s: ";
char key[strlen(entry->key) + 1];
char *key = g_malloc(strlen(entry->key) + 1);
int i;
/* replace dashes with spaces in key (variable) names */
@@ -646,12 +677,13 @@ static void dump_qdict(fprintf_function func_fprintf, void *f, int indentation,
key[i] = entry->key[i] == '-' ? ' ' : entry->key[i];
}
key[i] = 0;
func_fprintf(f, format, indentation * 4, "", key);
func_fprintf(f, "%*s%s:%c", indentation * 4, "", key,
composite ? '\n' : ' ');
dump_qobject(func_fprintf, f, indentation + 1, entry->value);
if (!composite) {
func_fprintf(f, "\n");
}
g_free(key);
}
}

View File

@@ -22,9 +22,13 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "qemu/error-report.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include <zlib.h>
#include "qapi/qmp/qerror.h"
#include "crypto/cipher.h"
@@ -120,11 +124,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
if (header.version != QCOW_VERSION) {
char version[64];
snprintf(version, sizeof(version), "QCOW version %" PRIu32,
header.version);
error_setg(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "qcow", version);
error_setg(errp, "Unsupported qcow version %" PRIu32, header.version);
ret = -ENOTSUP;
goto fail;
}
@@ -160,6 +160,14 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
}
s->crypt_method_header = header.crypt_method;
if (s->crypt_method_header) {
if (bdrv_uses_whitelist() &&
s->crypt_method_header == QCOW_CRYPT_AES) {
error_report("qcow built-in AES encryption is deprecated");
error_printf("Support for it will be removed in a future release.\n"
"You can use 'qemu-img convert' to switch to an\n"
"unencrypted qcow image, or a LUKS raw image.\n");
}
bs->encrypted = 1;
}
s->cluster_bits = header.cluster_bits;
@@ -780,7 +788,7 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
int flags = 0;
Error *local_err = NULL;
int ret;
BlockDriverState *qcow_bs;
BlockBackend *qcow_blk;
/* Read out options */
total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
@@ -796,15 +804,17 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
goto cleanup;
}
qcow_bs = NULL;
ret = bdrv_open(&qcow_bs, filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (ret < 0) {
qcow_blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (qcow_blk == NULL) {
error_propagate(errp, local_err);
ret = -EIO;
goto cleanup;
}
ret = bdrv_truncate(qcow_bs, 0);
blk_set_allow_write_beyond_eof(qcow_blk, true);
ret = blk_truncate(qcow_blk, 0);
if (ret < 0) {
goto exit;
}
@@ -844,14 +854,14 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
}
/* write all the data */
ret = bdrv_pwrite(qcow_bs, 0, &header, sizeof(header));
ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
if (ret != sizeof(header)) {
goto exit;
}
if (backing_file) {
ret = bdrv_pwrite(qcow_bs, sizeof(header),
backing_file, backing_filename_len);
ret = blk_pwrite(qcow_blk, sizeof(header),
backing_file, backing_filename_len, 0);
if (ret != backing_filename_len) {
goto exit;
}
@@ -860,8 +870,8 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
tmp = g_malloc0(BDRV_SECTOR_SIZE);
for (i = 0; i < ((sizeof(uint64_t)*l1_size + BDRV_SECTOR_SIZE - 1)/
BDRV_SECTOR_SIZE); i++) {
ret = bdrv_pwrite(qcow_bs, header_size +
BDRV_SECTOR_SIZE*i, tmp, BDRV_SECTOR_SIZE);
ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
tmp, BDRV_SECTOR_SIZE, 0);
if (ret != BDRV_SECTOR_SIZE) {
g_free(tmp);
goto exit;
@@ -871,7 +881,7 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
g_free(tmp);
ret = 0;
exit:
bdrv_unref(qcow_bs);
blk_unref(qcow_blk);
cleanup:
g_free(backing_file);
return ret;

View File

@@ -25,9 +25,11 @@
#include "qemu/osdep.h"
#include <zlib.h>
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "block/qcow2.h"
#include "qemu/bswap.h"
#include "trace.h"
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,

View File

@@ -23,10 +23,12 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "block/qcow2.h"
#include "qemu/range.h"
#include "qemu/bswap.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,

View File

@@ -23,10 +23,12 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "block/block_int.h"
#include "block/qcow2.h"
#include "qemu/bswap.h"
#include "qemu/error-report.h"
#include "qemu/cutils.h"
void qcow2_free_snapshots(BlockDriverState *bs)
{

View File

@@ -22,8 +22,8 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include <zlib.h>
#include "block/qcow2.h"
@@ -35,6 +35,8 @@
#include "qapi-event.h"
#include "trace.h"
#include "qemu/option_int.h"
#include "qemu/cutils.h"
#include "qemu/bswap.h"
/*
Differences with QCOW:
@@ -197,22 +199,8 @@ static void cleanup_unknown_header_ext(BlockDriverState *bs)
}
}
static void GCC_FMT_ATTR(3, 4) report_unsupported(BlockDriverState *bs,
Error **errp, const char *fmt, ...)
{
char msg[64];
va_list ap;
va_start(ap, fmt);
vsnprintf(msg, sizeof(msg), fmt, ap);
va_end(ap);
error_setg(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "qcow2", msg);
}
static void report_unsupported_feature(BlockDriverState *bs,
Error **errp, Qcow2Feature *table, uint64_t mask)
static void report_unsupported_feature(Error **errp, Qcow2Feature *table,
uint64_t mask)
{
char *features = g_strdup("");
char *old;
@@ -237,7 +225,7 @@ static void report_unsupported_feature(BlockDriverState *bs,
g_free(old);
}
report_unsupported(bs, errp, "%s", features);
error_setg(errp, "Unsupported qcow2 feature(s): %s", features);
g_free(features);
}
@@ -854,7 +842,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
if (header.version < 2 || header.version > 3) {
report_unsupported(bs, errp, "QCOW version %" PRIu32, header.version);
error_setg(errp, "Unsupported qcow2 version %" PRIu32, header.version);
ret = -ENOTSUP;
goto fail;
}
@@ -934,7 +922,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
void *feature_table = NULL;
qcow2_read_extensions(bs, header.header_length, ext_end,
&feature_table, NULL);
report_unsupported_feature(bs, errp, feature_table,
report_unsupported_feature(errp, feature_table,
s->incompatible_features &
~QCOW2_INCOMPAT_MASK);
ret = -ENOTSUP;
@@ -978,6 +966,14 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
}
s->crypt_method_header = header.crypt_method;
if (s->crypt_method_header) {
if (bdrv_uses_whitelist() &&
s->crypt_method_header == QCOW_CRYPT_AES) {
error_report("qcow2 built-in AES encryption is deprecated");
error_printf("Support for it will be removed in a future release.\n"
"You can use 'qemu-img convert' to switch to an\n"
"unencrypted qcow2 image, or a LUKS raw image.\n");
}
bs->encrypted = 1;
}
@@ -1762,13 +1758,6 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
qcow2_close(bs);
bdrv_invalidate_cache(bs->file->bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
bs->drv = NULL;
return;
}
memset(s, 0, sizeof(BDRVQcow2State));
options = qdict_clone_shallow(bs->options);
@@ -1991,6 +1980,10 @@ static int qcow2_change_backing_file(BlockDriverState *bs,
{
BDRVQcow2State *s = bs->opaque;
if (backing_file && strlen(backing_file) > 1023) {
return -EINVAL;
}
pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_file ?: "");
pstrcpy(bs->backing_format, sizeof(bs->backing_format), backing_fmt ?: "");
@@ -2097,7 +2090,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
* 2 GB for 64k clusters, and we don't want to have a 2 GB initial file
* size for any qcow2 image.
*/
BlockDriverState* bs;
BlockBackend *blk;
QCowHeader *header;
uint64_t* refcount_table;
Error *local_err = NULL;
@@ -2172,14 +2165,15 @@ static int qcow2_create2(const char *filename, int64_t total_size,
return ret;
}
bs = NULL;
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&local_err);
if (ret < 0) {
blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
return ret;
return -EIO;
}
blk_set_allow_write_beyond_eof(blk, true);
/* Write the header */
QEMU_BUILD_BUG_ON((1 << MIN_CLUSTER_BITS) < sizeof(*header));
header = g_malloc0(cluster_size);
@@ -2207,7 +2201,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS);
}
ret = bdrv_pwrite(bs, 0, header, cluster_size);
ret = blk_pwrite(blk, 0, header, cluster_size, 0);
g_free(header);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not write qcow2 header");
@@ -2217,7 +2211,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
/* Write a refcount table with one refcount block */
refcount_table = g_malloc0(2 * cluster_size);
refcount_table[0] = cpu_to_be64(2 * cluster_size);
ret = bdrv_pwrite(bs, cluster_size, refcount_table, 2 * cluster_size);
ret = blk_pwrite(blk, cluster_size, refcount_table, 2 * cluster_size, 0);
g_free(refcount_table);
if (ret < 0) {
@@ -2225,8 +2219,8 @@ static int qcow2_create2(const char *filename, int64_t total_size,
goto out;
}
bdrv_unref(bs);
bs = NULL;
blk_unref(blk);
blk = NULL;
/*
* And now open the image and make it consistent first (i.e. increase the
@@ -2235,15 +2229,15 @@ static int qcow2_create2(const char *filename, int64_t total_size,
*/
options = qdict_new();
qdict_put(options, "driver", qstring_from_str("qcow2"));
ret = bdrv_open(&bs, filename, NULL, options,
BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH,
&local_err);
if (ret < 0) {
blk = blk_new_open(filename, NULL, options,
BDRV_O_RDWR | BDRV_O_NO_FLUSH, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
ret = -EIO;
goto out;
}
ret = qcow2_alloc_clusters(bs, 3 * cluster_size);
ret = qcow2_alloc_clusters(blk_bs(blk), 3 * cluster_size);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not allocate clusters for qcow2 "
"header and refcount table");
@@ -2255,14 +2249,14 @@ static int qcow2_create2(const char *filename, int64_t total_size,
}
/* Create a full header (including things like feature table) */
ret = qcow2_update_header(bs);
ret = qcow2_update_header(blk_bs(blk));
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not update qcow2 header");
goto out;
}
/* Okay, now that we have a valid image, let's give it the right size */
ret = bdrv_truncate(bs, total_size);
ret = blk_truncate(blk, total_size);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not resize image");
goto out;
@@ -2270,7 +2264,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
/* Want a backing file? There you go.*/
if (backing_file) {
ret = bdrv_change_backing_file(bs, backing_file, backing_format);
ret = bdrv_change_backing_file(blk_bs(blk), backing_file, backing_format);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not assign backing file '%s' "
"with format '%s'", backing_file, backing_format);
@@ -2280,9 +2274,9 @@ static int qcow2_create2(const char *filename, int64_t total_size,
/* And if we're supposed to preallocate metadata, do that now */
if (prealloc != PREALLOC_MODE_OFF) {
BDRVQcow2State *s = bs->opaque;
BDRVQcow2State *s = blk_bs(blk)->opaque;
qemu_co_mutex_lock(&s->lock);
ret = preallocate(bs);
ret = preallocate(blk_bs(blk));
qemu_co_mutex_unlock(&s->lock);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not preallocate metadata");
@@ -2290,24 +2284,24 @@ static int qcow2_create2(const char *filename, int64_t total_size,
}
}
bdrv_unref(bs);
bs = NULL;
blk_unref(blk);
blk = NULL;
/* Reopen the image without BDRV_O_NO_FLUSH to flush it before returning */
options = qdict_new();
qdict_put(options, "driver", qstring_from_str("qcow2"));
ret = bdrv_open(&bs, filename, NULL, options,
BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_NO_BACKING,
&local_err);
if (local_err) {
blk = blk_new_open(filename, NULL, options,
BDRV_O_RDWR | BDRV_O_NO_BACKING, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
ret = -EIO;
goto out;
}
ret = 0;
out:
if (bs) {
bdrv_unref(bs);
if (blk) {
blk_unref(blk);
}
return ret;
}
@@ -2411,21 +2405,75 @@ finish:
return ret;
}
static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
{
BDRVQcow2State *s = bs->opaque;
int nr;
BlockDriverState *file;
int64_t res = bdrv_get_block_status_above(bs, NULL, start,
s->cluster_sectors, &nr, &file);
return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == s->cluster_sectors;
}
static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start)
{
BDRVQcow2State *s = bs->opaque;
int nr = s->cluster_sectors;
uint64_t off;
int ret;
ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
assert(nr == s->cluster_sectors);
return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO;
}
static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
{
int ret;
BDRVQcow2State *s = bs->opaque;
/* Emulate misaligned zero writes */
if (sector_num % s->cluster_sectors || nb_sectors % s->cluster_sectors) {
return -ENOTSUP;
int head = sector_num % s->cluster_sectors;
int tail = (sector_num + nb_sectors) % s->cluster_sectors;
if (head != 0 || tail != 0) {
int64_t cl_end = -1;
sector_num -= head;
nb_sectors += head;
if (tail != 0) {
nb_sectors += s->cluster_sectors - tail;
}
if (!is_zero_cluster(bs, sector_num)) {
return -ENOTSUP;
}
if (nb_sectors > s->cluster_sectors) {
/* Technically the request can cover 2 clusters, f.e. 4k write
at s->cluster_sectors - 2k offset. One of these cluster can
be zeroed, one unallocated */
cl_end = sector_num + nb_sectors - s->cluster_sectors;
if (!is_zero_cluster(bs, cl_end)) {
return -ENOTSUP;
}
}
qemu_co_mutex_lock(&s->lock);
/* We can have new write after previous check */
if (!is_zero_cluster_top_locked(bs, sector_num) ||
(cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) {
qemu_co_mutex_unlock(&s->lock);
return -ENOTSUP;
}
} else {
qemu_co_mutex_lock(&s->lock);
}
/* Whatever is left can use real zero clusters */
qemu_co_mutex_lock(&s->lock);
ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS,
nb_sectors);
ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
qemu_co_mutex_unlock(&s->lock);
return ret;
@@ -2809,15 +2857,15 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
*spec_info = (ImageInfoSpecific){
.type = IMAGE_INFO_SPECIFIC_KIND_QCOW2,
.u.qcow2 = g_new(ImageInfoSpecificQCow2, 1),
.u.qcow2.data = g_new(ImageInfoSpecificQCow2, 1),
};
if (s->qcow_version == 2) {
*spec_info->u.qcow2 = (ImageInfoSpecificQCow2){
*spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
.compat = g_strdup("0.10"),
.refcount_bits = s->refcount_bits,
};
} else if (s->qcow_version == 3) {
*spec_info->u.qcow2 = (ImageInfoSpecificQCow2){
*spec_info->u.qcow2.data = (ImageInfoSpecificQCow2){
.compat = g_strdup("1.1"),
.lazy_refcounts = s->compatible_features &
QCOW2_COMPAT_LAZY_REFCOUNTS,

View File

@@ -16,6 +16,7 @@
#include "trace.h"
#include "qemu/sockets.h" /* for EINPROGRESS on Windows */
#include "qed.h"
#include "qemu/bswap.h"
typedef struct {
GenericCB gencb;

View File

@@ -13,11 +13,14 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu/timer.h"
#include "qemu/bswap.h"
#include "trace.h"
#include "qed.h"
#include "qapi/qmp/qerror.h"
#include "migration/migration.h"
#include "sysemu/block-backend.h"
static const AIOCBInfo qed_aiocb_info = {
.aiocb_size = sizeof(QEDAIOCB),
@@ -345,7 +348,7 @@ static void qed_start_need_check_timer(BDRVQEDState *s)
* migration.
*/
timer_mod(s->need_check_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() * QED_NEED_CHECK_TIMEOUT);
NANOSECONDS_PER_SECOND * QED_NEED_CHECK_TIMEOUT);
}
/* It's okay to call this multiple times or when no timer is started */
@@ -376,18 +379,6 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
}
}
static void bdrv_qed_drain(BlockDriverState *bs)
{
BDRVQEDState *s = bs->opaque;
/* Cancel timer and start doing I/O that were meant to happen as if it
* fired, that way we get bdrv_drain() taking care of the ongoing requests
* correctly. */
qed_cancel_need_check_timer(s);
qed_plug_allocating_write_reqs(s);
bdrv_aio_flush(s->bs, qed_clear_need_check, s);
}
static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
@@ -411,11 +402,8 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
}
if (s->header.features & ~QED_FEATURE_MASK) {
/* image uses unsupported feature bits */
char buf[64];
snprintf(buf, sizeof(buf), "%" PRIx64,
s->header.features & ~QED_FEATURE_MASK);
error_setg(errp, QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
bdrv_get_device_or_node_name(bs), "QED", buf);
error_setg(errp, "Unsupported QED features: %" PRIx64,
s->header.features & ~QED_FEATURE_MASK);
return -ENOTSUP;
}
if (!qed_is_cluster_size_valid(s->header.cluster_size)) {
@@ -580,7 +568,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
size_t l1_size = header.cluster_size * header.table_size;
Error *local_err = NULL;
int ret = 0;
BlockDriverState *bs;
BlockBackend *blk;
ret = bdrv_create_file(filename, opts, &local_err);
if (ret < 0) {
@@ -588,17 +576,17 @@ static int qed_create(const char *filename, uint32_t cluster_size,
return ret;
}
bs = NULL;
ret = bdrv_open(&bs, filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_PROTOCOL,
&local_err);
if (ret < 0) {
blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
return ret;
return -EIO;
}
blk_set_allow_write_beyond_eof(blk, true);
/* File must start empty and grow, check truncate is supported */
ret = bdrv_truncate(bs, 0);
ret = blk_truncate(blk, 0);
if (ret < 0) {
goto out;
}
@@ -614,18 +602,18 @@ static int qed_create(const char *filename, uint32_t cluster_size,
}
qed_header_cpu_to_le(&header, &le_header);
ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
ret = blk_pwrite(blk, 0, &le_header, sizeof(le_header), 0);
if (ret < 0) {
goto out;
}
ret = bdrv_pwrite(bs, sizeof(le_header), backing_file,
header.backing_filename_size);
ret = blk_pwrite(blk, sizeof(le_header), backing_file,
header.backing_filename_size, 0);
if (ret < 0) {
goto out;
}
l1_table = g_malloc0(l1_size);
ret = bdrv_pwrite(bs, header.l1_table_offset, l1_table, l1_size);
ret = blk_pwrite(blk, header.l1_table_offset, l1_table, l1_size, 0);
if (ret < 0) {
goto out;
}
@@ -633,7 +621,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
ret = 0; /* success */
out:
g_free(l1_table);
bdrv_unref(bs);
blk_unref(blk);
return ret;
}
@@ -1607,12 +1595,6 @@ static void bdrv_qed_invalidate_cache(BlockDriverState *bs, Error **errp)
bdrv_qed_close(bs);
bdrv_invalidate_cache(bs->file->bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
memset(s, 0, sizeof(BDRVQEDState));
ret = bdrv_qed_open(bs, NULL, bs->open_flags, &local_err);
if (local_err) {
@@ -1692,7 +1674,6 @@ static BlockDriver bdrv_qed = {
.bdrv_check = bdrv_qed_check,
.bdrv_detach_aio_context = bdrv_qed_detach_aio_context,
.bdrv_attach_aio_context = bdrv_qed_attach_aio_context,
.bdrv_drain = bdrv_qed_drain,
};
static void bdrv_qed_init(void)

View File

@@ -16,6 +16,7 @@
#define BLOCK_QED_H
#include "block/block_int.h"
#include "qemu/cutils.h"
/* The layout of a QED file is as follows:
*

View File

@@ -14,6 +14,7 @@
*/
#include "qemu/osdep.h"
#include "qemu/cutils.h"
#include "block/block_int.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qdict.h"
@@ -67,6 +68,9 @@ typedef struct QuorumVotes {
typedef struct BDRVQuorumState {
BdrvChild **children; /* children BlockDriverStates */
int num_children; /* children count */
unsigned next_child_index; /* the index of the next child that should
* be added
*/
int threshold; /* if less than threshold children reads gave the
* same result a quorum error occurs.
*/
@@ -215,14 +219,16 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
return acb;
}
static void quorum_report_bad(QuorumAIOCB *acb, char *node_name, int ret)
static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
int nb_sectors, char *node_name, int ret)
{
const char *msg = NULL;
if (ret < 0) {
msg = strerror(-ret);
}
qapi_event_send_quorum_report_bad(!!msg, msg, node_name,
acb->sector_num, acb->nb_sectors, &error_abort);
qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
sector_num, nb_sectors, &error_abort);
}
static void quorum_report_failure(QuorumAIOCB *acb)
@@ -284,6 +290,15 @@ static void quorum_aio_cb(void *opaque, int ret)
BDRVQuorumState *s = acb->common.bs->opaque;
bool rewrite = false;
if (ret == 0) {
acb->success_count++;
} else {
QuorumOpType type;
type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
sacb->aiocb->bs->node_name, ret);
}
if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
/* We try to read next child in FIFO order if we fail to read */
if (ret < 0 && (acb->child_iter + 1) < s->num_children) {
@@ -302,11 +317,6 @@ static void quorum_aio_cb(void *opaque, int ret)
sacb->ret = ret;
acb->count++;
if (ret == 0) {
acb->success_count++;
} else {
quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
}
assert(acb->count <= s->num_children);
assert(acb->success_count <= s->num_children);
if (acb->count < s->num_children) {
@@ -338,7 +348,9 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
continue;
}
QLIST_FOREACH(item, &version->items, next) {
quorum_report_bad(acb, s->children[item->index]->bs->node_name, 0);
quorum_report_bad(QUORUM_OP_TYPE_READ, acb->sector_num,
acb->nb_sectors,
s->children[item->index]->bs->node_name, 0);
}
}
}
@@ -648,8 +660,9 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
}
for (i = 0; i < s->num_children; i++) {
bdrv_aio_readv(s->children[i]->bs, acb->sector_num, &acb->qcrs[i].qiov,
acb->nb_sectors, quorum_aio_cb, &acb->qcrs[i]);
acb->qcrs[i].aiocb = bdrv_aio_readv(s->children[i]->bs, acb->sector_num,
&acb->qcrs[i].qiov, acb->nb_sectors,
quorum_aio_cb, &acb->qcrs[i]);
}
return &acb->common;
@@ -664,9 +677,10 @@ static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
qemu_iovec_init(&acb->qcrs[acb->child_iter].qiov, acb->qiov->niov);
qemu_iovec_clone(&acb->qcrs[acb->child_iter].qiov, acb->qiov,
acb->qcrs[acb->child_iter].buf);
bdrv_aio_readv(s->children[acb->child_iter]->bs, acb->sector_num,
&acb->qcrs[acb->child_iter].qiov, acb->nb_sectors,
quorum_aio_cb, &acb->qcrs[acb->child_iter]);
acb->qcrs[acb->child_iter].aiocb =
bdrv_aio_readv(s->children[acb->child_iter]->bs, acb->sector_num,
&acb->qcrs[acb->child_iter].qiov, acb->nb_sectors,
quorum_aio_cb, &acb->qcrs[acb->child_iter]);
return &acb->common;
}
@@ -737,21 +751,6 @@ static int64_t quorum_getlength(BlockDriverState *bs)
return result;
}
static void quorum_invalidate_cache(BlockDriverState *bs, Error **errp)
{
BDRVQuorumState *s = bs->opaque;
Error *local_err = NULL;
int i;
for (i = 0; i < s->num_children; i++) {
bdrv_invalidate_cache(s->children[i]->bs, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
}
}
static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
{
BDRVQuorumState *s = bs->opaque;
@@ -760,19 +759,30 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
QuorumVoteValue result_value;
int i;
int result = 0;
int success_count = 0;
QLIST_INIT(&error_votes.vote_list);
error_votes.compare = quorum_64bits_compare;
for (i = 0; i < s->num_children; i++) {
result = bdrv_co_flush(s->children[i]->bs);
result_value.l = result;
quorum_count_vote(&error_votes, &result_value, i);
if (result) {
quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
bdrv_nb_sectors(s->children[i]->bs),
s->children[i]->bs->node_name, result);
result_value.l = result;
quorum_count_vote(&error_votes, &result_value, i);
} else {
success_count++;
}
}
winner = quorum_get_vote_winner(&error_votes);
result = winner->value.l;
if (success_count >= s->threshold) {
result = 0;
} else {
winner = quorum_get_vote_winner(&error_votes);
result = winner->value.l;
}
quorum_free_vote_list(&error_votes);
return result;
@@ -877,9 +887,9 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
ret = -EINVAL;
goto exit;
}
if (s->num_children < 2) {
if (s->num_children < 1) {
error_setg(&local_err,
"Number of provided children must be greater than 1");
"Number of provided children must be 1 or more");
ret = -EINVAL;
goto exit;
}
@@ -943,6 +953,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
opened[i] = true;
}
s->next_child_index = s->num_children;
g_free(opened);
goto exit;
@@ -978,25 +989,70 @@ static void quorum_close(BlockDriverState *bs)
g_free(s->children);
}
static void quorum_detach_aio_context(BlockDriverState *bs)
static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
Error **errp)
{
BDRVQuorumState *s = bs->opaque;
int i;
BdrvChild *child;
char indexstr[32];
int ret;
for (i = 0; i < s->num_children; i++) {
bdrv_detach_aio_context(s->children[i]->bs);
assert(s->num_children <= INT_MAX / sizeof(BdrvChild *));
if (s->num_children == INT_MAX / sizeof(BdrvChild *) ||
s->next_child_index == UINT_MAX) {
error_setg(errp, "Too many children");
return;
}
ret = snprintf(indexstr, 32, "children.%u", s->next_child_index);
if (ret < 0 || ret >= 32) {
error_setg(errp, "cannot generate child name");
return;
}
s->next_child_index++;
bdrv_drained_begin(bs);
/* We can safely add the child now */
bdrv_ref(child_bs);
child = bdrv_attach_child(bs, child_bs, indexstr, &child_format);
s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
s->children[s->num_children++] = child;
bdrv_drained_end(bs);
}
static void quorum_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
Error **errp)
{
BDRVQuorumState *s = bs->opaque;
int i;
for (i = 0; i < s->num_children; i++) {
bdrv_attach_aio_context(s->children[i]->bs, new_context);
if (s->children[i] == child) {
break;
}
}
/* we have checked it in bdrv_del_child() */
assert(i < s->num_children);
if (s->num_children <= s->threshold) {
error_setg(errp,
"The number of children cannot be lower than the vote threshold %d",
s->threshold);
return;
}
bdrv_drained_begin(bs);
/* We can safely remove this child now */
memmove(&s->children[i], &s->children[i + 1],
(s->num_children - i - 1) * sizeof(BdrvChild *));
s->children = g_renew(BdrvChild *, s->children, --s->num_children);
bdrv_unref_child(bs, child);
bdrv_drained_end(bs);
}
static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
@@ -1049,10 +1105,9 @@ static BlockDriver bdrv_quorum = {
.bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
.bdrv_invalidate_cache = quorum_invalidate_cache,
.bdrv_detach_aio_context = quorum_detach_aio_context,
.bdrv_attach_aio_context = quorum_attach_aio_context,
.bdrv_add_child = quorum_add_child,
.bdrv_del_child = quorum_del_child,
.is_filter = true,
.bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter,

View File

@@ -15,6 +15,8 @@
#ifndef QEMU_RAW_AIO_H
#define QEMU_RAW_AIO_H
#include "qemu/iov.h"
/* AIO request types */
#define QEMU_AIO_READ 0x0001
#define QEMU_AIO_WRITE 0x0002
@@ -33,15 +35,16 @@
/* linux-aio.c - Linux native implementation */
#ifdef CONFIG_LINUX_AIO
void *laio_init(void);
void laio_cleanup(void *s);
BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
typedef struct LinuxAioState LinuxAioState;
LinuxAioState *laio_init(void);
void laio_cleanup(LinuxAioState *s);
BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque, int type);
void laio_detach_aio_context(void *s, AioContext *old_context);
void laio_attach_aio_context(void *s, AioContext *new_context);
void laio_io_plug(BlockDriverState *bs, void *aio_ctx);
void laio_io_unplug(BlockDriverState *bs, void *aio_ctx, bool unplug);
void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
void laio_io_plug(BlockDriverState *bs, LinuxAioState *s);
void laio_io_unplug(BlockDriverState *bs, LinuxAioState *s);
#endif
#ifdef _WIN32

View File

@@ -22,7 +22,8 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "qemu/cutils.h"
#include "qemu/error-report.h"
#include "qemu/timer.h"
#include "qemu/log.h"
@@ -44,6 +45,7 @@
#include <IOKit/storage/IOMedia.h>
#include <IOKit/storage/IOCDMedia.h>
//#include <IOKit/storage/IOCDTypes.h>
#include <IOKit/storage/IODVDMedia.h>
#include <CoreFoundation/CoreFoundation.h>
#endif
@@ -137,7 +139,7 @@ typedef struct BDRVRawState {
#ifdef CONFIG_LINUX_AIO
int use_aio;
void *aio_ctx;
LinuxAioState *aio_ctx;
#endif
#ifdef CONFIG_XFS
bool is_xfs:1;
@@ -396,7 +398,7 @@ static void raw_attach_aio_context(BlockDriverState *bs,
}
#ifdef CONFIG_LINUX_AIO
static int raw_set_aio(void **aio_ctx, int *use_aio, int bdrv_flags)
static int raw_set_aio(LinuxAioState **aio_ctx, int *use_aio, int bdrv_flags)
{
int ret = -1;
assert(aio_ctx != NULL);
@@ -515,6 +517,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
s->has_discard = true;
s->has_write_zeroes = true;
bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
if ((bs->open_flags & BDRV_O_NOCACHE) != 0) {
s->needs_alignment = true;
}
@@ -1343,17 +1346,7 @@ static void raw_aio_unplug(BlockDriverState *bs)
#ifdef CONFIG_LINUX_AIO
BDRVRawState *s = bs->opaque;
if (s->use_aio) {
laio_io_unplug(bs, s->aio_ctx, true);
}
#endif
}
static void raw_aio_flush_io_queue(BlockDriverState *bs)
{
#ifdef CONFIG_LINUX_AIO
BDRVRawState *s = bs->opaque;
if (s->use_aio) {
laio_io_unplug(bs, s->aio_ctx, false);
laio_io_unplug(bs, s->aio_ctx);
}
#endif
}
@@ -1947,7 +1940,6 @@ BlockDriver bdrv_file = {
.bdrv_refresh_limits = raw_refresh_limits,
.bdrv_io_plug = raw_aio_plug,
.bdrv_io_unplug = raw_aio_unplug,
.bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -1965,33 +1957,47 @@ BlockDriver bdrv_file = {
/* host device */
#if defined(__APPLE__) && defined(__MACH__)
static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
CFIndex maxPathSize, int flags);
kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
static char *FindEjectableOpticalMedia(io_iterator_t *mediaIterator)
{
kern_return_t kernResult;
kern_return_t kernResult = KERN_FAILURE;
mach_port_t masterPort;
CFMutableDictionaryRef classesToMatch;
const char *matching_array[] = {kIODVDMediaClass, kIOCDMediaClass};
char *mediaType = NULL;
kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
if ( KERN_SUCCESS != kernResult ) {
printf( "IOMasterPort returned %d\n", kernResult );
}
classesToMatch = IOServiceMatching( kIOCDMediaClass );
if ( classesToMatch == NULL ) {
printf( "IOServiceMatching returned a NULL dictionary.\n" );
} else {
CFDictionarySetValue( classesToMatch, CFSTR( kIOMediaEjectableKey ), kCFBooleanTrue );
}
kernResult = IOServiceGetMatchingServices( masterPort, classesToMatch, mediaIterator );
if ( KERN_SUCCESS != kernResult )
{
printf( "IOServiceGetMatchingServices returned %d\n", kernResult );
}
int index;
for (index = 0; index < ARRAY_SIZE(matching_array); index++) {
classesToMatch = IOServiceMatching(matching_array[index]);
if (classesToMatch == NULL) {
error_report("IOServiceMatching returned NULL for %s",
matching_array[index]);
continue;
}
CFDictionarySetValue(classesToMatch, CFSTR(kIOMediaEjectableKey),
kCFBooleanTrue);
kernResult = IOServiceGetMatchingServices(masterPort, classesToMatch,
mediaIterator);
if (kernResult != KERN_SUCCESS) {
error_report("Note: IOServiceGetMatchingServices returned %d",
kernResult);
continue;
}
return kernResult;
/* If a match was found, leave the loop */
if (*mediaIterator != 0) {
DPRINTF("Matching using %s\n", matching_array[index]);
mediaType = g_strdup(matching_array[index]);
break;
}
}
return mediaType;
}
kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
@@ -2023,7 +2029,46 @@ kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
return kernResult;
}
#endif
/* Sets up a real cdrom for use in QEMU */
static bool setup_cdrom(char *bsd_path, Error **errp)
{
int index, num_of_test_partitions = 2, fd;
char test_partition[MAXPATHLEN];
bool partition_found = false;
/* look for a working partition */
for (index = 0; index < num_of_test_partitions; index++) {
snprintf(test_partition, sizeof(test_partition), "%ss%d", bsd_path,
index);
fd = qemu_open(test_partition, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd >= 0) {
partition_found = true;
qemu_close(fd);
break;
}
}
/* if a working partition on the device was not found */
if (partition_found == false) {
error_setg(errp, "Failed to find a working partition on disc");
} else {
DPRINTF("Using %s as optical disc\n", test_partition);
pstrcpy(bsd_path, MAXPATHLEN, test_partition);
}
return partition_found;
}
/* Prints directions on mounting and unmounting a device */
static void print_unmounting_directions(const char *file_name)
{
error_report("If device %s is mounted on the desktop, unmount"
" it first before using it in QEMU", file_name);
error_report("Command to unmount device: diskutil unmountDisk %s",
file_name);
error_report("Command to mount device: diskutil mountDisk %s", file_name);
}
#endif /* defined(__APPLE__) && defined(__MACH__) */
static int hdev_probe_device(const char *filename)
{
@@ -2114,33 +2159,57 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
#if defined(__APPLE__) && defined(__MACH__)
const char *filename = qdict_get_str(options, "filename");
char bsd_path[MAXPATHLEN] = "";
bool error_occurred = false;
if (strstart(filename, "/dev/cdrom", NULL)) {
kern_return_t kernResult;
io_iterator_t mediaIterator;
char bsdPath[ MAXPATHLEN ];
int fd;
/* If using a real cdrom */
if (strcmp(filename, "/dev/cdrom") == 0) {
char *mediaType = NULL;
kern_return_t ret_val;
io_iterator_t mediaIterator = 0;
kernResult = FindEjectableCDMedia( &mediaIterator );
kernResult = GetBSDPath(mediaIterator, bsdPath, sizeof(bsdPath),
flags);
if ( bsdPath[ 0 ] != '\0' ) {
strcat(bsdPath,"s0");
/* some CDs don't have a partition 0 */
fd = qemu_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE);
if (fd < 0) {
bsdPath[strlen(bsdPath)-1] = '1';
} else {
qemu_close(fd);
}
filename = bsdPath;
qdict_put(options, "filename", qstring_from_str(filename));
mediaType = FindEjectableOpticalMedia(&mediaIterator);
if (mediaType == NULL) {
error_setg(errp, "Please make sure your CD/DVD is in the optical"
" drive");
error_occurred = true;
goto hdev_open_Mac_error;
}
if ( mediaIterator )
IOObjectRelease( mediaIterator );
ret_val = GetBSDPath(mediaIterator, bsd_path, sizeof(bsd_path), flags);
if (ret_val != KERN_SUCCESS) {
error_setg(errp, "Could not get BSD path for optical drive");
error_occurred = true;
goto hdev_open_Mac_error;
}
/* If a real optical drive was not found */
if (bsd_path[0] == '\0') {
error_setg(errp, "Failed to obtain bsd path for optical drive");
error_occurred = true;
goto hdev_open_Mac_error;
}
/* If using a cdrom disc and finding a partition on the disc failed */
if (strncmp(mediaType, kIOCDMediaClass, 9) == 0 &&
setup_cdrom(bsd_path, errp) == false) {
print_unmounting_directions(bsd_path);
error_occurred = true;
goto hdev_open_Mac_error;
}
qdict_put(options, "filename", qstring_from_str(bsd_path));
hdev_open_Mac_error:
g_free(mediaType);
if (mediaIterator) {
IOObjectRelease(mediaIterator);
}
if (error_occurred) {
return -ENOENT;
}
}
#endif
#endif /* defined(__APPLE__) && defined(__MACH__) */
s->type = FTYPE_FILE;
@@ -2149,6 +2218,15 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
if (local_err) {
error_propagate(errp, local_err);
}
#if defined(__APPLE__) && defined(__MACH__)
if (*bsd_path) {
filename = bsd_path;
}
/* if a physical device experienced an error while being opened */
if (strncmp(filename, "/dev/", 5) == 0) {
print_unmounting_directions(filename);
}
#endif /* defined(__APPLE__) && defined(__MACH__) */
return ret;
}
@@ -2310,7 +2388,6 @@ static BlockDriver bdrv_host_device = {
.bdrv_refresh_limits = raw_refresh_limits,
.bdrv_io_plug = raw_aio_plug,
.bdrv_io_unplug = raw_aio_unplug,
.bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -2440,7 +2517,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_refresh_limits = raw_refresh_limits,
.bdrv_io_plug = raw_aio_plug,
.bdrv_io_unplug = raw_aio_unplug,
.bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,
@@ -2576,7 +2652,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_refresh_limits = raw_refresh_limits,
.bdrv_io_plug = raw_aio_plug,
.bdrv_io_unplug = raw_aio_unplug,
.bdrv_flush_io_queue = raw_aio_flush_io_queue,
.bdrv_truncate = raw_truncate,
.bdrv_getlength = raw_getlength,

View File

@@ -22,7 +22,8 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "qemu/cutils.h"
#include "qemu/timer.h"
#include "block/block_int.h"
#include "qemu/module.h"

View File

@@ -28,6 +28,7 @@
#include "qemu/osdep.h"
#include "block/block_int.h"
#include "qapi/error.h"
#include "qemu/option.h"
static QemuOptsList raw_create_opts = {
@@ -56,8 +57,9 @@ static int coroutine_fn raw_co_readv(BlockDriverState *bs, int64_t sector_num,
return bdrv_co_readv(bs->file->bs, sector_num, nb_sectors, qiov);
}
static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov)
static int coroutine_fn
raw_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
QEMUIOVector *qiov, int flags)
{
void *buf = NULL;
BlockDriver *drv;
@@ -103,7 +105,8 @@ static int coroutine_fn raw_co_writev(BlockDriverState *bs, int64_t sector_num,
}
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
ret = bdrv_co_writev(bs->file->bs, sector_num, nb_sectors, qiov);
ret = bdrv_co_pwritev(bs->file->bs, sector_num * BDRV_SECTOR_SIZE,
nb_sectors * BDRV_SECTOR_SIZE, qiov, flags);
fail:
if (qiov == &local_qiov) {
@@ -201,6 +204,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
bs->sg = bs->file->bs->sg;
bs->supported_write_flags = BDRV_REQ_FUA;
bs->supported_zero_flags = BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP;
if (bs->probed && !bdrv_is_read_only(bs)) {
fprintf(stderr,
@@ -246,7 +251,7 @@ BlockDriver bdrv_raw = {
.bdrv_close = &raw_close,
.bdrv_create = &raw_create,
.bdrv_co_readv = &raw_co_readv,
.bdrv_co_writev = &raw_co_writev,
.bdrv_co_writev_flags = &raw_co_writev_flags,
.bdrv_co_write_zeroes = &raw_co_write_zeroes,
.bdrv_co_discard = &raw_co_discard,
.bdrv_co_get_block_status = &raw_co_get_block_status,

View File

@@ -13,9 +13,11 @@
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "block/block_int.h"
#include "crypto/secret.h"
#include "qemu/cutils.h"
#include <rbd/librbd.h>
@@ -228,6 +230,27 @@ static char *qemu_rbd_parse_clientname(const char *conf, char *clientname)
return NULL;
}
static int qemu_rbd_set_auth(rados_t cluster, const char *secretid,
Error **errp)
{
if (secretid == 0) {
return 0;
}
gchar *secret = qcrypto_secret_lookup_as_base64(secretid,
errp);
if (!secret) {
return -1;
}
rados_conf_set(cluster, "key", secret);
g_free(secret);
return 0;
}
static int qemu_rbd_set_conf(rados_t cluster, const char *conf,
bool only_read_conf_file,
Error **errp)
@@ -299,10 +322,13 @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
char conf[RBD_MAX_CONF_SIZE];
char clientname_buf[RBD_MAX_CONF_SIZE];
char *clientname;
const char *secretid;
rados_t cluster;
rados_ioctx_t io_ctx;
int ret;
secretid = qemu_opt_get(opts, "password-secret");
if (qemu_rbd_parsename(filename, pool, sizeof(pool),
snap_buf, sizeof(snap_buf),
name, sizeof(name),
@@ -350,6 +376,11 @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
return -EIO;
}
if (qemu_rbd_set_auth(cluster, secretid, errp) < 0) {
rados_shutdown(cluster);
return -EIO;
}
if (rados_connect(cluster) < 0) {
error_setg(errp, "error connecting");
rados_shutdown(cluster);
@@ -423,6 +454,11 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "Specification of the rbd image",
},
{
.name = "password-secret",
.type = QEMU_OPT_STRING,
.help = "ID of secret providing the password",
},
{ /* end of list */ }
},
};
@@ -436,6 +472,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
char conf[RBD_MAX_CONF_SIZE];
char clientname_buf[RBD_MAX_CONF_SIZE];
char *clientname;
const char *secretid;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
@@ -450,6 +487,7 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
}
filename = qemu_opt_get(opts, "filename");
secretid = qemu_opt_get(opts, "password-secret");
if (qemu_rbd_parsename(filename, pool, sizeof(pool),
snap_buf, sizeof(snap_buf),
@@ -488,6 +526,11 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
}
}
if (qemu_rbd_set_auth(s->cluster, secretid, errp) < 0) {
r = -EIO;
goto failed_shutdown;
}
/*
* Fallback to more conservative semantics if setting cache
* options fails. Ignore errors from setting rbd_cache because the
@@ -919,6 +962,11 @@ static QemuOptsList qemu_rbd_create_opts = {
.type = QEMU_OPT_SIZE,
.help = "RBD object size"
},
{
.name = "password-secret",
.type = QEMU_OPT_STRING,
.help = "ID of secret providing the password",
},
{ /* end of list */ }
}
};

View File

@@ -13,12 +13,14 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "qemu/uri.h"
#include "qemu/error-report.h"
#include "qemu/sockets.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/bitops.h"
#include "qemu/cutils.h"
#define SD_PROTO_VER 0x01
@@ -284,15 +286,24 @@ static inline bool is_snapshot(struct SheepdogInode *inode)
return !!inode->snap_ctime;
}
static inline size_t count_data_objs(const struct SheepdogInode *inode)
{
return DIV_ROUND_UP(inode->vdi_size,
(1UL << inode->block_size_shift));
}
#undef DPRINTF
#ifdef DEBUG_SDOG
#define DPRINTF(fmt, args...) \
do { \
fprintf(stdout, "%s %d: " fmt, __func__, __LINE__, ##args); \
} while (0)
#define DEBUG_SDOG_PRINT 1
#else
#define DPRINTF(fmt, args...)
#define DEBUG_SDOG_PRINT 0
#endif
#define DPRINTF(fmt, args...) \
do { \
if (DEBUG_SDOG_PRINT) { \
fprintf(stderr, "%s %d: " fmt, __func__, __LINE__, ##args); \
} \
} while (0)
typedef struct SheepdogAIOCB SheepdogAIOCB;
@@ -609,14 +620,13 @@ static coroutine_fn int send_co_req(int sockfd, SheepdogReq *hdr, void *data,
ret = qemu_co_send(sockfd, hdr, sizeof(*hdr));
if (ret != sizeof(*hdr)) {
error_report("failed to send a req, %s", strerror(errno));
ret = -socket_error();
return ret;
return -errno;
}
ret = qemu_co_send(sockfd, data, *wlen);
if (ret != *wlen) {
ret = -socket_error();
error_report("failed to send a req, %s", strerror(errno));
return -errno;
}
return ret;
@@ -1631,7 +1641,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
static int sd_prealloc(const char *filename, Error **errp)
{
BlockDriverState *bs = NULL;
BlockBackend *blk = NULL;
BDRVSheepdogState *base = NULL;
unsigned long buf_size;
uint32_t idx, max_idx;
@@ -1640,19 +1650,22 @@ static int sd_prealloc(const char *filename, Error **errp)
void *buf = NULL;
int ret;
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
errp);
if (ret < 0) {
blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, errp);
if (blk == NULL) {
ret = -EIO;
goto out_with_err_set;
}
vdi_size = bdrv_getlength(bs);
blk_set_allow_write_beyond_eof(blk, true);
vdi_size = blk_getlength(blk);
if (vdi_size < 0) {
ret = vdi_size;
goto out;
}
base = bs->opaque;
base = blk_bs(blk)->opaque;
object_size = (UINT32_C(1) << base->inode.block_size_shift);
buf_size = MIN(object_size, SD_DATA_OBJ_SIZE);
buf = g_malloc0(buf_size);
@@ -1664,23 +1677,24 @@ static int sd_prealloc(const char *filename, Error **errp)
* The created image can be a cloned image, so we need to read
* a data from the source image.
*/
ret = bdrv_pread(bs, idx * buf_size, buf, buf_size);
ret = blk_pread(blk, idx * buf_size, buf, buf_size);
if (ret < 0) {
goto out;
}
ret = bdrv_pwrite(bs, idx * buf_size, buf, buf_size);
ret = blk_pwrite(blk, idx * buf_size, buf, buf_size, 0);
if (ret < 0) {
goto out;
}
}
ret = 0;
out:
if (ret < 0) {
error_setg_errno(errp, -ret, "Can't pre-allocate");
}
out_with_err_set:
if (bs) {
bdrv_unref(bs);
if (blk) {
blk_unref(blk);
}
g_free(buf);
@@ -1820,7 +1834,7 @@ static int sd_create(const char *filename, QemuOpts *opts,
}
if (backing_file) {
BlockDriverState *bs;
BlockBackend *blk;
BDRVSheepdogState *base;
BlockDriver *drv;
@@ -1832,22 +1846,23 @@ static int sd_create(const char *filename, QemuOpts *opts,
goto out;
}
bs = NULL;
ret = bdrv_open(&bs, backing_file, NULL, NULL, BDRV_O_PROTOCOL, errp);
if (ret < 0) {
blk = blk_new_open(backing_file, NULL, NULL,
BDRV_O_PROTOCOL, errp);
if (blk == NULL) {
ret = -EIO;
goto out;
}
base = bs->opaque;
base = blk_bs(blk)->opaque;
if (!is_snapshot(&base->inode)) {
error_setg(errp, "cannot clone from a non snapshot vdi");
bdrv_unref(bs);
blk_unref(blk);
ret = -EINVAL;
goto out;
}
s->inode.vdi_id = base->inode.vdi_id;
bdrv_unref(bs);
blk_unref(blk);
}
s->aio_context = qemu_get_aio_context();
@@ -2478,13 +2493,131 @@ out:
return ret;
}
#define NR_BATCHED_DISCARD 128
static bool remove_objects(BDRVSheepdogState *s)
{
int fd, i = 0, nr_objs = 0;
Error *local_err = NULL;
int ret = 0;
bool result = true;
SheepdogInode *inode = &s->inode;
fd = connect_to_sdog(s, &local_err);
if (fd < 0) {
error_report_err(local_err);
return false;
}
nr_objs = count_data_objs(inode);
while (i < nr_objs) {
int start_idx, nr_filled_idx;
while (i < nr_objs && !inode->data_vdi_id[i]) {
i++;
}
start_idx = i;
nr_filled_idx = 0;
while (i < nr_objs && nr_filled_idx < NR_BATCHED_DISCARD) {
if (inode->data_vdi_id[i]) {
inode->data_vdi_id[i] = 0;
nr_filled_idx++;
}
i++;
}
ret = write_object(fd, s->aio_context,
(char *)&inode->data_vdi_id[start_idx],
vid_to_vdi_oid(s->inode.vdi_id), inode->nr_copies,
(i - start_idx) * sizeof(uint32_t),
offsetof(struct SheepdogInode,
data_vdi_id[start_idx]),
false, s->cache_flags);
if (ret < 0) {
error_report("failed to discard snapshot inode.");
result = false;
goto out;
}
}
out:
closesocket(fd);
return result;
}
static int sd_snapshot_delete(BlockDriverState *bs,
const char *snapshot_id,
const char *name,
Error **errp)
{
/* FIXME: Delete specified snapshot id. */
return 0;
unsigned long snap_id = 0;
char snap_tag[SD_MAX_VDI_TAG_LEN];
Error *local_err = NULL;
int fd, ret;
char buf[SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN];
BDRVSheepdogState *s = bs->opaque;
unsigned int wlen = SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN, rlen = 0;
uint32_t vid;
SheepdogVdiReq hdr = {
.opcode = SD_OP_DEL_VDI,
.data_length = wlen,
.flags = SD_FLAG_CMD_WRITE,
};
SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)&hdr;
if (!remove_objects(s)) {
return -1;
}
memset(buf, 0, sizeof(buf));
memset(snap_tag, 0, sizeof(snap_tag));
pstrcpy(buf, SD_MAX_VDI_LEN, s->name);
ret = qemu_strtoul(snapshot_id, NULL, 10, &snap_id);
if (ret || snap_id > UINT32_MAX) {
error_setg(errp, "Invalid snapshot ID: %s",
snapshot_id ? snapshot_id : "<null>");
return -EINVAL;
}
if (snap_id) {
hdr.snapid = (uint32_t) snap_id;
} else {
pstrcpy(snap_tag, sizeof(snap_tag), snapshot_id);
pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
}
ret = find_vdi_name(s, s->name, snap_id, snap_tag, &vid, true,
&local_err);
if (ret) {
return ret;
}
fd = connect_to_sdog(s, &local_err);
if (fd < 0) {
error_report_err(local_err);
return -1;
}
ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr,
buf, &wlen, &rlen);
closesocket(fd);
if (ret) {
return ret;
}
switch (rsp->result) {
case SD_RES_NO_VDI:
error_report("%s was already deleted", s->name);
case SD_RES_SUCCESS:
break;
default:
error_report("%s, %s", sd_strerror(rsp->result), s->name);
return -1;
}
return ret;
}
static int sd_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab)

View File

@@ -25,6 +25,7 @@
#include "qemu/osdep.h"
#include "block/snapshot.h"
#include "block/block_int.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
QemuOptsList internal_snapshot_opts = {
@@ -372,9 +373,10 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState *bs,
bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
{
bool ok = true;
BlockDriverState *bs = NULL;
BlockDriverState *bs;
BdrvNextIterator it;
while (ok && (bs = bdrv_next(bs))) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
@@ -382,8 +384,12 @@ bool bdrv_all_can_snapshot(BlockDriverState **first_bad_bs)
ok = bdrv_can_snapshot(bs);
}
aio_context_release(ctx);
if (!ok) {
goto fail;
}
}
fail:
*first_bad_bs = bs;
return ok;
}
@@ -392,20 +398,28 @@ int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
Error **err)
{
int ret = 0;
BlockDriverState *bs = NULL;
BlockDriverState *bs;
BdrvNextIterator it;
QEMUSnapshotInfo sn1, *snapshot = &sn1;
while (ret == 0 && (bs = bdrv_next(bs))) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
if (bdrv_can_snapshot(bs) &&
bdrv_snapshot_find(bs, snapshot, name) >= 0) {
ret = bdrv_snapshot_delete_by_id_or_name(bs, name, err);
if (ret < 0) {
goto fail;
}
}
aio_context_release(ctx);
if (ret < 0) {
goto fail;
}
}
fail:
*first_bad_bs = bs;
return ret;
}
@@ -414,9 +428,10 @@ int bdrv_all_delete_snapshot(const char *name, BlockDriverState **first_bad_bs,
int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
{
int err = 0;
BlockDriverState *bs = NULL;
BlockDriverState *bs;
BdrvNextIterator it;
while (err == 0 && (bs = bdrv_next(bs))) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
@@ -424,8 +439,12 @@ int bdrv_all_goto_snapshot(const char *name, BlockDriverState **first_bad_bs)
err = bdrv_snapshot_goto(bs, name);
}
aio_context_release(ctx);
if (err < 0) {
goto fail;
}
}
fail:
*first_bad_bs = bs;
return err;
}
@@ -434,9 +453,10 @@ int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
{
QEMUSnapshotInfo sn;
int err = 0;
BlockDriverState *bs = NULL;
BlockDriverState *bs;
BdrvNextIterator it;
while (err == 0 && (bs = bdrv_next(bs))) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
@@ -444,8 +464,12 @@ int bdrv_all_find_snapshot(const char *name, BlockDriverState **first_bad_bs)
err = bdrv_snapshot_find(bs, &sn, name);
}
aio_context_release(ctx);
if (err < 0) {
goto fail;
}
}
fail:
*first_bad_bs = bs;
return err;
}
@@ -456,9 +480,10 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
BlockDriverState **first_bad_bs)
{
int err = 0;
BlockDriverState *bs = NULL;
BlockDriverState *bs;
BdrvNextIterator it;
while (err == 0 && (bs = bdrv_next(bs))) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
@@ -470,23 +495,32 @@ int bdrv_all_create_snapshot(QEMUSnapshotInfo *sn,
err = bdrv_snapshot_create(bs, sn);
}
aio_context_release(ctx);
if (err < 0) {
goto fail;
}
}
fail:
*first_bad_bs = bs;
return err;
}
BlockDriverState *bdrv_all_find_vmstate_bs(void)
{
bool not_found = true;
BlockDriverState *bs = NULL;
BlockDriverState *bs;
BdrvNextIterator it;
while (not_found && (bs = bdrv_next(bs))) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *ctx = bdrv_get_aio_context(bs);
bool found;
aio_context_acquire(ctx);
not_found = !bdrv_can_snapshot(bs);
found = bdrv_can_snapshot(bs);
aio_context_release(ctx);
if (found) {
break;
}
}
return bs;
}

View File

@@ -28,6 +28,7 @@
#include <libssh2_sftp.h>
#include "block/block_int.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
#include "qemu/sockets.h"
#include "qemu/uri.h"

View File

@@ -15,6 +15,7 @@
#include "trace.h"
#include "block/block_int.h"
#include "block/blockjob.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
#include "sysemu/block-backend.h"
@@ -38,7 +39,7 @@ typedef struct StreamBlockJob {
char *backing_file_str;
} StreamBlockJob;
static int coroutine_fn stream_populate(BlockDriverState *bs,
static int coroutine_fn stream_populate(BlockBackend *blk,
int64_t sector_num, int nb_sectors,
void *buf)
{
@@ -51,7 +52,8 @@ static int coroutine_fn stream_populate(BlockDriverState *bs,
qemu_iovec_init_external(&qiov, &iov, 1);
/* Copy-on-read the unallocated clusters */
return bdrv_co_copy_on_readv(bs, sector_num, nb_sectors, &qiov);
return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, &qiov,
BDRV_REQ_COPY_ON_READ);
}
typedef struct {
@@ -63,6 +65,7 @@ static void stream_complete(BlockJob *job, void *opaque)
{
StreamBlockJob *s = container_of(job, StreamBlockJob, common);
StreamCompleteData *data = opaque;
BlockDriverState *bs = blk_bs(job->blk);
BlockDriverState *base = s->base;
if (!block_job_is_cancelled(&s->common) && data->reached_end &&
@@ -74,8 +77,8 @@ static void stream_complete(BlockJob *job, void *opaque)
base_fmt = base->drv->format_name;
}
}
data->ret = bdrv_change_backing_file(job->bs, base_id, base_fmt);
bdrv_set_backing_hd(job->bs, base);
data->ret = bdrv_change_backing_file(bs, base_id, base_fmt);
bdrv_set_backing_hd(bs, base);
}
g_free(s->backing_file_str);
@@ -87,23 +90,24 @@ static void coroutine_fn stream_run(void *opaque)
{
StreamBlockJob *s = opaque;
StreamCompleteData *data;
BlockDriverState *bs = s->common.bs;
BlockBackend *blk = s->common.blk;
BlockDriverState *bs = blk_bs(blk);
BlockDriverState *base = s->base;
int64_t sector_num, end;
int64_t sector_num = 0;
int64_t end = -1;
int error = 0;
int ret = 0;
int n = 0;
void *buf;
if (!bs->backing) {
block_job_completed(&s->common, 0);
return;
goto out;
}
s->common.len = bdrv_getlength(bs);
if (s->common.len < 0) {
block_job_completed(&s->common, s->common.len);
return;
ret = s->common.len;
goto out;
}
end = s->common.len >> BDRV_SECTOR_BITS;
@@ -158,12 +162,11 @@ wait:
goto wait;
}
}
ret = stream_populate(bs, sector_num, n, buf);
ret = stream_populate(blk, sector_num, n, buf);
}
if (ret < 0) {
BlockErrorAction action =
block_job_error_action(&s->common, s->common.bs, s->on_error,
true, -ret);
block_job_error_action(&s->common, s->on_error, true, -ret);
if (action == BLOCK_ERROR_ACTION_STOP) {
n = 0;
continue;
@@ -190,6 +193,7 @@ wait:
qemu_vfree(buf);
out:
/* Modify backing chain and close BDSes in main loop */
data = g_malloc(sizeof(*data));
data->ret = ret;
@@ -222,13 +226,6 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
{
StreamBlockJob *s;
if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
(!bs->blk || !blk_iostatus_is_enabled(bs->blk))) {
error_setg(errp, QERR_INVALID_PARAMETER, "on-error");
return;
}
s = block_job_create(&stream_job_driver, bs, speed, cb, opaque, errp);
if (!s) {
return;

View File

@@ -23,13 +23,14 @@
*/
#include "qemu/osdep.h"
#include "sysemu/block-backend.h"
#include "block/throttle-groups.h"
#include "qemu/queue.h"
#include "qemu/thread.h"
#include "sysemu/qtest.h"
/* The ThrottleGroup structure (with its ThrottleState) is shared
* among different BlockDriverState and it's independent from
* among different BlockBackends and it's independent from
* AioContext, so in order to use it from different threads it needs
* its own locking.
*
@@ -39,26 +40,26 @@
* The whole ThrottleGroup structure is private and invisible to
* outside users, that only use it through its ThrottleState.
*
* In addition to the ThrottleGroup structure, BlockDriverState has
* In addition to the ThrottleGroup structure, BlockBackendPublic has
* fields that need to be accessed by other members of the group and
* therefore also need to be protected by this lock. Once a BDS is
* registered in a group those fields can be accessed by other threads
* any time.
* therefore also need to be protected by this lock. Once a
* BlockBackend is registered in a group those fields can be accessed
* by other threads any time.
*
* Again, all this is handled internally and is mostly transparent to
* the outside. The 'throttle_timers' field however has an additional
* constraint because it may be temporarily invalid (see for example
* bdrv_set_aio_context()). Therefore in this file a thread will
* access some other BDS's timers only after verifying that that BDS
* has throttled requests in the queue.
* access some other BlockBackend's timers only after verifying that
* that BlockBackend has throttled requests in the queue.
*/
typedef struct ThrottleGroup {
char *name; /* This is constant during the lifetime of the group */
QemuMutex lock; /* This lock protects the following four fields */
ThrottleState ts;
QLIST_HEAD(, BlockDriverState) head;
BlockDriverState *tokens[2];
QLIST_HEAD(, BlockBackendPublic) head;
BlockBackend *tokens[2];
bool any_timer_armed[2];
/* These two are protected by the global throttle_groups_lock */
@@ -132,93 +133,98 @@ void throttle_group_unref(ThrottleState *ts)
qemu_mutex_unlock(&throttle_groups_lock);
}
/* Get the name from a BlockDriverState's ThrottleGroup. The name (and
* the pointer) is guaranteed to remain constant during the lifetime
* of the group.
/* Get the name from a BlockBackend's ThrottleGroup. The name (and the pointer)
* is guaranteed to remain constant during the lifetime of the group.
*
* @bs: a BlockDriverState that is member of a throttling group
* @blk: a BlockBackend that is member of a throttling group
* @ret: the name of the group.
*/
const char *throttle_group_get_name(BlockDriverState *bs)
const char *throttle_group_get_name(BlockBackend *blk)
{
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
return tg->name;
}
/* Return the next BlockDriverState in the round-robin sequence,
* simulating a circular list.
/* Return the next BlockBackend in the round-robin sequence, simulating a
* circular list.
*
* This assumes that tg->lock is held.
*
* @bs: the current BlockDriverState
* @ret: the next BlockDriverState in the sequence
* @blk: the current BlockBackend
* @ret: the next BlockBackend in the sequence
*/
static BlockDriverState *throttle_group_next_bs(BlockDriverState *bs)
static BlockBackend *throttle_group_next_blk(BlockBackend *blk)
{
ThrottleState *ts = bs->throttle_state;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleState *ts = blkp->throttle_state;
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
BlockDriverState *next = QLIST_NEXT(bs, round_robin);
BlockBackendPublic *next = QLIST_NEXT(blkp, round_robin);
if (!next) {
return QLIST_FIRST(&tg->head);
next = QLIST_FIRST(&tg->head);
}
return next;
return blk_by_public(next);
}
/* Return the next BlockDriverState in the round-robin sequence with
* pending I/O requests.
/* Return the next BlockBackend in the round-robin sequence with pending I/O
* requests.
*
* This assumes that tg->lock is held.
*
* @bs: the current BlockDriverState
* @blk: the current BlockBackend
* @is_write: the type of operation (read/write)
* @ret: the next BlockDriverState with pending requests, or bs
* if there is none.
* @ret: the next BlockBackend with pending requests, or blk if there is
* none.
*/
static BlockDriverState *next_throttle_token(BlockDriverState *bs,
bool is_write)
static BlockBackend *next_throttle_token(BlockBackend *blk, bool is_write)
{
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
BlockDriverState *token, *start;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
BlockBackend *token, *start;
start = token = tg->tokens[is_write];
/* get next bs round in round robin style */
token = throttle_group_next_bs(token);
while (token != start && !token->pending_reqs[is_write]) {
token = throttle_group_next_bs(token);
token = throttle_group_next_blk(token);
while (token != start && !blkp->pending_reqs[is_write]) {
token = throttle_group_next_blk(token);
}
/* If no IO are queued for scheduling on the next round robin token
* then decide the token is the current bs because chances are
* the current bs get the current request queued.
*/
if (token == start && !token->pending_reqs[is_write]) {
token = bs;
if (token == start && !blkp->pending_reqs[is_write]) {
token = blk;
}
return token;
}
/* Check if the next I/O request for a BlockDriverState needs to be
* throttled or not. If there's no timer set in this group, set one
* and update the token accordingly.
/* Check if the next I/O request for a BlockBackend needs to be throttled or
* not. If there's no timer set in this group, set one and update the token
* accordingly.
*
* This assumes that tg->lock is held.
*
* @bs: the current BlockDriverState
* @blk: the current BlockBackend
* @is_write: the type of operation (read/write)
* @ret: whether the I/O request needs to be throttled or not
*/
static bool throttle_group_schedule_timer(BlockDriverState *bs,
bool is_write)
static bool throttle_group_schedule_timer(BlockBackend *blk, bool is_write)
{
ThrottleState *ts = bs->throttle_state;
ThrottleTimers *tt = &bs->throttle_timers;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleState *ts = blkp->throttle_state;
ThrottleTimers *tt = &blkp->throttle_timers;
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
bool must_wait;
if (blkp->io_limits_disabled) {
return false;
}
/* Check if any of the timers in this group is already armed */
if (tg->any_timer_armed[is_write]) {
return true;
@@ -226,9 +232,9 @@ static bool throttle_group_schedule_timer(BlockDriverState *bs,
must_wait = throttle_schedule_timer(ts, tt, is_write);
/* If a timer just got armed, set bs as the current token */
/* If a timer just got armed, set blk as the current token */
if (must_wait) {
tg->tokens[is_write] = bs;
tg->tokens[is_write] = blk;
tg->any_timer_armed[is_write] = true;
}
@@ -239,18 +245,19 @@ static bool throttle_group_schedule_timer(BlockDriverState *bs,
*
* This assumes that tg->lock is held.
*
* @bs: the current BlockDriverState
* @blk: the current BlockBackend
* @is_write: the type of operation (read/write)
*/
static void schedule_next_request(BlockDriverState *bs, bool is_write)
static void schedule_next_request(BlockBackend *blk, bool is_write)
{
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
bool must_wait;
BlockDriverState *token;
BlockBackend *token;
/* Check if there's any pending request to schedule next */
token = next_throttle_token(bs, is_write);
if (!token->pending_reqs[is_write]) {
token = next_throttle_token(blk, is_write);
if (!blkp->pending_reqs[is_write]) {
return;
}
@@ -259,12 +266,12 @@ static void schedule_next_request(BlockDriverState *bs, bool is_write)
/* If it doesn't have to wait, queue it for immediate execution */
if (!must_wait) {
/* Give preference to requests from the current bs */
/* Give preference to requests from the current blk */
if (qemu_in_coroutine() &&
qemu_co_queue_next(&bs->throttled_reqs[is_write])) {
token = bs;
qemu_co_queue_next(&blkp->throttled_reqs[is_write])) {
token = blk;
} else {
ThrottleTimers *tt = &token->throttle_timers;
ThrottleTimers *tt = &blkp->throttle_timers;
int64_t now = qemu_clock_get_ns(tt->clock_type);
timer_mod(tt->timers[is_write], now + 1);
tg->any_timer_armed[is_write] = true;
@@ -277,53 +284,67 @@ static void schedule_next_request(BlockDriverState *bs, bool is_write)
* if necessary, and schedule the next request using a round robin
* algorithm.
*
* @bs: the current BlockDriverState
* @blk: the current BlockBackend
* @bytes: the number of bytes for this I/O
* @is_write: the type of operation (read/write)
*/
void coroutine_fn throttle_group_co_io_limits_intercept(BlockDriverState *bs,
void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
unsigned int bytes,
bool is_write)
{
bool must_wait;
BlockDriverState *token;
BlockBackend *token;
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
qemu_mutex_lock(&tg->lock);
/* First we check if this I/O has to be throttled. */
token = next_throttle_token(bs, is_write);
token = next_throttle_token(blk, is_write);
must_wait = throttle_group_schedule_timer(token, is_write);
/* Wait if there's a timer set or queued requests of this type */
if (must_wait || bs->pending_reqs[is_write]) {
bs->pending_reqs[is_write]++;
if (must_wait || blkp->pending_reqs[is_write]) {
blkp->pending_reqs[is_write]++;
qemu_mutex_unlock(&tg->lock);
qemu_co_queue_wait(&bs->throttled_reqs[is_write]);
qemu_co_queue_wait(&blkp->throttled_reqs[is_write]);
qemu_mutex_lock(&tg->lock);
bs->pending_reqs[is_write]--;
blkp->pending_reqs[is_write]--;
}
/* The I/O will be executed, so do the accounting */
throttle_account(bs->throttle_state, is_write, bytes);
throttle_account(blkp->throttle_state, is_write, bytes);
/* Schedule the next request */
schedule_next_request(bs, is_write);
schedule_next_request(blk, is_write);
qemu_mutex_unlock(&tg->lock);
}
void throttle_group_restart_blk(BlockBackend *blk)
{
BlockBackendPublic *blkp = blk_get_public(blk);
int i;
for (i = 0; i < 2; i++) {
while (qemu_co_enter_next(&blkp->throttled_reqs[i])) {
;
}
}
}
/* Update the throttle configuration for a particular group. Similar
* to throttle_config(), but guarantees atomicity within the
* throttling group.
*
* @bs: a BlockDriverState that is member of the group
* @blk: a BlockBackend that is a member of the group
* @cfg: the configuration to set
*/
void throttle_group_config(BlockDriverState *bs, ThrottleConfig *cfg)
void throttle_group_config(BlockBackend *blk, ThrottleConfig *cfg)
{
ThrottleTimers *tt = &bs->throttle_timers;
ThrottleState *ts = bs->throttle_state;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleTimers *tt = &blkp->throttle_timers;
ThrottleState *ts = blkp->throttle_state;
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
qemu_mutex_lock(&tg->lock);
/* throttle_config() cancels the timers */
@@ -335,18 +356,22 @@ void throttle_group_config(BlockDriverState *bs, ThrottleConfig *cfg)
}
throttle_config(ts, tt, cfg);
qemu_mutex_unlock(&tg->lock);
qemu_co_enter_next(&blkp->throttled_reqs[0]);
qemu_co_enter_next(&blkp->throttled_reqs[1]);
}
/* Get the throttle configuration from a particular group. Similar to
* throttle_get_config(), but guarantees atomicity within the
* throttling group.
*
* @bs: a BlockDriverState that is member of the group
* @blk: a BlockBackend that is a member of the group
* @cfg: the configuration will be written here
*/
void throttle_group_get_config(BlockDriverState *bs, ThrottleConfig *cfg)
void throttle_group_get_config(BlockBackend *blk, ThrottleConfig *cfg)
{
ThrottleState *ts = bs->throttle_state;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleState *ts = blkp->throttle_state;
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
qemu_mutex_lock(&tg->lock);
throttle_get_config(ts, cfg);
@@ -356,12 +381,13 @@ void throttle_group_get_config(BlockDriverState *bs, ThrottleConfig *cfg)
/* ThrottleTimers callback. This wakes up a request that was waiting
* because it had been throttled.
*
* @bs: the BlockDriverState whose request had been throttled
* @blk: the BlockBackend whose request had been throttled
* @is_write: the type of operation (read/write)
*/
static void timer_cb(BlockDriverState *bs, bool is_write)
static void timer_cb(BlockBackend *blk, bool is_write)
{
ThrottleState *ts = bs->throttle_state;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleState *ts = blkp->throttle_state;
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
bool empty_queue;
@@ -371,13 +397,13 @@ static void timer_cb(BlockDriverState *bs, bool is_write)
qemu_mutex_unlock(&tg->lock);
/* Run the request that was waiting for this timer */
empty_queue = !qemu_co_enter_next(&bs->throttled_reqs[is_write]);
empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]);
/* If the request queue was empty then we have to take care of
* scheduling the next one */
if (empty_queue) {
qemu_mutex_lock(&tg->lock);
schedule_next_request(bs, is_write);
schedule_next_request(blk, is_write);
qemu_mutex_unlock(&tg->lock);
}
}
@@ -392,17 +418,17 @@ static void write_timer_cb(void *opaque)
timer_cb(opaque, true);
}
/* Register a BlockDriverState in the throttling group, also
* initializing its timers and updating its throttle_state pointer to
* point to it. If a throttling group with that name does not exist
* yet, it will be created.
/* Register a BlockBackend in the throttling group, also initializing its
* timers and updating its throttle_state pointer to point to it. If a
* throttling group with that name does not exist yet, it will be created.
*
* @bs: the BlockDriverState to insert
* @blk: the BlockBackend to insert
* @groupname: the name of the group
*/
void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
void throttle_group_register_blk(BlockBackend *blk, const char *groupname)
{
int i;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleState *ts = throttle_group_incref(groupname);
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
int clock_type = QEMU_CLOCK_REALTIME;
@@ -412,67 +438,67 @@ void throttle_group_register_bs(BlockDriverState *bs, const char *groupname)
clock_type = QEMU_CLOCK_VIRTUAL;
}
bs->throttle_state = ts;
blkp->throttle_state = ts;
qemu_mutex_lock(&tg->lock);
/* If the ThrottleGroup is new set this BlockDriverState as the token */
/* If the ThrottleGroup is new set this BlockBackend as the token */
for (i = 0; i < 2; i++) {
if (!tg->tokens[i]) {
tg->tokens[i] = bs;
tg->tokens[i] = blk;
}
}
QLIST_INSERT_HEAD(&tg->head, bs, round_robin);
QLIST_INSERT_HEAD(&tg->head, blkp, round_robin);
throttle_timers_init(&bs->throttle_timers,
bdrv_get_aio_context(bs),
throttle_timers_init(&blkp->throttle_timers,
blk_get_aio_context(blk),
clock_type,
read_timer_cb,
write_timer_cb,
bs);
blk);
qemu_mutex_unlock(&tg->lock);
}
/* Unregister a BlockDriverState from its group, removing it from the
* list, destroying the timers and setting the throttle_state pointer
* to NULL.
/* Unregister a BlockBackend from its group, removing it from the list,
* destroying the timers and setting the throttle_state pointer to NULL.
*
* The BlockDriverState must not have pending throttled requests, so
* the caller has to drain them first.
* The BlockBackend must not have pending throttled requests, so the caller has
* to drain them first.
*
* The group will be destroyed if it's empty after this operation.
*
* @bs: the BlockDriverState to remove
* @blk: the BlockBackend to remove
*/
void throttle_group_unregister_bs(BlockDriverState *bs)
void throttle_group_unregister_blk(BlockBackend *blk)
{
ThrottleGroup *tg = container_of(bs->throttle_state, ThrottleGroup, ts);
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
int i;
assert(bs->pending_reqs[0] == 0 && bs->pending_reqs[1] == 0);
assert(qemu_co_queue_empty(&bs->throttled_reqs[0]));
assert(qemu_co_queue_empty(&bs->throttled_reqs[1]));
assert(blkp->pending_reqs[0] == 0 && blkp->pending_reqs[1] == 0);
assert(qemu_co_queue_empty(&blkp->throttled_reqs[0]));
assert(qemu_co_queue_empty(&blkp->throttled_reqs[1]));
qemu_mutex_lock(&tg->lock);
for (i = 0; i < 2; i++) {
if (tg->tokens[i] == bs) {
BlockDriverState *token = throttle_group_next_bs(bs);
/* Take care of the case where this is the last bs in the group */
if (token == bs) {
if (tg->tokens[i] == blk) {
BlockBackend *token = throttle_group_next_blk(blk);
/* Take care of the case where this is the last blk in the group */
if (token == blk) {
token = NULL;
}
tg->tokens[i] = token;
}
}
/* remove the current bs from the list */
QLIST_REMOVE(bs, round_robin);
throttle_timers_destroy(&bs->throttle_timers);
/* remove the current blk from the list */
QLIST_REMOVE(blkp, round_robin);
throttle_timers_destroy(&blkp->throttle_timers);
qemu_mutex_unlock(&tg->lock);
throttle_group_unref(&tg->ts);
bs->throttle_state = NULL;
blkp->throttle_state = NULL;
}
static void throttle_groups_init(void)

View File

@@ -50,11 +50,14 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "migration/migration.h"
#include "qemu/coroutine.h"
#include "qemu/cutils.h"
#if defined(CONFIG_UUID)
#include <uuid/uuid.h>
@@ -555,98 +558,109 @@ static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
}
static int vdi_co_read(BlockDriverState *bs,
int64_t sector_num, uint8_t *buf, int nb_sectors)
static int coroutine_fn
vdi_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVVdiState *s = bs->opaque;
QEMUIOVector local_qiov;
uint32_t bmap_entry;
uint32_t block_index;
uint32_t sector_in_block;
uint32_t n_sectors;
uint32_t offset_in_block;
uint32_t n_bytes;
uint64_t bytes_done = 0;
int ret = 0;
logout("\n");
while (ret >= 0 && nb_sectors > 0) {
block_index = sector_num / s->block_sectors;
sector_in_block = sector_num % s->block_sectors;
n_sectors = s->block_sectors - sector_in_block;
if (n_sectors > nb_sectors) {
n_sectors = nb_sectors;
}
qemu_iovec_init(&local_qiov, qiov->niov);
logout("will read %u sectors starting at sector %" PRIu64 "\n",
n_sectors, sector_num);
while (ret >= 0 && bytes > 0) {
block_index = offset / s->block_size;
offset_in_block = offset % s->block_size;
n_bytes = MIN(bytes, s->block_size - offset_in_block);
logout("will read %u bytes starting at offset %" PRIu64 "\n",
n_bytes, offset);
/* prepare next AIO request */
bmap_entry = le32_to_cpu(s->bmap[block_index]);
if (!VDI_IS_ALLOCATED(bmap_entry)) {
/* Block not allocated, return zeros, no need to wait. */
memset(buf, 0, n_sectors * SECTOR_SIZE);
qemu_iovec_memset(qiov, bytes_done, 0, n_bytes);
ret = 0;
} else {
uint64_t offset = s->header.offset_data / SECTOR_SIZE +
(uint64_t)bmap_entry * s->block_sectors +
sector_in_block;
ret = bdrv_read(bs->file->bs, offset, buf, n_sectors);
}
logout("%u sectors read\n", n_sectors);
uint64_t data_offset = s->header.offset_data +
(uint64_t)bmap_entry * s->block_size +
offset_in_block;
nb_sectors -= n_sectors;
sector_num += n_sectors;
buf += n_sectors * SECTOR_SIZE;
qemu_iovec_reset(&local_qiov);
qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
ret = bdrv_co_preadv(bs->file->bs, data_offset, n_bytes,
&local_qiov, 0);
}
logout("%u bytes read\n", n_bytes);
bytes -= n_bytes;
offset += n_bytes;
bytes_done += n_bytes;
}
qemu_iovec_destroy(&local_qiov);
return ret;
}
static int vdi_co_write(BlockDriverState *bs,
int64_t sector_num, const uint8_t *buf, int nb_sectors)
static int coroutine_fn
vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVVdiState *s = bs->opaque;
QEMUIOVector local_qiov;
uint32_t bmap_entry;
uint32_t block_index;
uint32_t sector_in_block;
uint32_t n_sectors;
uint32_t offset_in_block;
uint32_t n_bytes;
uint32_t bmap_first = VDI_UNALLOCATED;
uint32_t bmap_last = VDI_UNALLOCATED;
uint8_t *block = NULL;
uint64_t bytes_done = 0;
int ret = 0;
logout("\n");
while (ret >= 0 && nb_sectors > 0) {
block_index = sector_num / s->block_sectors;
sector_in_block = sector_num % s->block_sectors;
n_sectors = s->block_sectors - sector_in_block;
if (n_sectors > nb_sectors) {
n_sectors = nb_sectors;
}
qemu_iovec_init(&local_qiov, qiov->niov);
logout("will write %u sectors starting at sector %" PRIu64 "\n",
n_sectors, sector_num);
while (ret >= 0 && bytes > 0) {
block_index = offset / s->block_size;
offset_in_block = offset % s->block_size;
n_bytes = MIN(bytes, s->block_size - offset_in_block);
logout("will write %u bytes starting at offset %" PRIu64 "\n",
n_bytes, offset);
/* prepare next AIO request */
bmap_entry = le32_to_cpu(s->bmap[block_index]);
if (!VDI_IS_ALLOCATED(bmap_entry)) {
/* Allocate new block and write to it. */
uint64_t offset;
uint64_t data_offset;
bmap_entry = s->header.blocks_allocated;
s->bmap[block_index] = cpu_to_le32(bmap_entry);
s->header.blocks_allocated++;
offset = s->header.offset_data / SECTOR_SIZE +
(uint64_t)bmap_entry * s->block_sectors;
data_offset = s->header.offset_data +
(uint64_t)bmap_entry * s->block_size;
if (block == NULL) {
block = g_malloc(s->block_size);
bmap_first = block_index;
}
bmap_last = block_index;
/* Copy data to be written to new block and zero unused parts. */
memset(block, 0, sector_in_block * SECTOR_SIZE);
memcpy(block + sector_in_block * SECTOR_SIZE,
buf, n_sectors * SECTOR_SIZE);
memset(block + (sector_in_block + n_sectors) * SECTOR_SIZE, 0,
(s->block_sectors - n_sectors - sector_in_block) * SECTOR_SIZE);
memset(block, 0, offset_in_block);
qemu_iovec_to_buf(qiov, bytes_done, block + offset_in_block,
n_bytes);
memset(block + offset_in_block + n_bytes, 0,
s->block_size - n_bytes - offset_in_block);
/* Note that this coroutine does not yield anywhere from reading the
* bmap entry until here, so in regards to all the coroutines trying
@@ -656,12 +670,12 @@ static int vdi_co_write(BlockDriverState *bs,
* acquire the lock and thus the padded cluster is written before
* the other coroutines can write to the affected area. */
qemu_co_mutex_lock(&s->write_lock);
ret = bdrv_write(bs->file->bs, offset, block, s->block_sectors);
ret = bdrv_pwrite(bs->file->bs, data_offset, block, s->block_size);
qemu_co_mutex_unlock(&s->write_lock);
} else {
uint64_t offset = s->header.offset_data / SECTOR_SIZE +
(uint64_t)bmap_entry * s->block_sectors +
sector_in_block;
uint64_t data_offset = s->header.offset_data +
(uint64_t)bmap_entry * s->block_size +
offset_in_block;
qemu_co_mutex_lock(&s->write_lock);
/* This lock is only used to make sure the following write operation
* is executed after the write issued by the coroutine allocating
@@ -672,16 +686,23 @@ static int vdi_co_write(BlockDriverState *bs,
* that that write operation has returned (there may be other writes
* in flight, but they do not concern this very operation). */
qemu_co_mutex_unlock(&s->write_lock);
ret = bdrv_write(bs->file->bs, offset, buf, n_sectors);
qemu_iovec_reset(&local_qiov);
qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
ret = bdrv_co_pwritev(bs->file->bs, data_offset, n_bytes,
&local_qiov, 0);
}
nb_sectors -= n_sectors;
sector_num += n_sectors;
buf += n_sectors * SECTOR_SIZE;
bytes -= n_bytes;
offset += n_bytes;
bytes_done += n_bytes;
logout("%u sectors written\n", n_sectors);
logout("%u bytes written\n", n_bytes);
}
qemu_iovec_destroy(&local_qiov);
logout("finished data write\n");
if (ret < 0) {
return ret;
@@ -692,6 +713,7 @@ static int vdi_co_write(BlockDriverState *bs,
VdiHeader *header = (VdiHeader *) block;
uint8_t *base;
uint64_t offset;
uint32_t n_sectors;
logout("now writing modified header\n");
assert(VDI_IS_ALLOCATED(bmap_first));
@@ -733,7 +755,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
size_t bmap_size;
int64_t offset = 0;
Error *local_err = NULL;
BlockDriverState *bs = NULL;
BlockBackend *blk = NULL;
uint32_t *bmap = NULL;
logout("\n");
@@ -766,13 +788,17 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
error_propagate(errp, local_err);
goto exit;
}
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&local_err);
if (ret < 0) {
blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
ret = -EIO;
goto exit;
}
blk_set_allow_write_beyond_eof(blk, true);
/* We need enough blocks to store the given disk size,
so always round up. */
blocks = DIV_ROUND_UP(bytes, block_size);
@@ -802,7 +828,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
vdi_header_print(&header);
#endif
vdi_header_to_le(&header);
ret = bdrv_pwrite_sync(bs, offset, &header, sizeof(header));
ret = blk_pwrite(blk, offset, &header, sizeof(header), 0);
if (ret < 0) {
error_setg(errp, "Error writing header to %s", filename);
goto exit;
@@ -823,7 +849,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
bmap[i] = VDI_UNALLOCATED;
}
}
ret = bdrv_pwrite_sync(bs, offset, bmap, bmap_size);
ret = blk_pwrite(blk, offset, bmap, bmap_size, 0);
if (ret < 0) {
error_setg(errp, "Error writing bmap to %s", filename);
goto exit;
@@ -832,7 +858,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
}
if (image_type == VDI_TYPE_STATIC) {
ret = bdrv_truncate(bs, offset + blocks * block_size);
ret = blk_truncate(blk, offset + blocks * block_size);
if (ret < 0) {
error_setg(errp, "Failed to statically allocate %s", filename);
goto exit;
@@ -840,7 +866,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
}
exit:
bdrv_unref(bs);
blk_unref(blk);
g_free(bmap);
return ret;
}
@@ -897,9 +923,9 @@ static BlockDriver bdrv_vdi = {
.bdrv_co_get_block_status = vdi_co_get_block_status,
.bdrv_make_empty = vdi_make_empty,
.bdrv_read = vdi_co_read,
.bdrv_co_preadv = vdi_co_preadv,
#if defined(CONFIG_VDI_WRITE)
.bdrv_write = vdi_co_write,
.bdrv_co_pwritev = vdi_co_pwritev,
#endif
.bdrv_get_info = vdi_get_info,

View File

@@ -18,6 +18,7 @@
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/bswap.h"
#include "block/vhdx.h"
#include <uuid/uuid.h>

View File

@@ -18,10 +18,12 @@
*
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "qemu/error-report.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "block/vhdx.h"

View File

@@ -16,10 +16,13 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "qemu/crc32c.h"
#include "qemu/bswap.h"
#include "block/vhdx.h"
#include "migration/migration.h"
@@ -264,10 +267,10 @@ static void vhdx_region_unregister_all(BDRVVHDXState *s)
static void vhdx_set_shift_bits(BDRVVHDXState *s)
{
s->logical_sector_size_bits = 31 - clz32(s->logical_sector_size);
s->sectors_per_block_bits = 31 - clz32(s->sectors_per_block);
s->chunk_ratio_bits = 63 - clz64(s->chunk_ratio);
s->block_size_bits = 31 - clz32(s->block_size);
s->logical_sector_size_bits = ctz32(s->logical_sector_size);
s->sectors_per_block_bits = ctz32(s->sectors_per_block);
s->chunk_ratio_bits = ctz64(s->chunk_ratio);
s->block_size_bits = ctz32(s->block_size);
}
/*
@@ -857,14 +860,8 @@ static void vhdx_calc_bat_entries(BDRVVHDXState *s)
{
uint32_t data_blocks_cnt, bitmap_blocks_cnt;
data_blocks_cnt = s->virtual_disk_size >> s->block_size_bits;
if (s->virtual_disk_size - (data_blocks_cnt << s->block_size_bits)) {
data_blocks_cnt++;
}
bitmap_blocks_cnt = data_blocks_cnt >> s->chunk_ratio_bits;
if (data_blocks_cnt - (bitmap_blocks_cnt << s->chunk_ratio_bits)) {
bitmap_blocks_cnt++;
}
data_blocks_cnt = DIV_ROUND_UP(s->virtual_disk_size, s->block_size);
bitmap_blocks_cnt = DIV_ROUND_UP(data_blocks_cnt, s->chunk_ratio);
if (s->parent_entries) {
s->bat_entries = bitmap_blocks_cnt * (s->chunk_ratio + 1);
@@ -1778,7 +1775,7 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
gunichar2 *creator = NULL;
glong creator_items;
BlockDriverState *bs;
BlockBackend *blk;
char *type = NULL;
VHDXImageType image_type;
Error *local_err = NULL;
@@ -1843,14 +1840,16 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
goto exit;
}
bs = NULL;
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&local_err);
if (ret < 0) {
blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
ret = -EIO;
goto exit;
}
blk_set_allow_write_beyond_eof(blk, true);
/* Create (A) */
/* The creator field is optional, but may be useful for
@@ -1858,13 +1857,14 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
creator = g_utf8_to_utf16("QEMU v" QEMU_VERSION, -1, NULL,
&creator_items, NULL);
signature = cpu_to_le64(VHDX_FILE_SIGNATURE);
ret = bdrv_pwrite(bs, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature));
ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
0);
if (ret < 0) {
goto delete_and_exit;
}
if (creator) {
ret = bdrv_pwrite(bs, VHDX_FILE_ID_OFFSET + sizeof(signature),
creator, creator_items * sizeof(gunichar2));
ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
creator, creator_items * sizeof(gunichar2), 0);
if (ret < 0) {
goto delete_and_exit;
}
@@ -1872,13 +1872,13 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
/* Creates (B),(C) */
ret = vhdx_create_new_headers(bs, image_size, log_size);
ret = vhdx_create_new_headers(blk_bs(blk), image_size, log_size);
if (ret < 0) {
goto delete_and_exit;
}
/* Creates (D),(E),(G) explicitly. (F) created as by-product */
ret = vhdx_create_new_region_table(bs, image_size, block_size, 512,
ret = vhdx_create_new_region_table(blk_bs(blk), image_size, block_size, 512,
log_size, use_zero_blocks, image_type,
&metadata_offset);
if (ret < 0) {
@@ -1886,7 +1886,7 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
}
/* Creates (H) */
ret = vhdx_create_new_metadata(bs, image_size, block_size, 512,
ret = vhdx_create_new_metadata(blk_bs(blk), image_size, block_size, 512,
metadata_offset, image_type);
if (ret < 0) {
goto delete_and_exit;
@@ -1894,7 +1894,7 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
delete_and_exit:
bdrv_unref(bs);
blk_unref(blk);
exit:
g_free(type);
g_free(creator);

File diff suppressed because it is too large Load Diff

View File

@@ -23,10 +23,13 @@
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include "sysemu/block-backend.h"
#include "qemu/module.h"
#include "migration/migration.h"
#include "qemu/bswap.h"
#if defined(CONFIG_UUID)
#include <uuid/uuid.h>
#endif
@@ -43,28 +46,34 @@ enum vhd_type {
VHD_DIFFERENCING = 4,
};
// Seconds since Jan 1, 2000 0:00:00 (UTC)
/* Seconds since Jan 1, 2000 0:00:00 (UTC) */
#define VHD_TIMESTAMP_BASE 946684800
#define VHD_MAX_SECTORS (65535LL * 255 * 255)
#define VHD_MAX_GEOMETRY (65535LL * 16 * 255)
#define VHD_CHS_MAX_C 65535LL
#define VHD_CHS_MAX_H 16
#define VHD_CHS_MAX_S 255
// always big-endian
#define VHD_MAX_SECTORS 0xff000000 /* 2040 GiB max image size */
#define VHD_MAX_GEOMETRY (VHD_CHS_MAX_C * VHD_CHS_MAX_H * VHD_CHS_MAX_S)
#define VPC_OPT_FORCE_SIZE "force_size"
/* always big-endian */
typedef struct vhd_footer {
char creator[8]; // "conectix"
char creator[8]; /* "conectix" */
uint32_t features;
uint32_t version;
// Offset of next header structure, 0xFFFFFFFF if none
/* Offset of next header structure, 0xFFFFFFFF if none */
uint64_t data_offset;
// Seconds since Jan 1, 2000 0:00:00 (UTC)
/* Seconds since Jan 1, 2000 0:00:00 (UTC) */
uint32_t timestamp;
char creator_app[4]; // "vpc "
char creator_app[4]; /* e.g., "vpc " */
uint16_t major;
uint16_t minor;
char creator_os[4]; // "Wi2k"
char creator_os[4]; /* "Wi2k" */
uint64_t orig_size;
uint64_t current_size;
@@ -75,29 +84,29 @@ typedef struct vhd_footer {
uint32_t type;
// Checksum of the Hard Disk Footer ("one's complement of the sum of all
// the bytes in the footer without the checksum field")
/* Checksum of the Hard Disk Footer ("one's complement of the sum of all
the bytes in the footer without the checksum field") */
uint32_t checksum;
// UUID used to identify a parent hard disk (backing file)
/* UUID used to identify a parent hard disk (backing file) */
uint8_t uuid[16];
uint8_t in_saved_state;
} QEMU_PACKED VHDFooter;
typedef struct vhd_dyndisk_header {
char magic[8]; // "cxsparse"
char magic[8]; /* "cxsparse" */
// Offset of next header structure, 0xFFFFFFFF if none
/* Offset of next header structure, 0xFFFFFFFF if none */
uint64_t data_offset;
// Offset of the Block Allocation Table (BAT)
/* Offset of the Block Allocation Table (BAT) */
uint64_t table_offset;
uint32_t version;
uint32_t max_table_entries; // 32bit/entry
uint32_t max_table_entries; /* 32bit/entry */
// 2 MB by default, must be a power of two
/* 2 MB by default, must be a power of two */
uint32_t block_size;
uint32_t checksum;
@@ -105,7 +114,7 @@ typedef struct vhd_dyndisk_header {
uint32_t parent_timestamp;
uint32_t reserved;
// Backing file name (in UTF-16)
/* Backing file name (in UTF-16) */
uint8_t parent_name[512];
struct {
@@ -128,6 +137,8 @@ typedef struct BDRVVPCState {
uint32_t block_size;
uint32_t bitmap_size;
bool force_use_chs;
bool force_use_sz;
#ifdef CACHE
uint8_t *pageentry_u8;
@@ -140,6 +151,22 @@ typedef struct BDRVVPCState {
Error *migration_blocker;
} BDRVVPCState;
#define VPC_OPT_SIZE_CALC "force_size_calc"
static QemuOptsList vpc_runtime_opts = {
.name = "vpc-runtime-opts",
.head = QTAILQ_HEAD_INITIALIZER(vpc_runtime_opts.head),
.desc = {
{
.name = VPC_OPT_SIZE_CALC,
.type = QEMU_OPT_STRING,
.help = "Force disk size calculation to use either CHS geometry, "
"or use the disk current_size specified in the VHD footer. "
"{chs, current_size}"
},
{ /* end of list */ }
}
};
static uint32_t vpc_checksum(uint8_t* buf, size_t size)
{
uint32_t res = 0;
@@ -159,6 +186,25 @@ static int vpc_probe(const uint8_t *buf, int buf_size, const char *filename)
return 0;
}
static void vpc_parse_options(BlockDriverState *bs, QemuOpts *opts,
Error **errp)
{
BDRVVPCState *s = bs->opaque;
const char *size_calc;
size_calc = qemu_opt_get(opts, VPC_OPT_SIZE_CALC);
if (!size_calc) {
/* no override, use autodetect only */
} else if (!strcmp(size_calc, "current_size")) {
s->force_use_sz = true;
} else if (!strcmp(size_calc, "chs")) {
s->force_use_chs = true;
} else {
error_setg(errp, "Invalid size calculation mode: '%s'", size_calc);
}
}
static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
@@ -166,6 +212,9 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
int i;
VHDFooter *footer;
VHDDynDiskHeader *dyndisk_header;
QemuOpts *opts = NULL;
Error *local_err = NULL;
bool use_chs;
uint8_t buf[HEADER_SIZE];
uint32_t checksum;
uint64_t computed_size;
@@ -173,8 +222,24 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
int disk_type = VHD_DYNAMIC;
int ret;
opts = qemu_opts_create(&vpc_runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
vpc_parse_options(bs, opts, &local_err);
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
ret = bdrv_pread(bs->file->bs, 0, s->footer_buf, HEADER_SIZE);
if (ret < 0) {
error_setg(errp, "Unable to read VHD header");
goto fail;
}
@@ -183,9 +248,11 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
int64_t offset = bdrv_getlength(bs->file->bs);
if (offset < 0) {
ret = offset;
error_setg(errp, "Invalid file size");
goto fail;
} else if (offset < HEADER_SIZE) {
ret = -EINVAL;
error_setg(errp, "File too small for a VHD header");
goto fail;
}
@@ -212,22 +279,50 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
/* Write 'checksum' back to footer, or else will leave it with zero. */
footer->checksum = cpu_to_be32(checksum);
// The visible size of a image in Virtual PC depends on the geometry
// rather than on the size stored in the footer (the size in the footer
// is too large usually)
/* The visible size of a image in Virtual PC depends on the geometry
rather than on the size stored in the footer (the size in the footer
is too large usually) */
bs->total_sectors = (int64_t)
be16_to_cpu(footer->cyls) * footer->heads * footer->secs_per_cyl;
/* Images that have exactly the maximum geometry are probably bigger and
* would be truncated if we adhered to the geometry for them. Rely on
* footer->current_size for them. */
if (bs->total_sectors == VHD_MAX_GEOMETRY) {
/* Microsoft Virtual PC and Microsoft Hyper-V produce and read
* VHD image sizes differently. VPC will rely on CHS geometry,
* while Hyper-V and disk2vhd use the size specified in the footer.
*
* We use a couple of approaches to try and determine the correct method:
* look at the Creator App field, and look for images that have CHS
* geometry that is the maximum value.
*
* If the CHS geometry is the maximum CHS geometry, then we assume that
* the size is the footer->current_size to avoid truncation. Otherwise,
* we follow the table based on footer->creator_app:
*
* Known creator apps:
* 'vpc ' : CHS Virtual PC (uses disk geometry)
* 'qemu' : CHS QEMU (uses disk geometry)
* 'qem2' : current_size QEMU (uses current_size)
* 'win ' : current_size Hyper-V
* 'd2v ' : current_size Disk2vhd
* 'tap\0' : current_size XenServer
* 'CTXS' : current_size XenConverter
*
* The user can override the table values via drive options, however
* even with an override we will still use current_size for images
* that have CHS geometry of the maximum size.
*/
use_chs = (!!strncmp(footer->creator_app, "win ", 4) &&
!!strncmp(footer->creator_app, "qem2", 4) &&
!!strncmp(footer->creator_app, "d2v ", 4) &&
!!strncmp(footer->creator_app, "CTXS", 4) &&
!!memcmp(footer->creator_app, "tap", 4)) || s->force_use_chs;
if (!use_chs || bs->total_sectors == VHD_MAX_GEOMETRY || s->force_use_sz) {
bs->total_sectors = be64_to_cpu(footer->current_size) /
BDRV_SECTOR_SIZE;
BDRV_SECTOR_SIZE;
}
/* Allow a maximum disk size of approximately 2 TB */
if (bs->total_sectors >= VHD_MAX_SECTORS) {
/* Allow a maximum disk size of 2040 GiB */
if (bs->total_sectors > VHD_MAX_SECTORS) {
ret = -EFBIG;
goto fail;
}
@@ -236,12 +331,14 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
ret = bdrv_pread(bs->file->bs, be64_to_cpu(footer->data_offset), buf,
HEADER_SIZE);
if (ret < 0) {
error_setg(errp, "Error reading dynamic VHD header");
goto fail;
}
dyndisk_header = (VHDDynDiskHeader *) buf;
if (strncmp(dyndisk_header->magic, "cxsparse", 8)) {
error_setg(errp, "Invalid header magic");
ret = -EINVAL;
goto fail;
}
@@ -257,16 +354,14 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
s->max_table_entries = be32_to_cpu(dyndisk_header->max_table_entries);
if ((bs->total_sectors * 512) / s->block_size > 0xffffffffU) {
ret = -EINVAL;
goto fail;
}
if (s->max_table_entries > (VHD_MAX_SECTORS * 512) / s->block_size) {
error_setg(errp, "Too many blocks");
ret = -EINVAL;
goto fail;
}
computed_size = (uint64_t) s->max_table_entries * s->block_size;
if (computed_size < bs->total_sectors * 512) {
error_setg(errp, "Page table too small");
ret = -EINVAL;
goto fail;
}
@@ -283,6 +378,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
s->pagetable = qemu_try_blockalign(bs->file->bs, pagetable_size);
if (s->pagetable == NULL) {
error_setg(errp, "Unable to allocate memory for page table");
ret = -ENOMEM;
goto fail;
}
@@ -292,6 +388,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
ret = bdrv_pread(bs->file->bs, s->bat_offset, s->pagetable,
pagetable_size);
if (ret < 0) {
error_setg(errp, "Error reading pagetable");
goto fail;
}
@@ -358,28 +455,27 @@ static int vpc_reopen_prepare(BDRVReopenState *state,
* The parameter write must be 1 if the offset will be used for a write
* operation (the block bitmaps is updated then), 0 otherwise.
*/
static inline int64_t get_sector_offset(BlockDriverState *bs,
int64_t sector_num, int write)
static inline int64_t get_image_offset(BlockDriverState *bs, uint64_t offset,
bool write)
{
BDRVVPCState *s = bs->opaque;
uint64_t offset = sector_num * 512;
uint64_t bitmap_offset, block_offset;
uint32_t pagetable_index, pageentry_index;
uint32_t pagetable_index, offset_in_block;
pagetable_index = offset / s->block_size;
pageentry_index = (offset % s->block_size) / 512;
offset_in_block = offset % s->block_size;
if (pagetable_index >= s->max_table_entries || s->pagetable[pagetable_index] == 0xffffffff)
return -1; // not allocated
return -1; /* not allocated */
bitmap_offset = 512 * (uint64_t) s->pagetable[pagetable_index];
block_offset = bitmap_offset + s->bitmap_size + (512 * pageentry_index);
block_offset = bitmap_offset + s->bitmap_size + offset_in_block;
// We must ensure that we don't write to any sectors which are marked as
// unused in the bitmap. We get away with setting all bits in the block
// bitmap each time we write to a new block. This might cause Virtual PC to
// miss sparse read optimization, but it's not a problem in terms of
// correctness.
/* We must ensure that we don't write to any sectors which are marked as
unused in the bitmap. We get away with setting all bits in the block
bitmap each time we write to a new block. This might cause Virtual PC to
miss sparse read optimization, but it's not a problem in terms of
correctness. */
if (write && (s->last_bitmap_offset != bitmap_offset)) {
uint8_t bitmap[s->bitmap_size];
@@ -391,6 +487,12 @@ static inline int64_t get_sector_offset(BlockDriverState *bs,
return block_offset;
}
static inline int64_t get_sector_offset(BlockDriverState *bs,
int64_t sector_num, bool write)
{
return get_image_offset(bs, sector_num * BDRV_SECTOR_SIZE, write);
}
/*
* Writes the footer to the end of the image file. This is needed when the
* file grows as it overwrites the old footer
@@ -417,7 +519,7 @@ static int rewrite_footer(BlockDriverState* bs)
*
* Returns the sectors' offset in the image file on success and < 0 on error
*/
static int64_t alloc_block(BlockDriverState* bs, int64_t sector_num)
static int64_t alloc_block(BlockDriverState* bs, int64_t offset)
{
BDRVVPCState *s = bs->opaque;
int64_t bat_offset;
@@ -425,18 +527,17 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t sector_num)
int ret;
uint8_t bitmap[s->bitmap_size];
// Check if sector_num is valid
if ((sector_num < 0) || (sector_num > bs->total_sectors))
return -1;
// Write entry into in-memory BAT
index = (sector_num * 512) / s->block_size;
if (s->pagetable[index] != 0xFFFFFFFF)
return -1;
/* Check if sector_num is valid */
if ((offset < 0) || (offset > bs->total_sectors * BDRV_SECTOR_SIZE)) {
return -EINVAL;
}
/* Write entry into in-memory BAT */
index = offset / s->block_size;
assert(s->pagetable[index] == 0xFFFFFFFF);
s->pagetable[index] = s->free_data_block_offset / 512;
// Initialize the block's bitmap
/* Initialize the block's bitmap */
memset(bitmap, 0xff, s->bitmap_size);
ret = bdrv_pwrite_sync(bs->file->bs, s->free_data_block_offset, bitmap,
s->bitmap_size);
@@ -444,24 +545,24 @@ static int64_t alloc_block(BlockDriverState* bs, int64_t sector_num)
return ret;
}
// Write new footer (the old one will be overwritten)
/* Write new footer (the old one will be overwritten) */
s->free_data_block_offset += s->block_size + s->bitmap_size;
ret = rewrite_footer(bs);
if (ret < 0)
goto fail;
// Write BAT entry to disk
/* Write BAT entry to disk */
bat_offset = s->bat_offset + (4 * index);
bat_value = cpu_to_be32(s->pagetable[index]);
ret = bdrv_pwrite_sync(bs->file->bs, bat_offset, &bat_value, 4);
if (ret < 0)
goto fail;
return get_sector_offset(bs, sector_num, 0);
return get_image_offset(bs, offset, false);
fail:
s->free_data_block_offset -= (s->block_size + s->bitmap_size);
return -1;
return ret;
}
static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
@@ -477,104 +578,105 @@ static int vpc_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
return 0;
}
static int vpc_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
static int coroutine_fn
vpc_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVVPCState *s = bs->opaque;
int ret;
int64_t offset;
int64_t sectors, sectors_per_block;
int64_t image_offset;
int64_t n_bytes;
int64_t bytes_done = 0;
VHDFooter *footer = (VHDFooter *) s->footer_buf;
QEMUIOVector local_qiov;
if (be32_to_cpu(footer->type) == VHD_FIXED) {
return bdrv_read(bs->file->bs, sector_num, buf, nb_sectors);
return bdrv_co_preadv(bs->file->bs, offset, bytes, qiov, 0);
}
while (nb_sectors > 0) {
offset = get_sector_offset(bs, sector_num, 0);
sectors_per_block = s->block_size >> BDRV_SECTOR_BITS;
sectors = sectors_per_block - (sector_num % sectors_per_block);
if (sectors > nb_sectors) {
sectors = nb_sectors;
}
qemu_co_mutex_lock(&s->lock);
qemu_iovec_init(&local_qiov, qiov->niov);
if (offset == -1) {
memset(buf, 0, sectors * BDRV_SECTOR_SIZE);
while (bytes > 0) {
image_offset = get_image_offset(bs, offset, false);
n_bytes = MIN(bytes, s->block_size - (offset % s->block_size));
if (image_offset == -1) {
qemu_iovec_memset(qiov, bytes_done, 0, n_bytes);
} else {
ret = bdrv_pread(bs->file->bs, offset, buf,
sectors * BDRV_SECTOR_SIZE);
if (ret != sectors * BDRV_SECTOR_SIZE) {
return -1;
qemu_iovec_reset(&local_qiov);
qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
ret = bdrv_co_preadv(bs->file->bs, image_offset, n_bytes,
&local_qiov, 0);
if (ret < 0) {
goto fail;
}
}
nb_sectors -= sectors;
sector_num += sectors;
buf += sectors * BDRV_SECTOR_SIZE;
bytes -= n_bytes;
offset += n_bytes;
bytes_done += n_bytes;
}
return 0;
}
static coroutine_fn int vpc_co_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
{
int ret;
BDRVVPCState *s = bs->opaque;
qemu_co_mutex_lock(&s->lock);
ret = vpc_read(bs, sector_num, buf, nb_sectors);
ret = 0;
fail:
qemu_iovec_destroy(&local_qiov);
qemu_co_mutex_unlock(&s->lock);
return ret;
}
static int vpc_write(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors)
static int coroutine_fn
vpc_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVVPCState *s = bs->opaque;
int64_t offset;
int64_t sectors, sectors_per_block;
int64_t image_offset;
int64_t n_bytes;
int64_t bytes_done = 0;
int ret;
VHDFooter *footer = (VHDFooter *) s->footer_buf;
QEMUIOVector local_qiov;
if (be32_to_cpu(footer->type) == VHD_FIXED) {
return bdrv_write(bs->file->bs, sector_num, buf, nb_sectors);
}
while (nb_sectors > 0) {
offset = get_sector_offset(bs, sector_num, 1);
sectors_per_block = s->block_size >> BDRV_SECTOR_BITS;
sectors = sectors_per_block - (sector_num % sectors_per_block);
if (sectors > nb_sectors) {
sectors = nb_sectors;
}
if (offset == -1) {
offset = alloc_block(bs, sector_num);
if (offset < 0)
return -1;
}
ret = bdrv_pwrite(bs->file->bs, offset, buf,
sectors * BDRV_SECTOR_SIZE);
if (ret != sectors * BDRV_SECTOR_SIZE) {
return -1;
}
nb_sectors -= sectors;
sector_num += sectors;
buf += sectors * BDRV_SECTOR_SIZE;
return bdrv_co_pwritev(bs->file->bs, offset, bytes, qiov, 0);
}
return 0;
}
static coroutine_fn int vpc_co_write(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors)
{
int ret;
BDRVVPCState *s = bs->opaque;
qemu_co_mutex_lock(&s->lock);
ret = vpc_write(bs, sector_num, buf, nb_sectors);
qemu_iovec_init(&local_qiov, qiov->niov);
while (bytes > 0) {
image_offset = get_image_offset(bs, offset, true);
n_bytes = MIN(bytes, s->block_size - (offset % s->block_size));
if (image_offset == -1) {
image_offset = alloc_block(bs, offset);
if (image_offset < 0) {
ret = image_offset;
goto fail;
}
}
qemu_iovec_reset(&local_qiov);
qemu_iovec_concat(&local_qiov, qiov, bytes_done, n_bytes);
ret = bdrv_co_pwritev(bs->file->bs, image_offset, n_bytes,
&local_qiov, 0);
if (ret < 0) {
goto fail;
}
bytes -= n_bytes;
offset += n_bytes;
bytes_done += n_bytes;
}
ret = 0;
fail:
qemu_iovec_destroy(&local_qiov);
qemu_co_mutex_unlock(&s->lock);
return ret;
}
@@ -631,7 +733,7 @@ static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
* Note that the geometry doesn't always exactly match total_sectors but
* may round it down.
*
* Returns 0 on success, -EFBIG if the size is larger than ~2 TB. Override
* Returns 0 on success, -EFBIG if the size is larger than 2040 GiB. Override
* the hardware EIDE and ATA-2 limit of 16 heads (max disk size of 127 GB)
* and instead allow up to 255 heads.
*/
@@ -673,7 +775,7 @@ static int calculate_geometry(int64_t total_sectors, uint16_t* cyls,
return 0;
}
static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
int64_t total_sectors)
{
VHDDynDiskHeader *dyndisk_header =
@@ -683,34 +785,34 @@ static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
int ret;
int64_t offset = 0;
// Write the footer (twice: at the beginning and at the end)
/* Write the footer (twice: at the beginning and at the end) */
block_size = 0x200000;
num_bat_entries = (total_sectors + block_size / 512) / (block_size / 512);
ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
if (ret) {
goto fail;
}
offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
ret = bdrv_pwrite_sync(bs, offset, buf, HEADER_SIZE);
ret = blk_pwrite(blk, offset, buf, HEADER_SIZE, 0);
if (ret < 0) {
goto fail;
}
// Write the initial BAT
offset = 1536 + ((num_bat_entries * 4 + 511) & ~511);
ret = blk_pwrite(blk, offset, buf, HEADER_SIZE, 0);
if (ret < 0) {
goto fail;
}
/* Write the initial BAT */
offset = 3 * 512;
memset(buf, 0xFF, 512);
for (i = 0; i < (num_bat_entries * 4 + 511) / 512; i++) {
ret = bdrv_pwrite_sync(bs, offset, buf, 512);
ret = blk_pwrite(blk, offset, buf, 512, 0);
if (ret < 0) {
goto fail;
}
offset += 512;
}
// Prepare the Dynamic Disk Header
/* Prepare the Dynamic Disk Header */
memset(buf, 0, 1024);
memcpy(dyndisk_header->magic, "cxsparse", 8);
@@ -727,10 +829,10 @@ static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
dyndisk_header->checksum = cpu_to_be32(vpc_checksum(buf, 1024));
// Write the header
/* Write the header */
offset = 512;
ret = bdrv_pwrite_sync(bs, offset, buf, 1024);
ret = blk_pwrite(blk, offset, buf, 1024, 0);
if (ret < 0) {
goto fail;
}
@@ -739,7 +841,7 @@ static int create_dynamic_disk(BlockDriverState *bs, uint8_t *buf,
return ret;
}
static int create_fixed_disk(BlockDriverState *bs, uint8_t *buf,
static int create_fixed_disk(BlockBackend *blk, uint8_t *buf,
int64_t total_size)
{
int ret;
@@ -747,12 +849,12 @@ static int create_fixed_disk(BlockDriverState *bs, uint8_t *buf,
/* Add footer to total size */
total_size += HEADER_SIZE;
ret = bdrv_truncate(bs, total_size);
ret = blk_truncate(blk, total_size);
if (ret < 0) {
return ret;
}
ret = bdrv_pwrite_sync(bs, total_size - HEADER_SIZE, buf, HEADER_SIZE);
ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE, 0);
if (ret < 0) {
return ret;
}
@@ -773,8 +875,9 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
int64_t total_size;
int disk_type;
int ret = -EIO;
bool force_size;
Error *local_err = NULL;
BlockDriverState *bs = NULL;
BlockBackend *blk = NULL;
/* Read out options */
total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
@@ -786,6 +889,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
} else if (!strcmp(disk_type_param, "fixed")) {
disk_type = VHD_FIXED;
} else {
error_setg(errp, "Invalid disk type, %s", disk_type_param);
ret = -EINVAL;
goto out;
}
@@ -793,36 +897,50 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
disk_type = VHD_DYNAMIC;
}
force_size = qemu_opt_get_bool_del(opts, VPC_OPT_FORCE_SIZE, false);
ret = bdrv_create_file(filename, opts, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
goto out;
}
ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&local_err);
if (ret < 0) {
blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
if (blk == NULL) {
error_propagate(errp, local_err);
ret = -EIO;
goto out;
}
blk_set_allow_write_beyond_eof(blk, true);
/*
* Calculate matching total_size and geometry. Increase the number of
* sectors requested until we get enough (or fail). This ensures that
* qemu-img convert doesn't truncate images, but rather rounds up.
*
* If the image size can't be represented by a spec conform CHS geometry,
* If the image size can't be represented by a spec conformant CHS geometry,
* we set the geometry to 65535 x 16 x 255 (CxHxS) sectors and use
* the image size from the VHD footer to calculate total_sectors.
*/
total_sectors = MIN(VHD_MAX_GEOMETRY, total_size / BDRV_SECTOR_SIZE);
for (i = 0; total_sectors > (int64_t)cyls * heads * secs_per_cyl; i++) {
calculate_geometry(total_sectors + i, &cyls, &heads, &secs_per_cyl);
if (force_size) {
/* This will force the use of total_size for sector count, below */
cyls = VHD_CHS_MAX_C;
heads = VHD_CHS_MAX_H;
secs_per_cyl = VHD_CHS_MAX_S;
} else {
total_sectors = MIN(VHD_MAX_GEOMETRY, total_size / BDRV_SECTOR_SIZE);
for (i = 0; total_sectors > (int64_t)cyls * heads * secs_per_cyl; i++) {
calculate_geometry(total_sectors + i, &cyls, &heads, &secs_per_cyl);
}
}
if ((int64_t)cyls * heads * secs_per_cyl == VHD_MAX_GEOMETRY) {
total_sectors = total_size / BDRV_SECTOR_SIZE;
/* Allow a maximum disk size of approximately 2 TB */
/* Allow a maximum disk size of 2040 GiB */
if (total_sectors > VHD_MAX_SECTORS) {
error_setg(errp, "Disk size is too large, max size is 2040 GiB");
ret = -EFBIG;
goto out;
}
@@ -835,8 +953,11 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
memset(buf, 0, 1024);
memcpy(footer->creator, "conectix", 8);
/* TODO Check if "qemu" creator_app is ok for VPC */
memcpy(footer->creator_app, "qemu", 4);
if (force_size) {
memcpy(footer->creator_app, "qem2", 4);
} else {
memcpy(footer->creator_app, "qemu", 4);
}
memcpy(footer->creator_os, "Wi2k", 4);
footer->features = cpu_to_be32(0x02);
@@ -866,13 +987,16 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
footer->checksum = cpu_to_be32(vpc_checksum(buf, HEADER_SIZE));
if (disk_type == VHD_DYNAMIC) {
ret = create_dynamic_disk(bs, buf, total_sectors);
ret = create_dynamic_disk(blk, buf, total_sectors);
} else {
ret = create_fixed_disk(bs, buf, total_size);
ret = create_fixed_disk(blk, buf, total_size);
}
if (ret < 0) {
error_setg(errp, "Unable to create or write VHD header");
}
out:
bdrv_unref(bs);
blk_unref(blk);
g_free(disk_type_param);
return ret;
}
@@ -917,6 +1041,13 @@ static QemuOptsList vpc_create_opts = {
"Type of virtual hard disk format. Supported formats are "
"{dynamic (default) | fixed} "
},
{
.name = VPC_OPT_FORCE_SIZE,
.type = QEMU_OPT_BOOL,
.help = "Force disk size calculation to use the actual size "
"specified, rather than using the nearest CHS-based "
"calculation"
},
{ /* end of list */ }
}
};
@@ -931,8 +1062,8 @@ static BlockDriver bdrv_vpc = {
.bdrv_reopen_prepare = vpc_reopen_prepare,
.bdrv_create = vpc_create,
.bdrv_read = vpc_co_read,
.bdrv_write = vpc_co_write,
.bdrv_co_preadv = vpc_co_preadv,
.bdrv_co_pwritev = vpc_co_pwritev,
.bdrv_co_get_block_status = vpc_co_get_block_status,
.bdrv_get_info = vpc_get_info,

View File

@@ -24,13 +24,15 @@
*/
#include "qemu/osdep.h"
#include <dirent.h>
#include "qemu-common.h"
#include "qapi/error.h"
#include "block/block_int.h"
#include "qemu/module.h"
#include "qemu/bswap.h"
#include "migration/migration.h"
#include "qapi/qmp/qint.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qstring.h"
#include "qemu/cutils.h"
#ifndef S_IWGRP
#define S_IWGRP 0
@@ -1108,6 +1110,8 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
memcpy(s->volume_label, label, label_length);
} else {
memcpy(s->volume_label, "QEMU VVFAT", 10);
}
if (floppy) {
@@ -1176,6 +1180,7 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
bs->read_only = 0;
}
bs->request_alignment = BDRV_SECTOR_SIZE; /* No sub-sector I/O supported */
bs->total_sectors = cyls * heads * secs;
if (init_directories(s, dirname, heads, secs, errp)) {
@@ -1418,14 +1423,31 @@ DLOG(fprintf(stderr, "sector %d not allocated\n", (int)sector_num));
return 0;
}
static coroutine_fn int vvfat_co_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
static int coroutine_fn
vvfat_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
int ret;
BDRVVVFATState *s = bs->opaque;
uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
int nb_sectors = bytes >> BDRV_SECTOR_BITS;
void *buf;
assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
buf = g_try_malloc(bytes);
if (bytes && buf == NULL) {
return -ENOMEM;
}
qemu_co_mutex_lock(&s->lock);
ret = vvfat_read(bs, sector_num, buf, nb_sectors);
qemu_co_mutex_unlock(&s->lock);
qemu_iovec_from_buf(qiov, 0, buf, bytes);
g_free(buf);
return ret;
}
@@ -2282,12 +2304,17 @@ DLOG(fprintf(stderr, "commit_direntries for %s, parent_mapping_index %d\n", mapp
factor * (old_cluster_count - new_cluster_count));
for (c = first_cluster; !fat_eof(s, c); c = modified_fat_get(s, c)) {
direntry_t *first_direntry;
void* direntry = array_get(&(s->directory), current_dir_index);
int ret = vvfat_read(s->bs, cluster2sector(s, c), direntry,
s->sectors_per_cluster);
if (ret)
return ret;
assert(!strncmp(s->directory.pointer, "QEMU", 4));
/* The first directory entry on the filesystem is the volume name */
first_direntry = (direntry_t*) s->directory.pointer;
assert(!memcmp(first_direntry->name, s->volume_label, 11));
current_dir_index += factor;
}
@@ -2872,14 +2899,31 @@ DLOG(checkpoint());
return 0;
}
static coroutine_fn int vvfat_co_write(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors)
static int coroutine_fn
vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
int ret;
BDRVVVFATState *s = bs->opaque;
uint64_t sector_num = offset >> BDRV_SECTOR_BITS;
int nb_sectors = bytes >> BDRV_SECTOR_BITS;
void *buf;
assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
buf = g_try_malloc(bytes);
if (bytes && buf == NULL) {
return -ENOMEM;
}
qemu_iovec_to_buf(qiov, 0, buf, bytes);
qemu_co_mutex_lock(&s->lock);
ret = vvfat_write(bs, sector_num, buf, nb_sectors);
qemu_co_mutex_unlock(&s->lock);
g_free(buf);
return ret;
}
@@ -2896,8 +2940,10 @@ static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
return BDRV_BLOCK_DATA;
}
static int write_target_commit(BlockDriverState *bs, int64_t sector_num,
const uint8_t* buffer, int nb_sectors) {
static int coroutine_fn
write_target_commit(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVVVFATState* s = *((BDRVVVFATState**) bs->opaque);
return try_commit(s);
}
@@ -2910,7 +2956,7 @@ static void write_target_close(BlockDriverState *bs) {
static BlockDriver vvfat_write_target = {
.format_name = "vvfat_write_target",
.bdrv_write = write_target_commit,
.bdrv_co_pwritev = write_target_commit,
.bdrv_close = write_target_close,
};
@@ -2952,13 +2998,12 @@ static int enable_write_target(BDRVVVFATState *s, Error **errp)
goto err;
}
s->qcow = NULL;
options = qdict_new();
qdict_put(options, "driver", qstring_from_str("qcow"));
ret = bdrv_open(&s->qcow, s->qcow_filename, NULL, options,
BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH,
errp);
if (ret < 0) {
s->qcow = bdrv_open(s->qcow_filename, NULL, options,
BDRV_O_RDWR | BDRV_O_NO_FLUSH, errp);
if (!s->qcow) {
ret = -EINVAL;
goto err;
}
@@ -3007,8 +3052,8 @@ static BlockDriver bdrv_vvfat = {
.bdrv_file_open = vvfat_open,
.bdrv_close = vvfat_close,
.bdrv_read = vvfat_co_read,
.bdrv_write = vvfat_co_write,
.bdrv_co_preadv = vvfat_co_preadv,
.bdrv_co_pwritev = vvfat_co_pwritev,
.bdrv_co_get_block_status = vvfat_co_get_block_status,
};

View File

@@ -50,6 +50,8 @@
#include "qmp-commands.h"
#include "trace.h"
#include "sysemu/arch_init.h"
#include "qemu/cutils.h"
#include "qemu/help_option.h"
static QTAILQ_HEAD(, BlockDriverState) monitor_bdrv_states =
QTAILQ_HEAD_INITIALIZER(monitor_bdrv_states);
@@ -71,7 +73,7 @@ static int if_max_devs[IF_COUNT] = {
* Do not change these numbers! They govern how drive option
* index maps to unit and bus. That mapping is ABI.
*
* All controllers used to imlement if=T drives need to support
* All controllers used to implement if=T drives need to support
* if_max_devs[T] units, for any T with if_max_devs[T] != 0.
* Otherwise, some index values map to "impossible" bus, unit
* values.
@@ -147,6 +149,7 @@ void blockdev_auto_del(BlockBackend *blk)
DriveInfo *dinfo = blk_legacy_dinfo(blk);
if (dinfo && dinfo->auto_del) {
monitor_remove_blk(blk);
blk_unref(blk);
}
}
@@ -466,6 +469,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
int bdrv_flags = 0;
int on_read_error, on_write_error;
bool account_invalid, account_failed;
bool writethrough;
BlockBackend *blk;
BlockDriverState *bs;
ThrottleConfig cfg;
@@ -504,6 +508,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
account_invalid = qemu_opt_get_bool(opts, "stats-account-invalid", true);
account_failed = qemu_opt_get_bool(opts, "stats-account-failed", true);
writethrough = !qemu_opt_get_bool(opts, BDRV_OPT_CACHE_WB, true);
qdict_extract_subqdict(bs_opts, &interval_dict, "stats-intervals.");
qdict_array_split(interval_dict, &interval_list);
@@ -561,25 +567,12 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
if ((!file || !*file) && !qdict_size(bs_opts)) {
BlockBackendRootState *blk_rs;
blk = blk_new(qemu_opts_id(opts), errp);
if (!blk) {
goto early_err;
}
blk = blk_new();
blk_rs = blk_get_root_state(blk);
blk_rs->open_flags = bdrv_flags;
blk_rs->read_only = !(bdrv_flags & BDRV_O_RDWR);
blk_rs->detect_zeroes = detect_zeroes;
if (throttle_enabled(&cfg)) {
if (!throttling_group) {
throttling_group = blk_name(blk);
}
blk_rs->throttle_group = g_strdup(throttling_group);
blk_rs->throttle_state = throttle_group_incref(throttling_group);
blk_rs->throttle_state->cfg = cfg;
}
QDECREF(bs_opts);
} else {
if (file && !*file) {
@@ -589,23 +582,15 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
/* bdrv_open() defaults to the values in bdrv_flags (for compatibility
* with other callers) rather than what we want as the real defaults.
* Apply the defaults here instead. */
qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_WB, "on");
qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_DIRECT, "off");
qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_NO_FLUSH, "off");
if (snapshot) {
/* always use cache=unsafe with snapshot */
qdict_put(bs_opts, BDRV_OPT_CACHE_WB, qstring_from_str("on"));
qdict_put(bs_opts, BDRV_OPT_CACHE_DIRECT, qstring_from_str("off"));
qdict_put(bs_opts, BDRV_OPT_CACHE_NO_FLUSH, qstring_from_str("on"));
}
assert((bdrv_flags & BDRV_O_CACHE_MASK) == 0);
if (runstate_check(RUN_STATE_INMIGRATE)) {
bdrv_flags |= BDRV_O_INACTIVE;
}
blk = blk_new_open(qemu_opts_id(opts), file, NULL, bs_opts, bdrv_flags,
errp);
blk = blk_new_open(file, NULL, bs_opts, bdrv_flags, errp);
if (!blk) {
goto err_no_bs_opts;
}
@@ -613,15 +598,6 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
bs->detect_zeroes = detect_zeroes;
/* disk I/O throttling */
if (throttle_enabled(&cfg)) {
if (!throttling_group) {
throttling_group = blk_name(blk);
}
bdrv_io_limits_enable(bs, throttling_group);
bdrv_set_io_limits(bs, &cfg);
}
if (bdrv_key_required(bs)) {
autostart = 0;
}
@@ -635,8 +611,24 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
}
}
/* disk I/O throttling */
if (throttle_enabled(&cfg)) {
if (!throttling_group) {
throttling_group = blk_name(blk);
}
blk_io_limits_enable(blk, throttling_group);
blk_set_io_limits(blk, &cfg);
}
blk_set_enable_write_cache(blk, !writethrough);
blk_set_on_error(blk, on_read_error, on_write_error);
if (!monitor_add_blk(blk, qemu_opts_id(opts), errp)) {
blk_unref(blk);
blk = NULL;
goto err_no_bs_opts;
}
err_no_bs_opts:
qemu_opts_del(opts);
QDECREF(interval_dict);
@@ -661,7 +653,6 @@ static BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
QemuOpts *opts;
Error *local_error = NULL;
BlockdevDetectZeroesOptions detect_zeroes;
int ret;
int bdrv_flags = 0;
opts = qemu_opts_create(&qemu_root_bds_opts, NULL, 1, errp);
@@ -682,13 +673,18 @@ static BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
goto fail;
}
/* bdrv_open() defaults to the values in bdrv_flags (for compatibility
* with other callers) rather than what we want as the real defaults.
* Apply the defaults here instead. */
qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_DIRECT, "off");
qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_NO_FLUSH, "off");
if (runstate_check(RUN_STATE_INMIGRATE)) {
bdrv_flags |= BDRV_O_INACTIVE;
}
bs = NULL;
ret = bdrv_open(&bs, NULL, NULL, bs_opts, bdrv_flags, errp);
if (ret < 0) {
bs = bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp);
if (!bs) {
goto fail_no_bs_opts;
}
@@ -717,6 +713,13 @@ void blockdev_close_all_bdrv_states(void)
}
}
/* Iterates over the list of monitor-owned BlockDriverStates */
BlockDriverState *bdrv_next_monitor_owned(BlockDriverState *bs)
{
return bs ? QTAILQ_NEXT(bs, monitor_list)
: QTAILQ_FIRST(&monitor_bdrv_states);
}
static void qemu_opt_rename(QemuOpts *opts, const char *from, const char *to,
Error **errp)
{
@@ -879,8 +882,9 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
value = qemu_opt_get(all_opts, "cache");
if (value) {
int flags = 0;
bool writethrough;
if (bdrv_parse_cache_flags(value, &flags) != 0) {
if (bdrv_parse_cache_mode(value, &flags, &writethrough) != 0) {
error_report("invalid cache option");
return NULL;
}
@@ -888,7 +892,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
/* Specific options take precedence */
if (!qemu_opt_get(all_opts, BDRV_OPT_CACHE_WB)) {
qemu_opt_set_bool(all_opts, BDRV_OPT_CACHE_WB,
!!(flags & BDRV_O_CACHE_WB), &error_abort);
!writethrough, &error_abort);
}
if (!qemu_opt_get(all_opts, BDRV_OPT_CACHE_DIRECT)) {
qemu_opt_set_bool(all_opts, BDRV_OPT_CACHE_DIRECT,
@@ -1173,7 +1177,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
int ret;
if (!strcmp(device, "all")) {
ret = bdrv_commit_all();
ret = blk_commit_all();
} else {
BlockDriverState *bs;
AioContext *aio_context;
@@ -1202,15 +1206,11 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
}
}
static void blockdev_do_action(TransactionActionKind type, void *data,
Error **errp)
static void blockdev_do_action(TransactionAction *action, Error **errp)
{
TransactionAction action;
TransactionActionList list;
action.type = type;
action.u.data = data;
list.value = &action;
list.value = action;
list.next = NULL;
qmp_transaction(&list, false, NULL, errp);
}
@@ -1236,8 +1236,11 @@ void qmp_blockdev_snapshot_sync(bool has_device, const char *device,
.has_mode = has_mode,
.mode = mode,
};
blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC,
&snapshot, errp);
TransactionAction action = {
.type = TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC,
.u.blockdev_snapshot_sync.data = &snapshot,
};
blockdev_do_action(&action, errp);
}
void qmp_blockdev_snapshot(const char *node, const char *overlay,
@@ -1247,9 +1250,11 @@ void qmp_blockdev_snapshot(const char *node, const char *overlay,
.node = (char *) node,
.overlay = (char *) overlay
};
blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT,
&snapshot_data, errp);
TransactionAction action = {
.type = TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT,
.u.blockdev_snapshot.data = &snapshot_data,
};
blockdev_do_action(&action, errp);
}
void qmp_blockdev_snapshot_internal_sync(const char *device,
@@ -1260,9 +1265,11 @@ void qmp_blockdev_snapshot_internal_sync(const char *device,
.device = (char *) device,
.name = (char *) name
};
blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC,
&snapshot, errp);
TransactionAction action = {
.type = TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC,
.u.blockdev_snapshot_internal_sync.data = &snapshot,
};
blockdev_do_action(&action, errp);
}
SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
@@ -1499,7 +1506,7 @@ static void internal_snapshot_prepare(BlkActionState *common,
g_assert(common->action->type ==
TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC);
internal = common->action->u.blockdev_snapshot_internal_sync;
internal = common->action->u.blockdev_snapshot_internal_sync.data;
state = DO_UPCAST(InternalSnapshotState, common, common);
/* 1. parse input */
@@ -1630,7 +1637,7 @@ typedef struct ExternalSnapshotState {
static void external_snapshot_prepare(BlkActionState *common,
Error **errp)
{
int flags = 0, ret;
int flags = 0;
QDict *options = NULL;
Error *local_err = NULL;
/* Device and node name of the image to generate the snapshot from */
@@ -1649,7 +1656,7 @@ static void external_snapshot_prepare(BlkActionState *common,
switch (action->type) {
case TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT:
{
BlockdevSnapshot *s = action->u.blockdev_snapshot;
BlockdevSnapshot *s = action->u.blockdev_snapshot.data;
device = s->node;
node_name = s->node;
new_image_file = NULL;
@@ -1658,7 +1665,7 @@ static void external_snapshot_prepare(BlkActionState *common,
break;
case TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC:
{
BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync;
BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
device = s->has_device ? s->device : NULL;
node_name = s->has_node_name ? s->node_name : NULL;
new_image_file = s->snapshot_file;
@@ -1707,7 +1714,7 @@ static void external_snapshot_prepare(BlkActionState *common,
}
if (action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC) {
BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync;
BlockdevSnapshotSync *s = action->u.blockdev_snapshot_sync.data;
const char *format = s->has_format ? s->format : "qcow2";
enum NewImageMode mode;
const char *snapshot_node_name =
@@ -1725,14 +1732,20 @@ static void external_snapshot_prepare(BlkActionState *common,
}
flags = state->old_bs->open_flags;
flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);
/* create new image w/backing file */
mode = s->has_mode ? s->mode : NEW_IMAGE_MODE_ABSOLUTE_PATHS;
if (mode != NEW_IMAGE_MODE_EXISTING) {
int64_t size = bdrv_getlength(state->old_bs);
if (size < 0) {
error_setg_errno(errp, -size, "bdrv_getlength failed");
return;
}
bdrv_img_create(new_image_file, format,
state->old_bs->filename,
state->old_bs->drv->format_name,
NULL, -1, flags, &local_err, false);
NULL, size, flags, &local_err, false);
if (local_err) {
error_propagate(errp, local_err);
return;
@@ -1749,17 +1762,16 @@ static void external_snapshot_prepare(BlkActionState *common,
flags |= BDRV_O_NO_BACKING;
}
assert(state->new_bs == NULL);
ret = bdrv_open(&state->new_bs, new_image_file, snapshot_ref, options,
flags, errp);
state->new_bs = bdrv_open(new_image_file, snapshot_ref, options, flags,
errp);
/* We will manually add the backing_hd field to the bs later */
if (ret != 0) {
if (!state->new_bs) {
return;
}
if (state->new_bs->blk != NULL) {
if (bdrv_has_blk(state->new_bs)) {
error_setg(errp, "The snapshot is already in use by %s",
blk_name(state->new_bs->blk));
bdrv_get_parent_name(state->new_bs));
return;
}
@@ -1790,8 +1802,10 @@ static void external_snapshot_commit(BlkActionState *common)
/* We don't need (or want) to use the transactional
* bdrv_reopen_multiple() across all the entries at once, because we
* don't want to abort all of them if one of them fails the reopen */
bdrv_reopen(state->old_bs, state->old_bs->open_flags & ~BDRV_O_RDWR,
NULL);
if (!state->old_bs->copy_on_read) {
bdrv_reopen(state->old_bs, state->old_bs->open_flags & ~BDRV_O_RDWR,
NULL);
}
}
static void external_snapshot_abort(BlkActionState *common)
@@ -1840,7 +1854,7 @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
Error *local_err = NULL;
assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
backup = common->action->u.drive_backup;
backup = common->action->u.drive_backup.data;
blk = blk_by_name(backup->device);
if (!blk) {
@@ -1922,7 +1936,7 @@ static void blockdev_backup_prepare(BlkActionState *common, Error **errp)
Error *local_err = NULL;
assert(common->action->type == TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP);
backup = common->action->u.blockdev_backup;
backup = common->action->u.blockdev_backup.data;
blk = blk_by_name(backup->device);
if (!blk) {
@@ -2008,7 +2022,7 @@ static void block_dirty_bitmap_add_prepare(BlkActionState *common,
return;
}
action = common->action->u.block_dirty_bitmap_add;
action = common->action->u.block_dirty_bitmap_add.data;
/* AIO context taken and released within qmp_block_dirty_bitmap_add */
qmp_block_dirty_bitmap_add(action->node, action->name,
action->has_granularity, action->granularity,
@@ -2027,7 +2041,7 @@ static void block_dirty_bitmap_add_abort(BlkActionState *common)
BlockDirtyBitmapState *state = DO_UPCAST(BlockDirtyBitmapState,
common, common);
action = common->action->u.block_dirty_bitmap_add;
action = common->action->u.block_dirty_bitmap_add.data;
/* Should not be able to fail: IF the bitmap was added via .prepare(),
* then the node reference and bitmap name must have been valid.
*/
@@ -2047,7 +2061,7 @@ static void block_dirty_bitmap_clear_prepare(BlkActionState *common,
return;
}
action = common->action->u.block_dirty_bitmap_clear;
action = common->action->u.block_dirty_bitmap_clear.data;
state->bitmap = block_dirty_bitmap_lookup(action->node,
action->name,
&state->bs,
@@ -2260,16 +2274,29 @@ exit:
block_job_txn_unref(block_job_txn);
}
static int do_open_tray(const char *device, bool force, Error **errp);
void qmp_eject(const char *device, bool has_force, bool force, Error **errp)
{
Error *local_err = NULL;
int rc;
qmp_blockdev_open_tray(device, has_force, force, &local_err);
if (!has_force) {
force = false;
}
rc = do_open_tray(device, force, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
if (rc == EINPROGRESS) {
error_setg(errp, "Device '%s' is locked and force was not specified, "
"wait for tray to open and try again", device);
return;
}
qmp_x_blockdev_remove_medium(device, errp);
}
@@ -2297,35 +2324,36 @@ void qmp_block_passwd(bool has_device, const char *device,
aio_context_release(aio_context);
}
void qmp_blockdev_open_tray(const char *device, bool has_force, bool force,
Error **errp)
/**
* returns -errno on fatal error, +errno for non-fatal situations.
* errp will always be set when the return code is negative.
* May return +ENOSYS if the device has no tray,
* or +EINPROGRESS if the tray is locked and the guest has been notified.
*/
static int do_open_tray(const char *device, bool force, Error **errp)
{
BlockBackend *blk;
bool locked;
if (!has_force) {
force = false;
}
blk = blk_by_name(device);
if (!blk) {
error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
"Device '%s' not found", device);
return;
return -ENODEV;
}
if (!blk_dev_has_removable_media(blk)) {
error_setg(errp, "Device '%s' is not removable", device);
return;
return -ENOTSUP;
}
if (!blk_dev_has_tray(blk)) {
/* Ignore this command on tray-less devices */
return;
return ENOSYS;
}
if (blk_dev_is_tray_open(blk)) {
return;
return 0;
}
locked = blk_dev_is_medium_locked(blk);
@@ -2336,6 +2364,21 @@ void qmp_blockdev_open_tray(const char *device, bool has_force, bool force,
if (!locked || force) {
blk_dev_change_media_cb(blk, false);
}
if (locked && !force) {
return EINPROGRESS;
}
return 0;
}
void qmp_blockdev_open_tray(const char *device, bool has_force, bool force,
Error **errp)
{
if (!has_force) {
force = false;
}
do_open_tray(device, force, errp);
}
void qmp_blockdev_close_tray(const char *device, Error **errp)
@@ -2405,11 +2448,6 @@ void qmp_x_blockdev_remove_medium(const char *device, Error **errp)
goto out;
}
/* This follows the convention established by bdrv_make_anon() */
if (bs->device_list.tqe_prev) {
bdrv_device_remove(bs);
}
blk_remove_bs(blk);
if (!blk_dev_has_tray(blk)) {
@@ -2457,8 +2495,6 @@ static void qmp_blockdev_insert_anon_medium(const char *device,
blk_insert_bs(blk, bs);
QTAILQ_INSERT_TAIL(&bdrv_states, bs, device_list);
if (!blk_dev_has_tray(blk)) {
/* For tray-less devices, blockdev-close-tray is a no-op (or may not be
* called at all); therefore, the medium needs to be pushed into the
@@ -2480,9 +2516,9 @@ void qmp_x_blockdev_insert_medium(const char *device, const char *node_name,
return;
}
if (bs->blk) {
if (bdrv_has_blk(bs)) {
error_setg(errp, "Node '%s' is already in use by '%s'", node_name,
blk_name(bs->blk));
bdrv_get_parent_name(bs));
return;
}
@@ -2497,7 +2533,7 @@ void qmp_blockdev_change_medium(const char *device, const char *filename,
{
BlockBackend *blk;
BlockDriverState *medium_bs = NULL;
int bdrv_flags, ret;
int bdrv_flags;
QDict *options = NULL;
Error *err = NULL;
@@ -2541,14 +2577,11 @@ void qmp_blockdev_change_medium(const char *device, const char *filename,
qdict_put(options, "driver", qstring_from_str(format));
}
assert(!medium_bs);
ret = bdrv_open(&medium_bs, filename, NULL, options, bdrv_flags, errp);
if (ret < 0) {
medium_bs = bdrv_open(filename, NULL, options, bdrv_flags, errp);
if (!medium_bs) {
goto fail;
}
blk_apply_root_state(blk, medium_bs);
bdrv_add_key(medium_bs, NULL, &err);
if (err) {
error_propagate(errp, err);
@@ -2573,6 +2606,8 @@ void qmp_blockdev_change_medium(const char *device, const char *filename,
goto fail;
}
blk_apply_root_state(blk, medium_bs);
qmp_blockdev_close_tray(device, errp);
fail:
@@ -2696,16 +2731,16 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd,
if (throttle_enabled(&cfg)) {
/* Enable I/O limits if they're not enabled yet, otherwise
* just update the throttling group. */
if (!bs->throttle_state) {
bdrv_io_limits_enable(bs, has_group ? group : device);
if (!blk_get_public(blk)->throttle_state) {
blk_io_limits_enable(blk, has_group ? group : device);
} else if (has_group) {
bdrv_io_limits_update_group(bs, group);
blk_io_limits_update_group(blk, group);
}
/* Set the new throttling configuration */
bdrv_set_io_limits(bs, &cfg);
} else if (bs->throttle_state) {
blk_set_io_limits(blk, &cfg);
} else if (blk_get_public(blk)->throttle_state) {
/* If all throttling settings are set to 0, disable I/O limits */
bdrv_io_limits_disable(bs);
blk_io_limits_disable(blk);
}
out:
@@ -2816,6 +2851,15 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
AioContext *aio_context;
Error *local_err = NULL;
bs = bdrv_find_node(id);
if (bs) {
qmp_x_blockdev_del(false, NULL, true, id, &local_err);
if (local_err) {
error_report_err(local_err);
}
return;
}
blk = blk_by_name(id);
if (!blk) {
error_report("Device '%s' not found", id);
@@ -2842,13 +2886,13 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
blk_remove_bs(blk);
}
/* if we have a device attached to this BlockDriverState
* then we need to make the drive anonymous until the device
* can be removed. If this is a drive with no device backing
* then we can just get rid of the block driver state right here.
/* Make the BlockBackend and the attached BlockDriverState anonymous */
monitor_remove_blk(blk);
/* If this BlockBackend has a device attached to it, its refcount will be
* decremented when the device is removed; otherwise we have to do so here.
*/
if (blk_get_attached_dev(blk)) {
blk_hide_on_behalf_of_hmp_drive_del(blk);
/* Further I/O must not pause the guest */
blk_set_on_error(blk, BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT);
@@ -3147,7 +3191,6 @@ static void do_drive_backup(const char *device, const char *target,
Error *local_err = NULL;
int flags;
int64_t size;
int ret;
if (!has_speed) {
speed = 0;
@@ -3231,10 +3274,8 @@ static void do_drive_backup(const char *device, const char *target,
qdict_put(options, "driver", qstring_from_str(format));
}
target_bs = NULL;
ret = bdrv_open(&target_bs, target, NULL, options, flags, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
target_bs = bdrv_open(target, NULL, options, flags, errp);
if (!target_bs) {
goto out;
}
@@ -3252,8 +3293,8 @@ static void do_drive_backup(const char *device, const char *target,
backup_start(bs, target_bs, speed, sync, bmap,
on_source_error, on_target_error,
block_job_cb, bs, txn, &local_err);
bdrv_unref(target_bs);
if (local_err != NULL) {
bdrv_unref(target_bs);
error_propagate(errp, local_err);
goto out;
}
@@ -3337,12 +3378,10 @@ void do_blockdev_backup(const char *device, const char *target,
}
target_bs = blk_bs(target_blk);
bdrv_ref(target_bs);
bdrv_set_aio_context(target_bs, aio_context);
backup_start(bs, target_bs, speed, sync, NULL, on_source_error,
on_target_error, block_job_cb, bs, txn, &local_err);
if (local_err != NULL) {
bdrv_unref(target_bs);
error_propagate(errp, local_err);
}
out:
@@ -3418,10 +3457,6 @@ static void blockdev_mirror_common(BlockDriverState *bs,
if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_MIRROR_TARGET, errp)) {
return;
}
if (target->blk) {
error_setg(errp, "Cannot mirror to an attached block device");
return;
}
if (!bs->backing && sync == MIRROR_SYNC_MODE_TOP) {
sync = MIRROR_SYNC_MODE_FULL;
@@ -3459,7 +3494,6 @@ void qmp_drive_mirror(const char *device, const char *target,
QDict *options = NULL;
int flags;
int64_t size;
int ret;
blk = blk_by_name(device);
if (!blk) {
@@ -3568,11 +3602,9 @@ void qmp_drive_mirror(const char *device, const char *target,
/* Mirroring takes care of copy-on-write using the source's backing
* file.
*/
target_bs = NULL;
ret = bdrv_open(&target_bs, target, NULL, options,
flags | BDRV_O_NO_BACKING, &local_err);
if (ret < 0) {
error_propagate(errp, local_err);
target_bs = bdrv_open(target, NULL, options, flags | BDRV_O_NO_BACKING,
errp);
if (!target_bs) {
goto out;
}
@@ -3587,9 +3619,9 @@ void qmp_drive_mirror(const char *device, const char *target,
has_on_target_error, on_target_error,
has_unmap, unmap,
&local_err);
bdrv_unref(target_bs);
if (local_err) {
error_propagate(errp, local_err);
bdrv_unref(target_bs);
}
out:
aio_context_release(aio_context);
@@ -3633,7 +3665,6 @@ void qmp_blockdev_mirror(const char *device, const char *target,
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
bdrv_ref(target_bs);
bdrv_set_aio_context(target_bs, aio_context);
blockdev_mirror_common(bs, target_bs,
@@ -3647,7 +3678,6 @@ void qmp_blockdev_mirror(const char *device, const char *target,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
bdrv_unref(target_bs);
}
aio_context_release(aio_context);
@@ -3867,6 +3897,37 @@ out:
aio_context_release(aio_context);
}
void hmp_drive_add_node(Monitor *mon, const char *optstr)
{
QemuOpts *opts;
QDict *qdict;
Error *local_err = NULL;
opts = qemu_opts_parse_noisily(&qemu_drive_opts, optstr, false);
if (!opts) {
return;
}
qdict = qemu_opts_to_qdict(opts, NULL);
if (!qdict_get_try_str(qdict, "node-name")) {
QDECREF(qdict);
error_report("'node-name' needs to be specified");
goto out;
}
BlockDriverState *bs = bds_tree_init(qdict, &local_err);
if (!bs) {
error_report_err(local_err);
goto out;
}
QTAILQ_INSERT_TAIL(&monitor_bdrv_states, bs, monitor_list);
out:
qemu_opts_del(opts);
}
void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
{
QmpOutputVisitor *ov = qmp_output_visitor_new();
@@ -3928,6 +3989,7 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
if (bs && bdrv_key_required(bs)) {
if (blk) {
monitor_remove_blk(blk);
blk_unref(blk);
} else {
QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
@@ -3957,11 +4019,17 @@ void qmp_x_blockdev_del(bool has_id, const char *id,
}
if (has_id) {
/* blk_by_name() never returns a BB that is not owned by the monitor */
blk = blk_by_name(id);
if (!blk) {
error_setg(errp, "Cannot find block backend %s", id);
return;
}
if (blk_legacy_dinfo(blk)) {
error_setg(errp, "Deleting block backend added with drive-add"
" is not supported");
return;
}
if (blk_get_refcnt(blk) > 1) {
error_setg(errp, "Block backend %s is in use", id);
return;
@@ -3969,15 +4037,15 @@ void qmp_x_blockdev_del(bool has_id, const char *id,
bs = blk_bs(blk);
aio_context = blk_get_aio_context(blk);
} else {
blk = NULL;
bs = bdrv_find_node(node_name);
if (!bs) {
error_setg(errp, "Cannot find node %s", node_name);
return;
}
blk = bs->blk;
if (blk) {
if (bdrv_has_blk(bs)) {
error_setg(errp, "Node %s is in use by %s",
node_name, blk_name(blk));
node_name, bdrv_get_parent_name(bs));
return;
}
aio_context = bdrv_get_aio_context(bs);
@@ -4004,6 +4072,7 @@ void qmp_x_blockdev_del(bool has_id, const char *id,
}
if (blk) {
monitor_remove_blk(blk);
blk_unref(blk);
} else {
QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
@@ -4014,12 +4083,68 @@ out:
aio_context_release(aio_context);
}
static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
const char *child_name)
{
BdrvChild *child;
QLIST_FOREACH(child, &parent_bs->children, next) {
if (strcmp(child->name, child_name) == 0) {
return child;
}
}
return NULL;
}
void qmp_x_blockdev_change(const char *parent, bool has_child,
const char *child, bool has_node,
const char *node, Error **errp)
{
BlockDriverState *parent_bs, *new_bs = NULL;
BdrvChild *p_child;
parent_bs = bdrv_lookup_bs(parent, parent, errp);
if (!parent_bs) {
return;
}
if (has_child == has_node) {
if (has_child) {
error_setg(errp, "The parameters child and node are in conflict");
} else {
error_setg(errp, "Either child or node must be specified");
}
return;
}
if (has_child) {
p_child = bdrv_find_child(parent_bs, child);
if (!p_child) {
error_setg(errp, "Node '%s' does not have child '%s'",
parent, child);
return;
}
bdrv_del_child(parent_bs, p_child, errp);
}
if (has_node) {
new_bs = bdrv_find_node(node);
if (!new_bs) {
error_setg(errp, "Node '%s' not found", node);
return;
}
bdrv_add_child(parent_bs, new_bs, errp);
}
}
BlockJobInfoList *qmp_query_block_jobs(Error **errp)
{
BlockJobInfoList *head = NULL, **p_next = &head;
BlockDriverState *bs;
BdrvNextIterator it;
for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
@@ -4053,6 +4178,10 @@ QemuOptsList qemu_common_drive_opts = {
.name = "aio",
.type = QEMU_OPT_STRING,
.help = "host AIO implementation (threads, native)",
},{
.name = BDRV_OPT_CACHE_WB,
.type = QEMU_OPT_BOOL,
.help = "Enable writeback mode",
},{
.name = "format",
.type = QEMU_OPT_STRING,
@@ -4174,7 +4303,7 @@ QemuOptsList qemu_common_drive_opts = {
static QemuOptsList qemu_root_bds_opts = {
.name = "root-bds",
.head = QTAILQ_HEAD_INITIALIZER(qemu_common_drive_opts.head),
.head = QTAILQ_HEAD_INITIALIZER(qemu_root_bds_opts.head),
.desc = {
{
.name = "discard",

View File

@@ -50,17 +50,31 @@ struct BlockJobTxn {
int refcnt;
};
static QLIST_HEAD(, BlockJob) block_jobs = QLIST_HEAD_INITIALIZER(block_jobs);
BlockJob *block_job_next(BlockJob *job)
{
if (!job) {
return QLIST_FIRST(&block_jobs);
}
return QLIST_NEXT(job, job_list);
}
void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
{
BlockBackend *blk;
BlockJob *job;
if (bs->job) {
error_setg(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
return NULL;
}
bdrv_ref(bs);
blk = blk_new();
blk_insert_bs(blk, bs);
job = g_malloc0(driver->instance_size);
error_setg(&job->blocker, "block device is in use by block job: %s",
BlockJobType_lookup[driver->job_type]);
@@ -69,13 +83,15 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
job->driver = driver;
job->id = g_strdup(bdrv_get_device_name(bs));
job->bs = bs;
job->blk = blk;
job->cb = cb;
job->opaque = opaque;
job->busy = true;
job->refcnt = 1;
bs->job = job;
QLIST_INSERT_HEAD(&block_jobs, job, job_list);
/* Only set speed when necessary to avoid NotSupported error */
if (speed != 0) {
Error *local_err = NULL;
@@ -98,11 +114,13 @@ void block_job_ref(BlockJob *job)
void block_job_unref(BlockJob *job)
{
if (--job->refcnt == 0) {
job->bs->job = NULL;
bdrv_op_unblock_all(job->bs, job->blocker);
bdrv_unref(job->bs);
BlockDriverState *bs = blk_bs(job->blk);
bs->job = NULL;
bdrv_op_unblock_all(bs, job->blocker);
blk_unref(job->blk);
error_free(job->blocker);
g_free(job->id);
QLIST_REMOVE(job, job_list);
g_free(job);
}
}
@@ -140,7 +158,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
txn->aborting = true;
/* We are the first failed job. Cancel other jobs. */
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
ctx = bdrv_get_aio_context(other_job->bs);
ctx = blk_get_aio_context(other_job->blk);
aio_context_acquire(ctx);
}
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
@@ -157,7 +175,7 @@ static void block_job_completed_txn_abort(BlockJob *job)
assert(other_job->completed);
}
QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
ctx = bdrv_get_aio_context(other_job->bs);
ctx = blk_get_aio_context(other_job->blk);
block_job_completed_single(other_job);
aio_context_release(ctx);
}
@@ -179,7 +197,7 @@ static void block_job_completed_txn_success(BlockJob *job)
}
/* We are the last completed job, commit the transaction. */
QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
ctx = bdrv_get_aio_context(other_job->bs);
ctx = blk_get_aio_context(other_job->blk);
aio_context_acquire(ctx);
assert(other_job->ret == 0);
block_job_completed_single(other_job);
@@ -189,9 +207,7 @@ static void block_job_completed_txn_success(BlockJob *job)
void block_job_completed(BlockJob *job, int ret)
{
BlockDriverState *bs = job->bs;
assert(bs->job == job);
assert(blk_bs(job->blk)->job == job);
assert(!job->completed);
job->completed = true;
job->ret = ret;
@@ -282,11 +298,10 @@ static int block_job_finish_sync(BlockJob *job,
void (*finish)(BlockJob *, Error **errp),
Error **errp)
{
BlockDriverState *bs = job->bs;
Error *local_err = NULL;
int ret;
assert(bs->job == job);
assert(blk_bs(job->blk)->job == job);
block_job_ref(job);
finish(job, &local_err);
@@ -297,7 +312,7 @@ static int block_job_finish_sync(BlockJob *job,
}
while (!job->completed) {
aio_poll(job->deferred_to_main_loop ? qemu_get_aio_context() :
bdrv_get_aio_context(bs),
blk_get_aio_context(job->blk),
true);
}
ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
@@ -318,6 +333,19 @@ int block_job_cancel_sync(BlockJob *job)
return block_job_finish_sync(job, &block_job_cancel_err, NULL);
}
void block_job_cancel_sync_all(void)
{
BlockJob *job;
AioContext *aio_context;
while ((job = QLIST_FIRST(&block_jobs))) {
aio_context = blk_get_aio_context(job->blk);
aio_context_acquire(aio_context);
block_job_cancel_sync(job);
aio_context_release(aio_context);
}
}
int block_job_complete_sync(BlockJob *job, Error **errp)
{
return block_job_finish_sync(job, &block_job_complete, errp);
@@ -336,7 +364,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
if (block_job_is_paused(job)) {
qemu_coroutine_yield();
} else {
co_aio_sleep_ns(bdrv_get_aio_context(job->bs), type, ns);
co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns);
}
job->busy = true;
}
@@ -411,8 +439,7 @@ void block_job_event_ready(BlockJob *job)
job->speed, &error_abort);
}
BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
BlockdevOnError on_err,
BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
int is_read, int error)
{
BlockErrorAction action;
@@ -443,9 +470,6 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockDriverState *bs,
job->user_paused = true;
block_job_pause(job);
block_job_iostatus_set_err(job, error);
if (bs->blk && bs != job->bs) {
blk_iostatus_set_err(bs->blk, error);
}
}
return action;
}
@@ -469,7 +493,7 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
aio_context_acquire(data->aio_context);
/* Fetch BDS AioContext again, in case it has changed */
aio_context = bdrv_get_aio_context(data->job->bs);
aio_context = blk_get_aio_context(data->job->blk);
aio_context_acquire(aio_context);
data->job->deferred_to_main_loop = false;
@@ -489,7 +513,7 @@ void block_job_defer_to_main_loop(BlockJob *job,
BlockJobDeferToMainLoopData *data = g_malloc(sizeof(*data));
data->job = job;
data->bh = qemu_bh_new(block_job_defer_to_main_loop_bh, data);
data->aio_context = bdrv_get_aio_context(job->bs);
data->aio_context = blk_get_aio_context(job->blk);
data->fn = fn;
data->opaque = opaque;
job->deferred_to_main_loop = true;

View File

@@ -23,10 +23,12 @@
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "sysemu/sysemu.h"
#include "qapi/visitor.h"
#include "qemu/error-report.h"
#include "hw/hw.h"
#include "hw/qdev-core.h"
typedef struct FWBootEntry FWBootEntry;

View File

@@ -5,6 +5,7 @@
#include "qemu.h"
#include "disas/disas.h"
#include "qemu/path.h"
#ifdef _ARCH_PPC64
#undef ARCH_DLINFO

View File

@@ -21,9 +21,11 @@
#include <sys/mman.h>
#include "qemu.h"
#include "qemu-common.h"
#include "qemu/path.h"
#include "qemu/help_option.h"
/* For tb_lock */
#include "cpu.h"
#include "exec/exec-all.h"
#include "tcg.h"
#include "qemu/timer.h"
#include "qemu/envlist.h"
@@ -751,9 +753,6 @@ int main(int argc, char **argv)
}
cpu_model = NULL;
#if defined(cpudef_setup)
cpudef_setup(); /* parse cpu definitions in target config file (TBD) */
#endif
optind = 1;
for(;;) {
@@ -848,6 +847,7 @@ int main(int argc, char **argv)
}
/* init debug */
qemu_log_needs_buffers();
qemu_set_log_filename(log_file);
if (log_mask) {
int mask;

View File

@@ -19,6 +19,7 @@
#include "cpu.h"
#include "exec/exec-all.h"
#include "exec/cpu_ldst.h"
#undef DEBUG_REMAP

View File

@@ -17,6 +17,8 @@
* along with this program; if not, see <http://www.gnu.org/licenses/>.
*/
#include "qemu/osdep.h"
#include "qemu/cutils.h"
#include "qemu/path.h"
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/param.h>

View File

@@ -1,5 +1,6 @@
/* User memory access */
#include "qemu/osdep.h"
#include "qemu/cutils.h"
#include "qemu.h"
@@ -50,7 +51,7 @@ abi_long target_strlen(abi_ulong guest_addr1)
ptr = lock_user(VERIFY_READ, guest_addr, max_len, 1);
if (!ptr)
return -TARGET_EFAULT;
len = qemu_strnlen((char *)ptr, max_len);
len = qemu_strnlen((const char *)ptr, max_len);
unlock_user(ptr, guest_addr, 0);
guest_addr += len;
/* we don't allow wrapping or integer overflow */

177
configure vendored
View File

@@ -207,7 +207,7 @@ fdt=""
netmap="no"
pixman=""
sdl=""
sdlabi="1.2"
sdlabi=""
virtfs=""
vnc="yes"
sparse="no"
@@ -280,6 +280,7 @@ libusb=""
usb_redir=""
opengl=""
opengl_dmabuf="no"
avx2_opt="no"
zlib="yes"
lzo=""
snappy=""
@@ -297,6 +298,7 @@ coroutine=""
coroutine_pool=""
seccomp=""
glusterfs=""
glusterfs_xlator_opt="no"
glusterfs_discard="no"
glusterfs_zerofill="no"
archipelago="no"
@@ -305,8 +307,11 @@ gtkabi=""
gtk_gl="no"
gnutls=""
gnutls_hash=""
gnutls_rnd=""
nettle=""
nettle_kdf="no"
gcrypt=""
gcrypt_kdf="no"
vte=""
virglrenderer=""
tpm="yes"
@@ -1773,6 +1778,21 @@ EOF
fi
##########################################
# avx2 optimization requirement check
cat > $TMPC << EOF
static void bar(void) {}
static void *bar_ifunc(void) {return (void*) bar;}
static void foo(void) __attribute__((ifunc("bar_ifunc")));
int main(void) { foo(); return 0; }
EOF
if compile_prog "-mavx2" "" ; then
if readelf --syms $TMPE |grep "IFUNC.*foo" >/dev/null 2>&1; then
avx2_opt="yes"
fi
fi
#########################################
# zlib check
if test "$zlib" != "no" ; then
@@ -1853,6 +1873,9 @@ if test "$seccomp" != "no" ; then
i386|x86_64)
libseccomp_minver="2.1.0"
;;
mips)
libseccomp_minver="2.2.0"
;;
arm|aarch64)
libseccomp_minver="2.2.3"
;;
@@ -2134,6 +2157,7 @@ if test "$gtk" != "no"; then
if $pkg_config --exists "$gtkpackage >= $gtkversion"; then
gtk_cflags=`$pkg_config --cflags $gtkpackage`
gtk_libs=`$pkg_config --libs $gtkpackage`
gtk_version=`$pkg_config --modversion $gtkpackage`
if $pkg_config --exists "$gtkx11package >= $gtkversion"; then
gtk_cflags="$gtk_cflags $x11_cflags"
gtk_libs="$gtk_libs $x11_libs"
@@ -2185,6 +2209,13 @@ if test "$gnutls" != "no"; then
gnutls_hash="no"
fi
# gnutls_rnd requires >= 2.11.0
if $pkg_config --exists "gnutls >= 2.11.0"; then
gnutls_rnd="yes"
else
gnutls_rnd="no"
fi
if $pkg_config --exists 'gnutls >= 3.0'; then
gnutls_gcrypt=no
gnutls_nettle=yes
@@ -2212,9 +2243,11 @@ if test "$gnutls" != "no"; then
else
gnutls="no"
gnutls_hash="no"
gnutls_rnd="no"
fi
else
gnutls_hash="no"
gnutls_rnd="no"
fi
@@ -2276,6 +2309,19 @@ if test "$gcrypt" != "no"; then
if test -z "$nettle"; then
nettle="no"
fi
cat > $TMPC << EOF
#include <gcrypt.h>
int main(void) {
gcry_kdf_derive(NULL, 0, GCRY_KDF_PBKDF2,
GCRY_MD_SHA256,
NULL, 0, 0, 0, NULL);
return 0;
}
EOF
if compile_prog "$gcrypt_cflags" "$gcrypt_libs" ; then
gcrypt_kdf=yes
fi
else
if test "$gcrypt" = "yes"; then
feature_not_found "gcrypt" "Install gcrypt devel"
@@ -2295,6 +2341,17 @@ if test "$nettle" != "no"; then
libs_tools="$nettle_libs $libs_tools"
QEMU_CFLAGS="$QEMU_CFLAGS $nettle_cflags"
nettle="yes"
cat > $TMPC << EOF
#include <nettle/pbkdf2.h>
int main(void) {
pbkdf2_hmac_sha256(8, NULL, 1000, 8, NULL, 8, NULL);
return 0;
}
EOF
if compile_prog "$nettle_cflags" "$nettle_libs" ; then
nettle_kdf=yes
fi
else
if test "$nettle" = "yes"; then
feature_not_found "nettle" "Install nettle devel"
@@ -2336,20 +2393,25 @@ fi
if test "$vte" != "no"; then
if test "$gtkabi" = "3.0"; then
vtepackage="vte-2.90"
vteversion="0.32.0"
vteminversion="0.32.0"
if $pkg_config --exists "vte-2.91"; then
vtepackage="vte-2.91"
else
vtepackage="vte-2.90"
fi
else
vtepackage="vte"
vteversion="0.24.0"
vteminversion="0.24.0"
fi
if $pkg_config --exists "$vtepackage >= $vteversion"; then
if $pkg_config --exists "$vtepackage >= $vteminversion"; then
vte_cflags=`$pkg_config --cflags $vtepackage`
vte_libs=`$pkg_config --libs $vtepackage`
vteversion=`$pkg_config --modversion $vtepackage`
libs_softmmu="$vte_libs $libs_softmmu"
vte="yes"
elif test "$vte" = "yes"; then
if test "$gtkabi" = "3.0"; then
feature_not_found "vte" "Install libvte-2.90 devel"
feature_not_found "vte" "Install libvte-2.90/2.91 devel"
else
feature_not_found "vte" "Install libvte devel"
fi
@@ -2364,13 +2426,25 @@ fi
# Look for sdl configuration program (pkg-config or sdl-config). Try
# sdl-config even without cross prefix, and favour pkg-config over sdl-config.
if test "$sdlabi" = ""; then
if $pkg_config --exists "sdl"; then
sdlabi=1.2
elif $pkg_config --exists "sdl2"; then
sdlabi=2.0
else
sdlabi=1.2
fi
fi
if test $sdlabi = "2.0"; then
sdl_config=$sdl2_config
sdlname=sdl2
sdlconfigname=sdl2_config
else
elif test $sdlabi = "1.2"; then
sdlname=sdl
sdlconfigname=sdl_config
else
error_exit "Unknown sdlabi $sdlabi, must be 1.2 or 2.0"
fi
if test "`basename $sdl_config`" != $sdlconfigname && ! has ${sdl_config}; then
@@ -2379,10 +2453,10 @@ fi
if $pkg_config $sdlname --exists; then
sdlconfig="$pkg_config $sdlname"
_sdlversion=`$sdlconfig --modversion 2>/dev/null | sed 's/[^0-9]//g'`
sdlversion=`$sdlconfig --modversion 2>/dev/null`
elif has ${sdl_config}; then
sdlconfig="$sdl_config"
_sdlversion=`$sdlconfig --version | sed 's/[^0-9]//g'`
sdlversion=`$sdlconfig --version`
else
if test "$sdl" = "yes" ; then
feature_not_found "sdl" "Install SDL devel"
@@ -2407,7 +2481,7 @@ EOF
sdl_libs=`$sdlconfig --libs 2> /dev/null`
fi
if compile_prog "$sdl_cflags" "$sdl_libs" ; then
if test "$_sdlversion" -lt 121 ; then
if test `echo $sdlversion | sed 's/[^0-9]//g'` -lt 121 ; then
sdl_too_old=yes
else
sdl=yes
@@ -2739,8 +2813,8 @@ for drv in $audio_drv_list; do
;;
pa)
audio_drv_probe $drv pulse/mainloop.h "-lpulse" \
"pa_mainloop *m = 0; pa_mainloop_free (m); return 0;"
audio_drv_probe $drv pulse/pulseaudio.h "-lpulse" \
"pa_context_set_source_output_volume(NULL, 0, NULL, NULL, NULL); return 0;"
libs_softmmu="-lpulse $libs_softmmu"
audio_pt_int="yes"
;;
@@ -2796,7 +2870,7 @@ fi
# curses probe
if test "$curses" != "no" ; then
if test "$mingw32" = "yes" ; then
curses_list="-lpdcurses"
curses_list="$($pkg_config --libs ncurses 2>/dev/null):-lpdcurses"
else
curses_list="$($pkg_config --libs ncurses 2>/dev/null):-lncurses:-lcurses"
fi
@@ -2911,7 +2985,7 @@ int main(void) {
}
EOF
if ! compile_prog "-Werror $CFLAGS" "$LIBS" ; then
if ! compile_prog "$CFLAGS" "$LIBS" ; then
error_exit "sizeof(size_t) doesn't match GLIB_SIZEOF_SIZE_T."\
"You probably need to set PKG_CONFIG_LIBDIR"\
"to point to the right pkg-config files for your"\
@@ -3345,6 +3419,9 @@ if test "$glusterfs" != "no" ; then
glusterfs="yes"
glusterfs_cflags=`$pkg_config --cflags glusterfs-api`
glusterfs_libs=`$pkg_config --libs glusterfs-api`
if $pkg_config --atleast-version=4 glusterfs-api; then
glusterfs_xlator_opt="yes"
fi
if $pkg_config --atleast-version=5 glusterfs-api; then
glusterfs_discard="yes"
fi
@@ -4434,6 +4511,21 @@ if test "$fortify_source" != "no"; then
fi
fi
##########################################
# check if struct fsxattr is available via linux/fs.h
have_fsxattr=no
cat > $TMPC << EOF
#include <linux/fs.h>
struct fsxattr foo;
int main(void) {
return 0;
}
EOF
if compile_prog "" "" ; then
have_fsxattr=yes
fi
##########################################
# End of CC checks
# After here, no more $cc or $ld runs
@@ -4519,7 +4611,7 @@ if test "$softmmu" = yes ; then
tools="$tools fsdev/virtfs-proxy-helper\$(EXESUF)"
else
if test "$virtfs" = yes; then
error_exit "VirtFS is supported only on Linux and requires libcap-devel and libattr-devel"
error_exit "VirtFS is supported only on Linux and requires libcap devel and libattr devel"
fi
virtfs=no
fi
@@ -4644,6 +4736,12 @@ EOF
fi
fi
echo_version() {
if test "$1" = "yes" ; then
echo "($2)"
fi
}
# prepend pixman and ftd flags after all config tests are done
QEMU_CFLAGS="$pixman_cflags $fdt_cflags $QEMU_CFLAGS"
libs_softmmu="$pixman_libs $libs_softmmu"
@@ -4693,19 +4791,18 @@ if test "$darwin" = "yes" ; then
echo "Cocoa support $cocoa"
fi
echo "pixman $pixman"
echo "SDL support $sdl"
echo "GTK support $gtk"
echo "SDL support $sdl `echo_version $sdl $sdlversion`"
echo "GTK support $gtk `echo_version $gtk $gtk_version`"
echo "GTK GL support $gtk_gl"
echo "VTE support $vte `echo_version $vte $vteversion`"
echo "GNUTLS support $gnutls"
echo "GNUTLS hash $gnutls_hash"
echo "GNUTLS rnd $gnutls_rnd"
echo "libgcrypt $gcrypt"
if test "$nettle" = "yes"; then
echo "nettle $nettle ($nettle_version)"
else
echo "nettle $nettle"
fi
echo "libgcrypt kdf $gcrypt_kdf"
echo "nettle $nettle `echo_version $nettle $nettle_version`"
echo "nettle kdf $nettle_kdf"
echo "libtasn1 $tasn1"
echo "VTE support $vte"
echo "curses support $curses"
echo "virgl support $virglrenderer"
echo "curl support $curl"
@@ -4754,11 +4851,7 @@ echo "Trace backends $trace_backends"
if have_backend "simple"; then
echo "Trace output file $trace_file-<pid>"
fi
if test "$spice" = "yes"; then
echo "spice support $spice ($spice_protocol_version/$spice_server_version)"
else
echo "spice support $spice"
fi
echo "spice support $spice `echo_version $spice $spice_protocol_version/$spice_server_version`"
echo "rbd support $rbd"
echo "xfsctl support $xfs"
echo "smartcard support $smartcard"
@@ -4790,6 +4883,7 @@ echo "bzip2 support $bzip2"
echo "NUMA host support $numa"
echo "tcmalloc support $tcmalloc"
echo "jemalloc support $jemalloc"
echo "avx2 optimization $avx2_opt"
if test "$sdl_too_old" = "yes"; then
echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -5075,12 +5169,21 @@ fi
if test "$gnutls_hash" = "yes" ; then
echo "CONFIG_GNUTLS_HASH=y" >> $config_host_mak
fi
if test "$gnutls_rnd" = "yes" ; then
echo "CONFIG_GNUTLS_RND=y" >> $config_host_mak
fi
if test "$gcrypt" = "yes" ; then
echo "CONFIG_GCRYPT=y" >> $config_host_mak
if test "$gcrypt_kdf" = "yes" ; then
echo "CONFIG_GCRYPT_KDF=y" >> $config_host_mak
fi
fi
if test "$nettle" = "yes" ; then
echo "CONFIG_NETTLE=y" >> $config_host_mak
echo "CONFIG_NETTLE_VERSION_MAJOR=${nettle_version%%.*}" >> $config_host_mak
if test "$nettle_kdf" = "yes" ; then
echo "CONFIG_NETTLE_KDF=y" >> $config_host_mak
fi
fi
if test "$tasn1" = "yes" ; then
echo "CONFIG_TASN1=y" >> $config_host_mak
@@ -5088,6 +5191,14 @@ fi
if test "$have_ifaddrs_h" = "yes" ; then
echo "HAVE_IFADDRS_H=y" >> $config_host_mak
fi
# Work around a system header bug with some kernel/XFS header
# versions where they both try to define 'struct fsxattr':
# xfs headers will not try to redefine structs from linux headers
# if this macro is set.
if test "$have_fsxattr" = "yes" ; then
echo "HAVE_FSXATTR=y" >> $config_host_mak
fi
if test "$vte" = "yes" ; then
echo "CONFIG_VTE=y" >> $config_host_mak
echo "VTE_CFLAGS=$vte_cflags" >> $config_host_mak
@@ -5178,6 +5289,10 @@ if test "$opengl" = "yes" ; then
fi
fi
if test "$avx2_opt" = "yes" ; then
echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
fi
if test "$lzo" = "yes" ; then
echo "CONFIG_LZO=y" >> $config_host_mak
fi
@@ -5270,6 +5385,10 @@ if test "$glusterfs" = "yes" ; then
echo "GLUSTERFS_LIBS=$glusterfs_libs" >> $config_host_mak
fi
if test "$glusterfs_xlator_opt" = "yes" ; then
echo "CONFIG_GLUSTERFS_XLATOR_OPT=y" >> $config_host_mak
fi
if test "$glusterfs_discard" = "yes" ; then
echo "CONFIG_GLUSTERFS_DISCARD=y" >> $config_host_mak
fi
@@ -5889,7 +6008,7 @@ cat <<EOD >config.status
EOD
printf "exec" >>config.status
printf " '%s'" "$0" "$@" >>config.status
echo >>config.status
echo ' "$@"' >>config.status
chmod +x config.status
rm -r "$TMPDIR1"

View File

@@ -7,14 +7,12 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qemu/host-utils.h"
#include "qemu/sockets.h"
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/un.h>
#ifdef CONFIG_LINUX
#include <sys/vfs.h>
#endif
#include "ivshmem-server.h"
@@ -257,7 +255,8 @@ ivshmem_server_ftruncate(int fd, unsigned shmsize)
/* Init a new ivshmem server */
int
ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path,
const char *shm_path, size_t shm_size, unsigned n_vectors,
const char *shm_path, bool use_shm_open,
size_t shm_size, unsigned n_vectors,
bool verbose)
{
int ret;
@@ -278,6 +277,7 @@ ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path,
return -1;
}
server->use_shm_open = use_shm_open;
server->shm_size = shm_size;
server->n_vectors = n_vectors;
@@ -286,31 +286,6 @@ ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path,
return 0;
}
#ifdef CONFIG_LINUX
#define HUGETLBFS_MAGIC 0x958458f6
static long gethugepagesize(const char *path)
{
struct statfs fs;
int ret;
do {
ret = statfs(path, &fs);
} while (ret != 0 && errno == EINTR);
if (ret != 0) {
return -1;
}
if (fs.f_type != HUGETLBFS_MAGIC) {
return -1;
}
return fs.f_bsize;
}
#endif
/* open shm, create and bind to the unix socket */
int
ivshmem_server_start(IvshmemServer *server)
@@ -319,27 +294,17 @@ ivshmem_server_start(IvshmemServer *server)
int shm_fd, sock_fd, ret;
/* open shm file */
#ifdef CONFIG_LINUX
long hpagesize;
hpagesize = gethugepagesize(server->shm_path);
if (hpagesize < 0 && errno != ENOENT) {
IVSHMEM_SERVER_DEBUG(server, "cannot stat shm file %s: %s\n",
server->shm_path, strerror(errno));
}
if (hpagesize > 0) {
if (server->use_shm_open) {
IVSHMEM_SERVER_DEBUG(server, "Using POSIX shared memory: %s\n",
server->shm_path);
shm_fd = shm_open(server->shm_path, O_CREAT | O_RDWR, S_IRWXU);
} else {
gchar *filename = g_strdup_printf("%s/ivshmem.XXXXXX", server->shm_path);
IVSHMEM_SERVER_DEBUG(server, "Using hugepages: %s\n", server->shm_path);
IVSHMEM_SERVER_DEBUG(server, "Using file-backed shared memory: %s\n",
server->shm_path);
shm_fd = mkstemp(filename);
unlink(filename);
g_free(filename);
} else
#endif
{
IVSHMEM_SERVER_DEBUG(server, "Using POSIX shared memory: %s\n",
server->shm_path);
shm_fd = shm_open(server->shm_path, O_CREAT|O_RDWR, S_IRWXU);
}
if (shm_fd < 0) {

View File

@@ -15,7 +15,7 @@
* unix socket. For each client, the server will create some eventfd
* (see EVENTFD(2)), one per vector. These fd are transmitted to all
* clients using the SCM_RIGHTS cmsg message. Therefore, each client is
* able to send a notification to another client without beeing
* able to send a notification to another client without being
* "profixied" by the server.
*
* We use this mechanism to send interruptions between guests.
@@ -66,6 +66,7 @@ typedef struct IvshmemServer {
char unix_sock_path[PATH_MAX]; /**< path to unix socket */
int sock_fd; /**< unix sock file descriptor */
char shm_path[PATH_MAX]; /**< path to shm */
bool use_shm_open;
size_t shm_size; /**< size of shm */
int shm_fd; /**< shm file descriptor */
unsigned n_vectors; /**< number of vectors */
@@ -89,7 +90,8 @@ typedef struct IvshmemServer {
*/
int
ivshmem_server_init(IvshmemServer *server, const char *unix_sock_path,
const char *shm_path, size_t shm_size, unsigned n_vectors,
const char *shm_path, bool use_shm_open,
size_t shm_size, unsigned n_vectors,
bool verbose);
/**

View File

@@ -7,7 +7,8 @@
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qapi/error.h"
#include "qemu/cutils.h"
#include "ivshmem-server.h"
@@ -29,35 +30,38 @@ typedef struct IvshmemServerArgs {
const char *pid_file;
const char *unix_socket_path;
const char *shm_path;
bool use_shm_open;
uint64_t shm_size;
unsigned n_vectors;
} IvshmemServerArgs;
/* show ivshmem_server_usage and exit with given error code */
static void
ivshmem_server_usage(const char *name, int code)
ivshmem_server_usage(const char *progname)
{
fprintf(stderr, "%s [opts]\n", name);
fprintf(stderr, " -h: show this help\n");
fprintf(stderr, " -v: verbose mode\n");
fprintf(stderr, " -F: foreground mode (default is to daemonize)\n");
fprintf(stderr, " -p <pid_file>: path to the PID file (used in daemon\n"
" mode only).\n"
" Default=%s\n", IVSHMEM_SERVER_DEFAULT_SHM_PATH);
fprintf(stderr, " -S <unix_socket_path>: path to the unix socket\n"
" to listen to.\n"
" Default=%s\n", IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH);
fprintf(stderr, " -m <shm_path>: path to the shared memory.\n"
" The path corresponds to a POSIX shm name or a\n"
" hugetlbfs mount point.\n"
" default=%s\n", IVSHMEM_SERVER_DEFAULT_SHM_PATH);
fprintf(stderr, " -l <size>: size of shared memory in bytes. The suffix\n"
" K, M and G can be used (ex: 1K means 1024).\n"
" default=%u\n", IVSHMEM_SERVER_DEFAULT_SHM_SIZE);
fprintf(stderr, " -n <n_vects>: number of vectors.\n"
" default=%u\n", IVSHMEM_SERVER_DEFAULT_N_VECTORS);
printf("Usage: %s [OPTION]...\n"
" -h: show this help\n"
" -v: verbose mode\n"
" -F: foreground mode (default is to daemonize)\n"
" -p <pid-file>: path to the PID file (used in daemon mode only)\n"
" default " IVSHMEM_SERVER_DEFAULT_PID_FILE "\n"
" -S <unix-socket-path>: path to the unix socket to listen to\n"
" default " IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "\n"
" -M <shm-name>: POSIX shared memory object to use\n"
" default " IVSHMEM_SERVER_DEFAULT_SHM_PATH "\n"
" -m <dir-name>: where to create shared memory\n"
" -l <size>: size of shared memory in bytes\n"
" suffixes K, M and G can be used, e.g. 1K means 1024\n"
" default %u\n"
" -n <nvectors>: number of vectors\n"
" default %u\n",
progname, IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
IVSHMEM_SERVER_DEFAULT_N_VECTORS);
}
exit(code);
static void
ivshmem_server_help(const char *progname)
{
fprintf(stderr, "Try '%s -h' for more information.\n", progname);
}
/* parse the program arguments, exit on error */
@@ -68,20 +72,12 @@ ivshmem_server_parse_args(IvshmemServerArgs *args, int argc, char *argv[])
unsigned long long v;
Error *err = NULL;
while ((c = getopt(argc, argv,
"h" /* help */
"v" /* verbose */
"F" /* foreground */
"p:" /* pid_file */
"S:" /* unix_socket_path */
"m:" /* shm_path */
"l:" /* shm_size */
"n:" /* n_vectors */
)) != -1) {
while ((c = getopt(argc, argv, "hvFp:S:m:M:l:n:")) != -1) {
switch (c) {
case 'h': /* help */
ivshmem_server_usage(argv[0], 0);
ivshmem_server_usage(argv[0]);
exit(0);
break;
case 'v': /* verbose */
@@ -92,36 +88,41 @@ ivshmem_server_parse_args(IvshmemServerArgs *args, int argc, char *argv[])
args->foreground = 1;
break;
case 'p': /* pid_file */
case 'p': /* pid file */
args->pid_file = optarg;
break;
case 'S': /* unix_socket_path */
case 'S': /* unix socket path */
args->unix_socket_path = optarg;
break;
case 'm': /* shm_path */
case 'M': /* shm name */
case 'm': /* dir name */
args->shm_path = optarg;
args->use_shm_open = c == 'M';
break;
case 'l': /* shm_size */
case 'l': /* shm size */
parse_option_size("shm_size", optarg, &args->shm_size, &err);
if (err) {
error_report_err(err);
ivshmem_server_usage(argv[0], 1);
ivshmem_server_help(argv[0]);
exit(1);
}
break;
case 'n': /* n_vectors */
case 'n': /* number of vectors */
if (parse_uint_full(optarg, &v, 0) < 0) {
fprintf(stderr, "cannot parse n_vectors\n");
ivshmem_server_usage(argv[0], 1);
ivshmem_server_help(argv[0]);
exit(1);
}
args->n_vectors = v;
break;
default:
ivshmem_server_usage(argv[0], 1);
ivshmem_server_usage(argv[0]);
exit(1);
break;
}
}
@@ -129,12 +130,14 @@ ivshmem_server_parse_args(IvshmemServerArgs *args, int argc, char *argv[])
if (args->n_vectors > IVSHMEM_SERVER_MAX_VECTORS) {
fprintf(stderr, "too many requested vectors (max is %d)\n",
IVSHMEM_SERVER_MAX_VECTORS);
ivshmem_server_usage(argv[0], 1);
ivshmem_server_help(argv[0]);
exit(1);
}
if (args->verbose == 1 && args->foreground == 0) {
fprintf(stderr, "cannot use verbose in daemon mode\n");
ivshmem_server_usage(argv[0], 1);
ivshmem_server_help(argv[0]);
exit(1);
}
}
@@ -192,11 +195,18 @@ main(int argc, char *argv[])
.pid_file = IVSHMEM_SERVER_DEFAULT_PID_FILE,
.unix_socket_path = IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH,
.shm_path = IVSHMEM_SERVER_DEFAULT_SHM_PATH,
.use_shm_open = true,
.shm_size = IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
.n_vectors = IVSHMEM_SERVER_DEFAULT_N_VECTORS,
};
int ret = 1;
/*
* Do not remove this notice without adding proper error handling!
* Start with handling ivshmem_server_send_one_msg() failure.
*/
printf("*** Example code, do not use in production ***\n");
/* parse arguments, will exit on error */
ivshmem_server_parse_args(&args, argc, argv);
@@ -219,7 +229,8 @@ main(int argc, char *argv[])
}
/* init the ivshms structure */
if (ivshmem_server_init(&server, args.unix_socket_path, args.shm_path,
if (ivshmem_server_init(&server, args.unix_socket_path,
args.shm_path, args.use_shm_open,
args.shm_size, args.n_vectors, args.verbose) < 0) {
fprintf(stderr, "cannot init server\n");
goto err;

View File

@@ -20,6 +20,7 @@
#include "qemu/osdep.h"
#include "cpu.h"
#include "sysemu/cpus.h"
#include "exec/exec-all.h"
#include "exec/memory-internal.h"
bool exit_request;
@@ -68,7 +69,6 @@ void cpu_reloading_memory_map(void)
void cpu_loop_exit(CPUState *cpu)
{
cpu->current_tb = NULL;
siglongjmp(cpu->jmp_env, 1);
}
@@ -77,6 +77,5 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
if (pc) {
cpu_restore_state(cpu, pc);
}
cpu->current_tb = NULL;
siglongjmp(cpu->jmp_env, 1);
}

View File

@@ -20,6 +20,7 @@
#include "cpu.h"
#include "trace.h"
#include "disas/disas.h"
#include "exec/exec-all.h"
#include "tcg.h"
#include "qemu/atomic.h"
#include "sysemu/qtest.h"
@@ -133,10 +134,17 @@ static void init_delay_params(SyncClocks *sc, const CPUState *cpu)
#endif /* CONFIG USER ONLY */
/* Execute a TB, and fix up the CPU state afterwards if necessary */
static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr)
static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
{
CPUArchState *env = cpu->env_ptr;
uintptr_t next_tb;
uintptr_t ret;
TranslationBlock *last_tb;
int tb_exit;
uint8_t *tb_ptr = itb->tc_ptr;
qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
"Trace %p [" TARGET_FMT_lx "] %s\n",
itb->tc_ptr, itb->pc, lookup_symbol(itb->pc));
#if defined(DEBUG_DISAS)
if (qemu_loglevel_mask(CPU_LOG_TB_CPU)) {
@@ -155,114 +163,125 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, uint8_t *tb_ptr)
#endif /* DEBUG_DISAS */
cpu->can_do_io = !use_icount;
next_tb = tcg_qemu_tb_exec(env, tb_ptr);
ret = tcg_qemu_tb_exec(env, tb_ptr);
cpu->can_do_io = 1;
trace_exec_tb_exit((void *) (next_tb & ~TB_EXIT_MASK),
next_tb & TB_EXIT_MASK);
last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
tb_exit = ret & TB_EXIT_MASK;
trace_exec_tb_exit(last_tb, tb_exit);
if ((next_tb & TB_EXIT_MASK) > TB_EXIT_IDX1) {
if (tb_exit > TB_EXIT_IDX1) {
/* We didn't start executing this TB (eg because the instruction
* counter hit zero); we must restore the guest PC to the address
* of the start of the TB.
*/
CPUClass *cc = CPU_GET_CLASS(cpu);
TranslationBlock *tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK);
qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
"Stopped execution of TB chain before %p ["
TARGET_FMT_lx "] %s\n",
last_tb->tc_ptr, last_tb->pc,
lookup_symbol(last_tb->pc));
if (cc->synchronize_from_tb) {
cc->synchronize_from_tb(cpu, tb);
cc->synchronize_from_tb(cpu, last_tb);
} else {
assert(cc->set_pc);
cc->set_pc(cpu, tb->pc);
cc->set_pc(cpu, last_tb->pc);
}
}
if ((next_tb & TB_EXIT_MASK) == TB_EXIT_REQUESTED) {
if (tb_exit == TB_EXIT_REQUESTED) {
/* We were asked to stop executing TBs (probably a pending
* interrupt. We've now stopped, so clear the flag.
*/
cpu->tcg_exit_req = 0;
}
return next_tb;
return ret;
}
#ifndef CONFIG_USER_ONLY
/* Execute the code without caching the generated code. An interpreter
could be used if available. */
static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
TranslationBlock *orig_tb, bool ignore_icount)
{
TranslationBlock *tb;
bool old_tb_flushed;
/* Should never happen.
We only end up here when an existing TB is too long. */
if (max_cycles > CF_COUNT_MASK)
max_cycles = CF_COUNT_MASK;
old_tb_flushed = cpu->tb_flushed;
cpu->tb_flushed = false;
tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
max_cycles | CF_NOCACHE
| (ignore_icount ? CF_IGNORE_ICOUNT : 0));
tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
cpu->current_tb = tb;
tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
cpu->tb_flushed |= old_tb_flushed;
/* execute the generated code */
trace_exec_tb_nocache(tb, tb->pc);
cpu_tb_exec(cpu, tb->tc_ptr);
cpu->current_tb = NULL;
cpu_tb_exec(cpu, tb);
tb_phys_invalidate(tb, -1);
tb_free(tb);
}
#endif
static TranslationBlock *tb_find_physical(CPUState *cpu,
target_ulong pc,
target_ulong cs_base,
uint64_t flags)
uint32_t flags)
{
CPUArchState *env = (CPUArchState *)cpu->env_ptr;
TranslationBlock *tb, **ptb1;
TranslationBlock *tb, **tb_hash_head, **ptb1;
unsigned int h;
tb_page_addr_t phys_pc, phys_page1;
target_ulong virt_page2;
tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
/* find translated block using physical mappings */
phys_pc = get_page_addr_code(env, pc);
phys_page1 = phys_pc & TARGET_PAGE_MASK;
h = tb_phys_hash_func(phys_pc);
ptb1 = &tcg_ctx.tb_ctx.tb_phys_hash[h];
for(;;) {
tb = *ptb1;
if (!tb) {
return NULL;
}
/* Start at head of the hash entry */
ptb1 = tb_hash_head = &tcg_ctx.tb_ctx.tb_phys_hash[h];
tb = *ptb1;
while (tb) {
if (tb->pc == pc &&
tb->page_addr[0] == phys_page1 &&
tb->cs_base == cs_base &&
tb->flags == flags) {
/* check next page if needed */
if (tb->page_addr[1] != -1) {
tb_page_addr_t phys_page2;
virt_page2 = (pc & TARGET_PAGE_MASK) +
TARGET_PAGE_SIZE;
phys_page2 = get_page_addr_code(env, virt_page2);
if (tb->page_addr[1] == -1) {
/* done, we have a match */
break;
} else {
/* check next page if needed */
target_ulong virt_page2 = (pc & TARGET_PAGE_MASK) +
TARGET_PAGE_SIZE;
tb_page_addr_t phys_page2 = get_page_addr_code(env, virt_page2);
if (tb->page_addr[1] == phys_page2) {
break;
}
} else {
break;
}
}
ptb1 = &tb->phys_hash_next;
tb = *ptb1;
}
/* Move the TB to the head of the list */
*ptb1 = tb->phys_hash_next;
tb->phys_hash_next = tcg_ctx.tb_ctx.tb_phys_hash[h];
tcg_ctx.tb_ctx.tb_phys_hash[h] = tb;
if (tb) {
/* Move the TB to the head of the list */
*ptb1 = tb->phys_hash_next;
tb->phys_hash_next = *tb_hash_head;
*tb_hash_head = tb;
}
return tb;
}
static TranslationBlock *tb_find_slow(CPUState *cpu,
target_ulong pc,
target_ulong cs_base,
uint64_t flags)
uint32_t flags)
{
TranslationBlock *tb;
@@ -300,26 +319,72 @@ found:
return tb;
}
static inline TranslationBlock *tb_find_fast(CPUState *cpu)
static inline TranslationBlock *tb_find_fast(CPUState *cpu,
TranslationBlock **last_tb,
int tb_exit)
{
CPUArchState *env = (CPUArchState *)cpu->env_ptr;
TranslationBlock *tb;
target_ulong cs_base, pc;
int flags;
uint32_t flags;
/* we record a subset of the CPU state. It will
always be the same before a given translated block
is executed. */
cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
tb_lock();
tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
tb = tb_find_slow(cpu, pc, cs_base, flags);
}
if (cpu->tb_flushed) {
/* Ensure that no TB jump will be modified as the
* translation buffer has been flushed.
*/
*last_tb = NULL;
cpu->tb_flushed = false;
}
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
* system emulation. So it's not safe to make a direct jump to a TB
* spanning two pages because the mapping for the second page can change.
*/
if (tb->page_addr[1] != -1) {
*last_tb = NULL;
}
#endif
/* See if we can patch the calling TB. */
if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
tb_add_jump(*last_tb, tb_exit, tb);
}
tb_unlock();
return tb;
}
static void cpu_handle_debug_exception(CPUState *cpu)
static inline bool cpu_handle_halt(CPUState *cpu)
{
if (cpu->halted) {
#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
&& replay_interrupt()) {
X86CPU *x86_cpu = X86_CPU(cpu);
apic_poll_irq(x86_cpu->apic_state);
cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
}
#endif
if (!cpu_has_work(cpu)) {
current_cpu = NULL;
return true;
}
cpu->halted = 0;
}
return false;
}
static inline void cpu_handle_debug_exception(CPUState *cpu)
{
CPUClass *cc = CPU_GET_CLASS(cpu);
CPUWatchpoint *wp;
@@ -333,38 +398,197 @@ static void cpu_handle_debug_exception(CPUState *cpu)
cc->debug_excp_handler(cpu);
}
static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
{
if (cpu->exception_index >= 0) {
if (cpu->exception_index >= EXCP_INTERRUPT) {
/* exit request from the cpu execution loop */
*ret = cpu->exception_index;
if (*ret == EXCP_DEBUG) {
cpu_handle_debug_exception(cpu);
}
cpu->exception_index = -1;
return true;
} else {
#if defined(CONFIG_USER_ONLY)
/* if user mode only, we simulate a fake exception
which will be handled outside the cpu execution
loop */
#if defined(TARGET_I386)
CPUClass *cc = CPU_GET_CLASS(cpu);
cc->do_interrupt(cpu);
#endif
*ret = cpu->exception_index;
cpu->exception_index = -1;
return true;
#else
if (replay_exception()) {
CPUClass *cc = CPU_GET_CLASS(cpu);
cc->do_interrupt(cpu);
cpu->exception_index = -1;
} else if (!replay_has_interrupt()) {
/* give a chance to iothread in replay mode */
*ret = EXCP_INTERRUPT;
return true;
}
#endif
}
#ifndef CONFIG_USER_ONLY
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
*ret = -1;
return true;
#endif
}
return false;
}
static inline void cpu_handle_interrupt(CPUState *cpu,
TranslationBlock **last_tb)
{
CPUClass *cc = CPU_GET_CLASS(cpu);
int interrupt_request = cpu->interrupt_request;
if (unlikely(interrupt_request)) {
if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
/* Mask out external interrupts for this step. */
interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
}
if (interrupt_request & CPU_INTERRUPT_DEBUG) {
cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
cpu->exception_index = EXCP_DEBUG;
cpu_loop_exit(cpu);
}
if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
/* Do nothing */
} else if (interrupt_request & CPU_INTERRUPT_HALT) {
replay_interrupt();
cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
cpu->halted = 1;
cpu->exception_index = EXCP_HLT;
cpu_loop_exit(cpu);
}
#if defined(TARGET_I386)
else if (interrupt_request & CPU_INTERRUPT_INIT) {
X86CPU *x86_cpu = X86_CPU(cpu);
CPUArchState *env = &x86_cpu->env;
replay_interrupt();
cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0);
do_cpu_init(x86_cpu);
cpu->exception_index = EXCP_HALTED;
cpu_loop_exit(cpu);
}
#else
else if (interrupt_request & CPU_INTERRUPT_RESET) {
replay_interrupt();
cpu_reset(cpu);
cpu_loop_exit(cpu);
}
#endif
/* The target hook has 3 exit conditions:
False when the interrupt isn't processed,
True when it is, and we should restart on a new TB,
and via longjmp via cpu_loop_exit. */
else {
replay_interrupt();
if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
*last_tb = NULL;
}
/* The target hook may have updated the 'cpu->interrupt_request';
* reload the 'interrupt_request' value */
interrupt_request = cpu->interrupt_request;
}
if (interrupt_request & CPU_INTERRUPT_EXITTB) {
cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
/* ensure that no TB jump will be modified as
the program flow was changed */
*last_tb = NULL;
}
}
if (unlikely(cpu->exit_request || replay_has_interrupt())) {
cpu->exit_request = 0;
cpu->exception_index = EXCP_INTERRUPT;
cpu_loop_exit(cpu);
}
}
static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
TranslationBlock **last_tb, int *tb_exit,
SyncClocks *sc)
{
uintptr_t ret;
if (unlikely(cpu->exit_request)) {
return;
}
trace_exec_tb(tb, tb->pc);
ret = cpu_tb_exec(cpu, tb);
*last_tb = (TranslationBlock *)(ret & ~TB_EXIT_MASK);
*tb_exit = ret & TB_EXIT_MASK;
switch (*tb_exit) {
case TB_EXIT_REQUESTED:
/* Something asked us to stop executing
* chained TBs; just continue round the main
* loop. Whatever requested the exit will also
* have set something else (eg exit_request or
* interrupt_request) which we will handle
* next time around the loop. But we need to
* ensure the tcg_exit_req read in generated code
* comes before the next read of cpu->exit_request
* or cpu->interrupt_request.
*/
smp_rmb();
*last_tb = NULL;
break;
case TB_EXIT_ICOUNT_EXPIRED:
{
/* Instruction counter expired. */
#ifdef CONFIG_USER_ONLY
abort();
#else
int insns_left = cpu->icount_decr.u32;
if (cpu->icount_extra && insns_left >= 0) {
/* Refill decrementer and continue execution. */
cpu->icount_extra += insns_left;
insns_left = MIN(0xffff, cpu->icount_extra);
cpu->icount_extra -= insns_left;
cpu->icount_decr.u16.low = insns_left;
} else {
if (insns_left > 0) {
/* Execute remaining instructions. */
cpu_exec_nocache(cpu, insns_left, *last_tb, false);
align_clocks(sc, cpu);
}
cpu->exception_index = EXCP_INTERRUPT;
*last_tb = NULL;
cpu_loop_exit(cpu);
}
break;
#endif
}
default:
break;
}
}
/* main execution loop */
int cpu_exec(CPUState *cpu)
{
CPUClass *cc = CPU_GET_CLASS(cpu);
#ifdef TARGET_I386
X86CPU *x86_cpu = X86_CPU(cpu);
CPUArchState *env = &x86_cpu->env;
#endif
int ret, interrupt_request;
TranslationBlock *tb;
uint8_t *tc_ptr;
uintptr_t next_tb;
int ret;
SyncClocks sc;
/* replay_interrupt may need current_cpu */
current_cpu = cpu;
if (cpu->halted) {
#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
&& replay_interrupt()) {
apic_poll_irq(x86_cpu->apic_state);
cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
}
#endif
if (!cpu_has_work(cpu)) {
current_cpu = NULL;
return EXCP_HALTED;
}
cpu->halted = 0;
if (cpu_handle_halt(cpu)) {
return EXCP_HALTED;
}
atomic_mb_set(&tcg_current_cpu, cpu);
@@ -383,190 +607,26 @@ int cpu_exec(CPUState *cpu)
*/
init_delay_params(&sc, cpu);
/* prepare setjmp context for exception handling */
for(;;) {
TranslationBlock *tb, *last_tb;
int tb_exit = 0;
/* prepare setjmp context for exception handling */
if (sigsetjmp(cpu->jmp_env, 0) == 0) {
/* if an exception is pending, we execute it here */
if (cpu->exception_index >= 0) {
if (cpu->exception_index >= EXCP_INTERRUPT) {
/* exit request from the cpu execution loop */
ret = cpu->exception_index;
if (ret == EXCP_DEBUG) {
cpu_handle_debug_exception(cpu);
}
cpu->exception_index = -1;
break;
} else {
#if defined(CONFIG_USER_ONLY)
/* if user mode only, we simulate a fake exception
which will be handled outside the cpu execution
loop */
#if defined(TARGET_I386)
cc->do_interrupt(cpu);
#endif
ret = cpu->exception_index;
cpu->exception_index = -1;
break;
#else
if (replay_exception()) {
cc->do_interrupt(cpu);
cpu->exception_index = -1;
} else if (!replay_has_interrupt()) {
/* give a chance to iothread in replay mode */
ret = EXCP_INTERRUPT;
break;
}
#endif
}
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
cpu_exec_nocache(cpu, 1, tb_find_fast(cpu), true);
ret = -1;
if (cpu_handle_exception(cpu, &ret)) {
break;
}
next_tb = 0; /* force lookup of first TB */
last_tb = NULL; /* forget the last executed TB after exception */
cpu->tb_flushed = false; /* reset before first TB lookup */
for(;;) {
interrupt_request = cpu->interrupt_request;
if (unlikely(interrupt_request)) {
if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
/* Mask out external interrupts for this step. */
interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
}
if (interrupt_request & CPU_INTERRUPT_DEBUG) {
cpu->interrupt_request &= ~CPU_INTERRUPT_DEBUG;
cpu->exception_index = EXCP_DEBUG;
cpu_loop_exit(cpu);
}
if (replay_mode == REPLAY_MODE_PLAY
&& !replay_has_interrupt()) {
/* Do nothing */
} else if (interrupt_request & CPU_INTERRUPT_HALT) {
replay_interrupt();
cpu->interrupt_request &= ~CPU_INTERRUPT_HALT;
cpu->halted = 1;
cpu->exception_index = EXCP_HLT;
cpu_loop_exit(cpu);
}
#if defined(TARGET_I386)
else if (interrupt_request & CPU_INTERRUPT_INIT) {
replay_interrupt();
cpu_svm_check_intercept_param(env, SVM_EXIT_INIT, 0);
do_cpu_init(x86_cpu);
cpu->exception_index = EXCP_HALTED;
cpu_loop_exit(cpu);
}
#else
else if (interrupt_request & CPU_INTERRUPT_RESET) {
replay_interrupt();
cpu_reset(cpu);
cpu_loop_exit(cpu);
}
#endif
/* The target hook has 3 exit conditions:
False when the interrupt isn't processed,
True when it is, and we should restart on a new TB,
and via longjmp via cpu_loop_exit. */
else {
replay_interrupt();
if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
next_tb = 0;
}
}
/* Don't use the cached interrupt_request value,
do_interrupt may have updated the EXITTB flag. */
if (cpu->interrupt_request & CPU_INTERRUPT_EXITTB) {
cpu->interrupt_request &= ~CPU_INTERRUPT_EXITTB;
/* ensure that no TB jump will be modified as
the program flow was changed */
next_tb = 0;
}
}
if (unlikely(cpu->exit_request
|| replay_has_interrupt())) {
cpu->exit_request = 0;
cpu->exception_index = EXCP_INTERRUPT;
cpu_loop_exit(cpu);
}
tb_lock();
tb = tb_find_fast(cpu);
/* Note: we do it here to avoid a gcc bug on Mac OS X when
doing it in tb_find_slow */
if (tcg_ctx.tb_ctx.tb_invalidated_flag) {
/* as some TB could have been invalidated because
of memory exceptions while generating the code, we
must recompute the hash index here */
next_tb = 0;
tcg_ctx.tb_ctx.tb_invalidated_flag = 0;
}
if (qemu_loglevel_mask(CPU_LOG_EXEC)) {
qemu_log("Trace %p [" TARGET_FMT_lx "] %s\n",
tb->tc_ptr, tb->pc, lookup_symbol(tb->pc));
}
/* see if we can patch the calling TB. When the TB
spans two pages, we cannot safely do a direct
jump. */
if (next_tb != 0 && tb->page_addr[1] == -1
&& !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
tb_add_jump((TranslationBlock *)(next_tb & ~TB_EXIT_MASK),
next_tb & TB_EXIT_MASK, tb);
}
tb_unlock();
if (likely(!cpu->exit_request)) {
trace_exec_tb(tb, tb->pc);
tc_ptr = tb->tc_ptr;
/* execute the generated code */
cpu->current_tb = tb;
next_tb = cpu_tb_exec(cpu, tc_ptr);
cpu->current_tb = NULL;
switch (next_tb & TB_EXIT_MASK) {
case TB_EXIT_REQUESTED:
/* Something asked us to stop executing
* chained TBs; just continue round the main
* loop. Whatever requested the exit will also
* have set something else (eg exit_request or
* interrupt_request) which we will handle
* next time around the loop. But we need to
* ensure the tcg_exit_req read in generated code
* comes before the next read of cpu->exit_request
* or cpu->interrupt_request.
*/
smp_rmb();
next_tb = 0;
break;
case TB_EXIT_ICOUNT_EXPIRED:
{
/* Instruction counter expired. */
int insns_left = cpu->icount_decr.u32;
if (cpu->icount_extra && insns_left >= 0) {
/* Refill decrementer and continue execution. */
cpu->icount_extra += insns_left;
insns_left = MIN(0xffff, cpu->icount_extra);
cpu->icount_extra -= insns_left;
cpu->icount_decr.u16.low = insns_left;
} else {
if (insns_left > 0) {
/* Execute remaining instructions. */
tb = (TranslationBlock *)(next_tb & ~TB_EXIT_MASK);
cpu_exec_nocache(cpu, insns_left, tb, false);
align_clocks(&sc, cpu);
}
cpu->exception_index = EXCP_INTERRUPT;
next_tb = 0;
cpu_loop_exit(cpu);
}
break;
}
default:
break;
}
}
cpu_handle_interrupt(cpu, &last_tb);
tb = tb_find_fast(cpu, &last_tb, tb_exit);
cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
/* Try to align the host and virtual clocks
if the guest is in advance */
align_clocks(&sc, cpu);
/* reset soft MMU for next block (it can currently
only be set by a memory fault) */
} /* for(;;) */
} else {
#if defined(__clang__) || !QEMU_GNUC_PREREQ(4, 6)
@@ -576,18 +636,10 @@ int cpu_exec(CPUState *cpu)
* Newer versions of gcc would complain about this code (-Wclobbered). */
cpu = current_cpu;
cc = CPU_GET_CLASS(cpu);
#ifdef TARGET_I386
x86_cpu = X86_CPU(cpu);
env = &x86_cpu->env;
#endif
#else /* buggy compiler */
/* Assert that the compiler does not smash local variables. */
g_assert(cpu == current_cpu);
g_assert(cc == CPU_GET_CLASS(cpu));
#ifdef TARGET_I386
g_assert(x86_cpu == X86_CPU(cpu));
g_assert(env == &x86_cpu->env);
#endif
#endif /* buggy compiler */
cpu->can_do_io = 1;
tb_lock_reset();

157
cpus.c
View File

@@ -24,15 +24,18 @@
/* Needed early for CONFIG_BSD etc. */
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "cpu.h"
#include "monitor/monitor.h"
#include "qapi/qmp/qerror.h"
#include "qemu/error-report.h"
#include "sysemu/sysemu.h"
#include "sysemu/block-backend.h"
#include "exec/gdbstub.h"
#include "sysemu/dma.h"
#include "sysemu/kvm.h"
#include "qmp-commands.h"
#include "exec/exec-all.h"
#include "qemu/thread.h"
#include "sysemu/cpus.h"
@@ -275,7 +278,7 @@ void cpu_disable_ticks(void)
fairly approximate, so ignore small variation.
When the guest is idle real and virtual time will be aligned in
the IO wait loop. */
#define ICOUNT_WOBBLE (get_ticks_per_sec() / 10)
#define ICOUNT_WOBBLE (NANOSECONDS_PER_SECOND / 10)
static void icount_adjust(void)
{
@@ -326,7 +329,7 @@ static void icount_adjust_vm(void *opaque)
{
timer_mod(icount_vm_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() / 10);
NANOSECONDS_PER_SECOND / 10);
icount_adjust();
}
@@ -337,10 +340,18 @@ static int64_t qemu_icount_round(int64_t count)
static void icount_warp_rt(void)
{
unsigned seq;
int64_t warp_start;
/* The icount_warp_timer is rescheduled soon after vm_clock_warp_start
* changes from -1 to another value, so the race here is okay.
*/
if (atomic_read(&vm_clock_warp_start) == -1) {
do {
seq = seqlock_read_begin(&timers_state.vm_clock_seqlock);
warp_start = vm_clock_warp_start;
} while (seqlock_read_retry(&timers_state.vm_clock_seqlock, seq));
if (warp_start == -1) {
return;
}
@@ -370,9 +381,12 @@ static void icount_warp_rt(void)
}
}
static void icount_dummy_timer(void *opaque)
static void icount_timer_cb(void *opaque)
{
(void)opaque;
/* No need for a checkpoint because the timer already synchronizes
* with CHECKPOINT_CLOCK_VIRTUAL_RT.
*/
icount_warp_rt();
}
void qtest_clock_warp(int64_t dest)
@@ -396,17 +410,12 @@ void qtest_clock_warp(int64_t dest)
qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
}
void qemu_clock_warp(QEMUClockType type)
void qemu_start_warp_timer(void)
{
int64_t clock;
int64_t deadline;
/*
* There are too many global variables to make the "warp" behavior
* applicable to other clocks. But a clock argument removes the
* need for if statements all over the place.
*/
if (type != QEMU_CLOCK_VIRTUAL || !use_icount) {
if (!use_icount) {
return;
}
@@ -418,29 +427,17 @@ void qemu_clock_warp(QEMUClockType type)
}
/* warp clock deterministically in record/replay mode */
if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP)) {
if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_START)) {
return;
}
if (icount_sleep) {
/*
* If the CPUs have been sleeping, advance QEMU_CLOCK_VIRTUAL timer now.
* This ensures that the deadline for the timer is computed correctly
* below.
* This also makes sure that the insn counter is synchronized before
* the CPU starts running, in case the CPU is woken by an event other
* than the earliest QEMU_CLOCK_VIRTUAL timer.
*/
icount_warp_rt();
timer_del(icount_warp_timer);
}
if (!all_cpu_threads_idle()) {
return;
}
if (qtest_enabled()) {
/* When testing, qtest commands advance icount. */
return;
return;
}
/* We want to use the earliest deadline from ALL vm_clocks */
@@ -496,6 +493,28 @@ void qemu_clock_warp(QEMUClockType type)
}
}
static void qemu_account_warp_timer(void)
{
if (!use_icount || !icount_sleep) {
return;
}
/* Nothing to do if the VM is stopped: QEMU_CLOCK_VIRTUAL timers
* do not fire, so computing the deadline does not make sense.
*/
if (!runstate_is_running()) {
return;
}
/* warp clock deterministically in record/replay mode */
if (!replay_checkpoint(CHECKPOINT_CLOCK_WARP_ACCOUNT)) {
return;
}
timer_del(icount_warp_timer);
icount_warp_rt();
}
static bool icount_state_needed(void *opaque)
{
return use_icount;
@@ -624,13 +643,13 @@ void configure_icount(QemuOpts *opts, Error **errp)
icount_sleep = qemu_opt_get_bool(opts, "sleep", true);
if (icount_sleep) {
icount_warp_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL_RT,
icount_dummy_timer, NULL);
icount_timer_cb, NULL);
}
icount_align_option = qemu_opt_get_bool(opts, "align", false);
if (icount_align_option && !icount_sleep) {
error_setg(errp, "align=on and sleep=no are incompatible");
error_setg(errp, "align=on and sleep=off are incompatible");
}
if (strcmp(option, "auto") != 0) {
errno = 0;
@@ -643,7 +662,7 @@ void configure_icount(QemuOpts *opts, Error **errp)
} else if (icount_align_option) {
error_setg(errp, "shift=auto and align=on are incompatible");
} else if (!icount_sleep) {
error_setg(errp, "shift=auto and sleep=no are incompatible");
error_setg(errp, "shift=auto and sleep=off are incompatible");
}
use_icount = 2;
@@ -665,7 +684,7 @@ void configure_icount(QemuOpts *opts, Error **errp)
icount_adjust_vm, NULL);
timer_mod(icount_vm_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
get_ticks_per_sec() / 10);
NANOSECONDS_PER_SECOND / 10);
}
/***********************************************************/
@@ -726,7 +745,7 @@ static int do_vm_stop(RunState state)
}
bdrv_drain_all();
ret = bdrv_flush_all();
ret = blk_flush_all();
return ret;
}
@@ -761,7 +780,7 @@ static void sigbus_reraise(void)
raise(SIGBUS);
sigemptyset(&set);
sigaddset(&set, SIGBUS);
sigprocmask(SIG_UNBLOCK, &set, NULL);
pthread_sigmask(SIG_UNBLOCK, &set, NULL);
}
perror("Failed to re-raise SIGBUS!\n");
abort();
@@ -953,6 +972,18 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
qemu_cpu_kick(cpu);
}
static void qemu_kvm_destroy_vcpu(CPUState *cpu)
{
if (kvm_destroy_vcpu(cpu) < 0) {
error_report("kvm_destroy_vcpu failed");
exit(EXIT_FAILURE);
}
}
static void qemu_tcg_destroy_vcpu(CPUState *cpu)
{
}
static void flush_queued_work(CPUState *cpu)
{
struct qemu_work_item *wi;
@@ -995,9 +1026,6 @@ static void qemu_wait_io_event_common(CPUState *cpu)
static void qemu_tcg_wait_io_event(CPUState *cpu)
{
while (all_cpu_threads_idle()) {
/* Start accounting real time to the virtual clock if the CPUs
are idle. */
qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
}
@@ -1045,7 +1073,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
cpu->created = true;
qemu_cond_signal(&qemu_cpu_cond);
while (1) {
do {
if (cpu_can_run(cpu)) {
r = kvm_cpu_exec(cpu);
if (r == EXCP_DEBUG) {
@@ -1053,8 +1081,12 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
}
}
qemu_kvm_wait_io_event(cpu);
}
} while (!cpu->unplug || cpu_can_run(cpu));
qemu_kvm_destroy_vcpu(cpu);
cpu->created = false;
qemu_cond_signal(&qemu_cpu_cond);
qemu_mutex_unlock_iothread();
return NULL;
}
@@ -1108,6 +1140,7 @@ static void tcg_exec_all(void);
static void *qemu_tcg_cpu_thread_fn(void *arg)
{
CPUState *cpu = arg;
CPUState *remove_cpu = NULL;
rcu_register_thread();
@@ -1145,6 +1178,18 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
}
}
qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
CPU_FOREACH(cpu) {
if (cpu->unplug && !cpu_can_run(cpu)) {
remove_cpu = cpu;
break;
}
}
if (remove_cpu) {
qemu_tcg_destroy_vcpu(remove_cpu);
cpu->created = false;
qemu_cond_signal(&qemu_cpu_cond);
remove_cpu = NULL;
}
}
return NULL;
@@ -1301,6 +1346,21 @@ void resume_all_vcpus(void)
}
}
void cpu_remove(CPUState *cpu)
{
cpu->stop = true;
cpu->unplug = true;
qemu_cpu_kick(cpu);
}
void cpu_remove_sync(CPUState *cpu)
{
cpu_remove(cpu);
while (cpu->created) {
qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
}
}
/* For temporary buffers for forming a name */
#define VCPU_THREAD_NAME_SIZE 16
@@ -1428,7 +1488,7 @@ int vm_stop_force_state(RunState state)
bdrv_drain_all();
/* Make sure to return an error if the flush in a previous vm_stop()
* failed. */
return bdrv_flush_all();
return blk_flush_all();
}
}
@@ -1499,7 +1559,7 @@ static void tcg_exec_all(void)
int r;
/* Account partial waits to QEMU_CLOCK_VIRTUAL. */
qemu_clock_warp(QEMU_CLOCK_VIRTUAL);
qemu_account_warp_timer();
if (next_cpu == NULL) {
next_cpu = first_cpu;
@@ -1517,6 +1577,9 @@ static void tcg_exec_all(void)
break;
}
} else if (cpu->stop || cpu->stopped) {
if (cpu->unplug) {
next_cpu = CPU_NEXT(cpu);
}
break;
}
}
@@ -1677,21 +1740,7 @@ exit:
void qmp_inject_nmi(Error **errp)
{
#if defined(TARGET_I386)
CPUState *cs;
CPU_FOREACH(cs) {
X86CPU *cpu = X86_CPU(cs);
if (!cpu->apic_state) {
cpu_interrupt(cs, CPU_INTERRUPT_NMI);
} else {
apic_deliver_nmi(cpu->apic_state);
}
}
#else
nmi_monitor_handle(monitor_get_cpu_index(), errp);
#endif
}
void dump_drift_info(FILE *f, fprintf_function cpu_fprintf)

107
cputlb.c
View File

@@ -28,10 +28,33 @@
#include "exec/memory-internal.h"
#include "exec/ram_addr.h"
#include "exec/exec-all.h"
#include "tcg/tcg.h"
//#define DEBUG_TLB
//#define DEBUG_TLB_CHECK
/* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
/* #define DEBUG_TLB */
/* #define DEBUG_TLB_LOG */
#ifdef DEBUG_TLB
# define DEBUG_TLB_GATE 1
# ifdef DEBUG_TLB_LOG
# define DEBUG_TLB_LOG_GATE 1
# else
# define DEBUG_TLB_LOG_GATE 0
# endif
#else
# define DEBUG_TLB_GATE 0
# define DEBUG_TLB_LOG_GATE 0
#endif
#define tlb_debug(fmt, ...) do { \
if (DEBUG_TLB_LOG_GATE) { \
qemu_log_mask(CPU_LOG_MMU, "%s: " fmt, __func__, \
## __VA_ARGS__); \
} else if (DEBUG_TLB_GATE) { \
fprintf(stderr, "%s: " fmt, __func__, ## __VA_ARGS__); \
} \
} while (0)
/* statistics */
int tlb_flush_count;
@@ -52,12 +75,7 @@ void tlb_flush(CPUState *cpu, int flush_global)
{
CPUArchState *env = cpu->env_ptr;
#if defined(DEBUG_TLB)
printf("tlb_flush:\n");
#endif
/* must reset current TB so that interrupts cannot modify the
links while we are modifying them */
cpu->current_tb = NULL;
tlb_debug("(%d)\n", flush_global);
memset(env->tlb_table, -1, sizeof(env->tlb_table));
memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
@@ -73,12 +91,7 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
{
CPUArchState *env = cpu->env_ptr;
#if defined(DEBUG_TLB)
printf("tlb_flush_by_mmuidx:");
#endif
/* must reset current TB so that interrupts cannot modify the
links while we are modifying them */
cpu->current_tb = NULL;
tlb_debug("start\n");
for (;;) {
int mmu_idx = va_arg(argp, int);
@@ -87,18 +100,12 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
break;
}
#if defined(DEBUG_TLB)
printf(" %d", mmu_idx);
#endif
tlb_debug("%d\n", mmu_idx);
memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
}
#if defined(DEBUG_TLB)
printf("\n");
#endif
memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
}
@@ -128,22 +135,17 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
int i;
int mmu_idx;
#if defined(DEBUG_TLB)
printf("tlb_flush_page: " TARGET_FMT_lx "\n", addr);
#endif
tlb_debug("page :" TARGET_FMT_lx "\n", addr);
/* Check if we need to flush due to large pages. */
if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
#if defined(DEBUG_TLB)
printf("tlb_flush_page: forced full flush ("
TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
env->tlb_flush_addr, env->tlb_flush_mask);
#endif
tlb_debug("forcing full flush ("
TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
env->tlb_flush_addr, env->tlb_flush_mask);
tlb_flush(cpu, 1);
return;
}
/* must reset current TB so that interrupts cannot modify the
links while we are modifying them */
cpu->current_tb = NULL;
addr &= TARGET_PAGE_MASK;
i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
@@ -170,23 +172,18 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
va_start(argp, addr);
#if defined(DEBUG_TLB)
printf("tlb_flush_page_by_mmu_idx: " TARGET_FMT_lx, addr);
#endif
tlb_debug("addr "TARGET_FMT_lx"\n", addr);
/* Check if we need to flush due to large pages. */
if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
#if defined(DEBUG_TLB)
printf(" forced full flush ("
TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
env->tlb_flush_addr, env->tlb_flush_mask);
#endif
tlb_debug("forced full flush ("
TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
env->tlb_flush_addr, env->tlb_flush_mask);
v_tlb_flush_by_mmuidx(cpu, argp);
va_end(argp);
return;
}
/* must reset current TB so that interrupts cannot modify the
links while we are modifying them */
cpu->current_tb = NULL;
addr &= TARGET_PAGE_MASK;
i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
@@ -198,9 +195,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
break;
}
#if defined(DEBUG_TLB)
printf(" %d", mmu_idx);
#endif
tlb_debug("idx %d\n", mmu_idx);
tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
@@ -211,10 +206,6 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
}
va_end(argp);
#if defined(DEBUG_TLB)
printf("\n");
#endif
tb_flush_jmp_cache(cpu, addr);
}
@@ -255,7 +246,8 @@ static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
{
ram_addr_t ram_addr;
if (qemu_ram_addr_from_host(ptr, &ram_addr) == NULL) {
ram_addr = qemu_ram_addr_from_host(ptr);
if (ram_addr == RAM_ADDR_INVALID) {
fprintf(stderr, "Bad ram pointer %p\n", ptr);
abort();
}
@@ -367,12 +359,9 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
section = address_space_translate_for_iotlb(cpu, asidx, paddr, &xlat, &sz);
assert(sz >= TARGET_PAGE_SIZE);
#if defined(DEBUG_TLB)
qemu_log_mask(CPU_LOG_MMU,
"tlb_set_page: vaddr=" TARGET_FMT_lx " paddr=0x" TARGET_FMT_plx
" prot=%x idx=%d\n",
vaddr, paddr, prot, mmu_idx);
#endif
tlb_debug("vaddr=" TARGET_FMT_lx " paddr=0x" TARGET_FMT_plx
" prot=%x idx=%d\n",
vaddr, paddr, prot, mmu_idx);
address = vaddr;
if (!memory_region_is_ram(section->mr) && !memory_region_is_romd(section->mr)) {
@@ -416,8 +405,8 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
/* Write access calls the I/O callback. */
te->addr_write = address | TLB_MMIO;
} else if (memory_region_is_ram(section->mr)
&& cpu_physical_memory_is_clean(section->mr->ram_addr
+ xlat)) {
&& cpu_physical_memory_is_clean(
memory_region_get_ram_addr(section->mr) + xlat)) {
te->addr_write = address | TLB_NOTDIRTY;
} else {
te->addr_write = address;

View File

@@ -8,6 +8,23 @@ crypto-obj-y += tlscredsanon.o
crypto-obj-y += tlscredsx509.o
crypto-obj-y += tlssession.o
crypto-obj-y += secret.o
crypto-obj-$(CONFIG_GCRYPT) += random-gcrypt.o
crypto-obj-$(if $(CONFIG_GCRYPT),n,$(CONFIG_GNUTLS_RND)) += random-gnutls.o
crypto-obj-y += pbkdf.o
crypto-obj-$(CONFIG_NETTLE_KDF) += pbkdf-nettle.o
crypto-obj-$(if $(CONFIG_NETTLE_KDF),n,$(CONFIG_GCRYPT_KDF)) += pbkdf-gcrypt.o
crypto-obj-y += ivgen.o
crypto-obj-y += ivgen-essiv.o
crypto-obj-y += ivgen-plain.o
crypto-obj-y += ivgen-plain64.o
crypto-obj-y += afsplit.o
crypto-obj-y += xts.o
crypto-obj-y += block.o
crypto-obj-y += block-qcow.o
crypto-obj-y += block-luks.o
# Let the userspace emulators avoid linking gnutls/etc
crypto-aes-obj-y = aes.o
stub-obj-y += random-stub.o
stub-obj-y += pbkdf-stub.o

159
crypto/afsplit.c Normal file
View File

@@ -0,0 +1,159 @@
/*
* QEMU Crypto anti forensic information splitter
*
* Copyright (c) 2015-2016 Red Hat, Inc.
*
* Derived from cryptsetup package lib/luks1/af.c
*
* Copyright (C) 2004, Clemens Fruhwirth <clemens@endorphin.org>
* Copyright (C) 2009-2012, Red Hat, Inc. All rights reserved.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
*
*/
#include "qemu/osdep.h"
#include "qemu/bswap.h"
#include "crypto/afsplit.h"
#include "crypto/random.h"
static void qcrypto_afsplit_xor(size_t blocklen,
const uint8_t *in1,
const uint8_t *in2,
uint8_t *out)
{
size_t i;
for (i = 0; i < blocklen; i++) {
out[i] = in1[i] ^ in2[i];
}
}
static int qcrypto_afsplit_hash(QCryptoHashAlgorithm hash,
size_t blocklen,
uint8_t *block,
Error **errp)
{
size_t digestlen = qcrypto_hash_digest_len(hash);
size_t hashcount = blocklen / digestlen;
size_t finallen = blocklen % digestlen;
uint32_t i;
if (finallen) {
hashcount++;
} else {
finallen = digestlen;
}
for (i = 0; i < hashcount; i++) {
uint8_t *out = NULL;
size_t outlen = 0;
uint32_t iv = cpu_to_be32(i);
struct iovec in[] = {
{ .iov_base = &iv,
.iov_len = sizeof(iv) },
{ .iov_base = block + (i * digestlen),
.iov_len = (i == (hashcount - 1)) ? finallen : digestlen },
};
if (qcrypto_hash_bytesv(hash,
in,
G_N_ELEMENTS(in),
&out, &outlen,
errp) < 0) {
return -1;
}
assert(outlen == digestlen);
memcpy(block + (i * digestlen), out,
(i == (hashcount - 1)) ? finallen : digestlen);
g_free(out);
}
return 0;
}
int qcrypto_afsplit_encode(QCryptoHashAlgorithm hash,
size_t blocklen,
uint32_t stripes,
const uint8_t *in,
uint8_t *out,
Error **errp)
{
uint8_t *block = g_new0(uint8_t, blocklen);
size_t i;
int ret = -1;
for (i = 0; i < (stripes - 1); i++) {
if (qcrypto_random_bytes(out + (i * blocklen), blocklen, errp) < 0) {
goto cleanup;
}
qcrypto_afsplit_xor(blocklen,
out + (i * blocklen),
block,
block);
if (qcrypto_afsplit_hash(hash, blocklen, block,
errp) < 0) {
goto cleanup;
}
}
qcrypto_afsplit_xor(blocklen,
in,
block,
out + (i * blocklen));
ret = 0;
cleanup:
g_free(block);
return ret;
}
int qcrypto_afsplit_decode(QCryptoHashAlgorithm hash,
size_t blocklen,
uint32_t stripes,
const uint8_t *in,
uint8_t *out,
Error **errp)
{
uint8_t *block = g_new0(uint8_t, blocklen);
size_t i;
int ret = -1;
for (i = 0; i < (stripes - 1); i++) {
qcrypto_afsplit_xor(blocklen,
in + (i * blocklen),
block,
block);
if (qcrypto_afsplit_hash(hash, blocklen, block,
errp) < 0) {
goto cleanup;
}
}
qcrypto_afsplit_xor(blocklen,
in + (i * blocklen),
block,
out);
ret = 0;
cleanup:
g_free(block);
return ret;
}

1330
crypto/block-luks.c Normal file

File diff suppressed because it is too large Load Diff

28
crypto/block-luks.h Normal file
View File

@@ -0,0 +1,28 @@
/*
* QEMU Crypto block device encryption LUKS format
*
* Copyright (c) 2015-2016 Red Hat, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
*
*/
#ifndef QCRYPTO_BLOCK_LUKS_H__
#define QCRYPTO_BLOCK_LUKS_H__
#include "crypto/blockpriv.h"
extern const QCryptoBlockDriver qcrypto_block_driver_luks;
#endif /* QCRYPTO_BLOCK_LUKS_H__ */

Some files were not shown because too many files have changed in this diff Show More