Compare commits


69 Commits

Author SHA1 Message Date
Bruce Rogers
265aa090c4 roms/Makefile: pass a packaging timestamp to subpackages with date info
Certain rom subpackages built from qemu git submodules call the date
program to include date information in the packaged binaries. This
causes repeated builds of the package to differ, where the only
real difference is that the build timestamp has changed. To promote
reproducible builds and avoid customers being prompted to update
packages needlessly, we'll use the timestamp of the VERSION file as
the packaging timestamp for all packages that build in a timestamp
for whatever reason.

[BR: BSC#1011213]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-21 08:32:41 -07:00
P J P
b7f162a686 net: imx: limit buffer descriptor count
The i.MX Fast Ethernet Controller uses buffer descriptors to manage
data flow to and from the receive and transmit queues. While
transmitting packets, it could continue to read buffer descriptors
if a descriptor has a length of zero and crafted values in bd.flags.
Set an upper limit on the number of buffer descriptors.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2016-7907 BSC#1002549]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 14:46:54 -07:00
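The bounded descriptor walk this commit describes can be sketched as follows. This is a simplified stand-alone illustration, not the actual QEMU code: the `struct bd` layout, the `BD_FLAG_MORE` bit, and the `BD_MAX` value are all assumptions for demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer descriptor: a zero-length entry with a "more"
 * flag set would otherwise let the walk run forever. */
struct bd { uint16_t length; uint16_t flags; };

enum { BD_MAX = 1024 };        /* assumed upper bound, not QEMU's value */
#define BD_FLAG_MORE 0x1

/* Returns the number of descriptors consumed, never more than BD_MAX. */
size_t walk_bds(const struct bd *ring, size_t ring_len)
{
    size_t i;
    for (i = 0; i < ring_len && i < BD_MAX; i++) {
        if (!(ring[i].flags & BD_FLAG_MORE)) {
            return i + 1;      /* last descriptor of the frame */
        }
        /* zero length no longer matters: the bound still holds */
    }
    return i;                  /* hit the limit: stop instead of looping */
}
```

With the cap in place, a guest-crafted ring of zero-length "more" descriptors terminates after at most `BD_MAX` iterations instead of spinning.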
P J P
491b61b48c dma: rc4030: limit interval timer reload value
The JAZZ RC4030 chipset emulator has a periodic timer and an
associated interval reload register. The reload value is used
as a divisor when computing the timer's next tick value. If the
reload value is large, it could lead to a divide-by-zero error.
Limit the interval reload value to avoid it.

Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2016-8667 BSC#1004702]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 14:39:33 -07:00
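The clamp-before-divide pattern from this commit can be sketched as below. This is a minimal illustration, not the rc4030 code: the `RELOAD_MAX` bound and the tick formula are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define RELOAD_MAX 0x7fffffffu  /* assumed limit, not the real rc4030 bound */

/* Clamp the guest-supplied reload value so the division below can
 * never trap on a zero (or effectively-zero) divisor. */
uint32_t next_tick(uint32_t freq_hz, uint32_t reload)
{
    if (reload == 0 || reload > RELOAD_MAX) {
        reload = 1;             /* clamp: never divide by zero */
    }
    return freq_hz / reload;
}
```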
Li Qiang
a3ada2d4ba 9pfs: fix integer overflow issue in xattr read/write
The v9fs_xattr_read() and v9fs_xattr_write() are passed a guest
originated offset: they must ensure this offset does not go beyond
the size of the extended attribute that was set in v9fs_xattrcreate().
Unfortunately, the current code implements these checks with unsafe
calculations on 32- and 64-bit values, which may allow a malicious
guest to cause an OOB access anyway.

Fix this by comparing the offset and the xattr size, which are
both uint64_t, before trying to compute the effective number of bytes
to read or write.

Suggested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-By: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 7e55d65c56)
[BR: CVE-2016-9104 BSC#1007493]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:55:08 -07:00
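The safe ordering this commit describes, compare the two uint64_t values first and only then compute the byte count, can be sketched as follows. The function name and signature are hypothetical, not the actual v9fs helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare offset and size as uint64_t first; only then compute the
 * byte count.  Mixed 32/64-bit arithmetic like (size - offset) before
 * the range check is where this class of overflow lives. */
size_t xattr_io_len(uint64_t xattr_size, uint64_t offset, size_t count)
{
    if (offset >= xattr_size) {
        return 0;                          /* nothing left to read/write */
    }
    uint64_t avail = xattr_size - offset;  /* safe: offset < xattr_size */
    return count < avail ? count : (size_t)avail;
}
```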
Li Qiang
6562305928 virtio-gpu: fix memory leak in virtio_gpu_resource_create_2d
In the virtio gpu resource create dispatch, if the pixman format is
zero the code doesn't free the resource object allocated previously,
thus leading to a host memory leak. This patch avoids this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 57df486e.8379240a.c3620.ff81@mx.google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit cb3a0522b6)
[BR: CVE-2016-7994 BSC#1003613]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:54:49 -07:00
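The error-path discipline behind this fix, every exit after the allocation must release it, can be sketched as below. The `struct resource` and the zero-format check are stand-ins, not the virtio-gpu types.

```c
#include <assert.h>
#include <stdlib.h>

struct resource { int format; };

/* Once the resource is allocated, every failure exit must free it.
 * The bug was an early return that skipped the free when the pixman
 * format came back as zero. */
int create_resource(int requested_format, struct resource **out)
{
    struct resource *res = calloc(1, sizeof(*res));
    if (!res) {
        return -1;
    }
    res->format = requested_format;   /* stand-in for the pixman lookup */
    if (res->format == 0) {
        free(res);                    /* the fix: release before bailing */
        return -1;
    }
    *out = res;
    return 0;
}
```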
Prasad J Pandit
1f01b4d6f3 audio: intel-hda: check stream entry count during transfer
The Intel HDA emulator uses a stream of buffers during DMA data
transfers. Each entry has a buffer length and a buffer pointer
position, which are used to derive the bytes to 'copy'. If this
length and buffer pointer were the same, 'copy' could be set to
zero, leading to an infinite loop. Add a check to avoid it.

Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1476949224-6865-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0c0fc2b5fd)
[BR: CVE-2016-8909 BSC#1006536]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:54:30 -07:00
Prasad J Pandit
854b5adf36 net: rtl8139: limit processing of ring descriptors
The RTL8139 ethernet controller in C+ mode supports multiple
descriptor rings, each with a maximum of 64 descriptors. While
processing the transmit descriptor ring in 'rtl8139_cplus_transmit',
it does not limit the descriptor count and runs forever. Add a
check to avoid it.

Reported-by: Andrew Henderson <hendersa@icculus.org>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit c7c3591669)
[BR: CVE-2016-8910 BSC#1006538]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:54:07 -07:00
Li Qiang
d77a9e7e19 net: vmxnet: initialise local tx descriptor
In the Vmxnet3 device emulator, while processing the transmit (tx)
queue, when it reaches the end of a packet it calls
vmxnet3_complete_packet. There the local 'txcq_descr' object is not
initialised, which could leak host memory bytes to a guest.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit fdda170e50)
[BR: CVE-2016-6836 BSC#994760]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:52:50 -07:00
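The zero-initialise-before-exposing pattern can be sketched as follows. The field layout of `struct txcq_descr` here is invented for illustration; the point is the `= {0}` initialiser, which guarantees no stale host stack bytes reach the guest-visible copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical completion descriptor layout. */
struct txcq_descr { uint32_t txd_idx; uint8_t pad[12]; };

/* Zero-initialise the local before filling in the fields you mean to
 * expose.  An uninitialised local copied out verbatim leaks whatever
 * the host stack happened to hold. */
void complete_packet(struct txcq_descr *out, uint32_t idx)
{
    struct txcq_descr d = {0};   /* the fix: no stale stack bytes */
    d.txd_idx = idx;
    memcpy(out, &d, sizeof(d));  /* stand-in for the DMA write to the guest */
}
```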
Prasad J Pandit
9999bb270b net: rocker: set limit to DMA buffer size
Rocker network switch emulator has test registers to help debug
DMA operations. While testing host DMA access, a buffer address
is written to register 'TEST_DMA_ADDR' and its size is written to
register 'TEST_DMA_SIZE'. When performing TEST_DMA_CTRL_INVERT
test, if DMA buffer size was greater than 'INT_MAX', it leads to
an invalid buffer access. Limit the DMA buffer size to avoid it.

Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 8caed3d564)
[BR: CVE-2016-8668 BSC#1004706]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:52:09 -07:00
Li Qiang
c266d99908 net: eepro100: fix memory leak in device uninit
The exit dispatch of the eepro100 network card device doesn't free
the 's->vmstate' field which was allocated in device realize, thus
leading to a host memory leak. This patch avoids this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 2634ab7fe2)
[BR: CVE-2016-9101 BSC#1007391]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:51:44 -07:00
Prasad J Pandit
ac4e972999 net: pcnet: check rx/tx descriptor ring length
The AMD PC-Net II emulator has a set of control and status (CSR)
registers. Of these, CSR76 and CSR78 hold the receive and transmit
descriptor ring lengths respectively. This ring length could range
from 1 to 65535. Setting the ring length to zero leads to an infinite
loop in pcnet_rdra_addr() or pcnet_transmit(). Add a check to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 34e29ce754)
[BR: CVE-2016-7909 BSC#1002557]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:51:23 -07:00
Prasad J Pandit
5a47222773 char: serial: check divider value against baud base
The 16550A UART device uses an oscillator to generate frequencies
(baud base), which decide the communication speed. This speed can be
changed by dividing it by a divider. If the divider is greater
than the baud base, the speed is set to zero, leading to a
divide-by-zero error. Add a check to avoid it.

Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1476251888-20238-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3592fe0c91)
[BR: CVE-2016-8669 BSC#1004707]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:50:55 -07:00
Li Qiang
1dd9e4b00e 9pfs: fix memory leak in v9fs_write
If an error occurs when marshalling the transfer length to the guest, the
v9fs_write() function doesn't free an IO vector, thus leading to a memory
leak. This patch fixes the issue.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit fdfcc9aeea)
[BR: CVE-2016-9106 BSC#1007495]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:50:26 -07:00
Li Qiang
61eb543d36 9pfs: fix memory leak in v9fs_xattrcreate
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the
situation that this field has been allocated previously. Every time, it
will be allocated directly. This leads to a host memory leak issue if
the client sends another Txattrcreate message with the same fid number
before the fid from the previous time got clunked.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, updated the changelog to indicate how the leak can occur]
Signed-off-by: Greg Kurz <groug@kaod.org>

(cherry picked from commit ff55e94d23)
[BR: CVE-2016-9102 BSC#1007450]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:48:55 -07:00
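The free-before-reassign pattern that plugs this leak can be sketched as below. The `struct fid_xattr` type and `xattr_set()` helper are hypothetical stand-ins for the V9fsFidState field, not the 9pfs code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fid_xattr { char *value; size_t size; };

/* Release the previous buffer before installing a new one.  Repeated
 * Txattrcreate on the same fid otherwise leaks the old allocation. */
void xattr_set(struct fid_xattr *x, const char *val, size_t size)
{
    free(x->value);              /* the fix: drop any earlier allocation */
    x->value = malloc(size);
    if (x->value) {
        memcpy(x->value, val, size);
        x->size = size;
    } else {
        x->size = 0;
    }
}
```

Note that `free(NULL)` is a no-op, so the first call on a zeroed structure is safe without a separate check.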
Li Qiang
9f8a42e3f3 9pfs: fix information leak in xattr read
9pfs uses g_malloc() to allocate the xattr memory space; if the guest
reads this memory before writing to it, this will leak host heap memory
to the guest. This patch avoids this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit eb68760285)
[BR: CVE-2016-9103 BSC#1007454]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:46:30 -07:00
Li Qiang
5f29f9ab1d 9pfs: fix potential host memory leak in v9fs_read
In the 9pfs read dispatch function, two QEMUIOVector objects are not
freed, causing a potential memory leak. This patch avoids this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit e95c9a493a)
[BR: CVE-2016-8577 BSC#1003893]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:44:51 -07:00
Li Qiang
9f7f59799e 9pfs: fix memory leak in v9fs_link
The v9fs_link() function keeps a reference on the source fid object. This
causes a memory leak since the reference never goes down to 0. This patch
fixes the issue.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 4c1586787f)
[BR: CVE-2016-9105 BSC#1007494]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:44:12 -07:00
Li Qiang
2d4128223e 9pfs: allocate space for guest originated empty strings
If a guest sends an empty string parameter to any 9P operation, the current
code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }.

This is unfortunate because it can cause NULL pointer dereference to happen
at various locations in the 9pfs code. And we don't want to check str->data
everywhere we pass it to strcmp() or any other function which expects a
dereferenceable pointer.

This patch enforces the allocation of genuine C empty strings instead, so
callers don't have to bother.

Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if
the returned string is empty. It now uses v9fs_string_size() since
name.data cannot be NULL anymore.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
[groug, rewritten title and changelog,
 fix empty string check in v9fs_xattrwalk()]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ba42ebb863)
[BR: CVE-2016-8578 BSC#1003894]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:43:41 -07:00
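The always-allocate rule this commit enforces can be sketched as follows. The `struct v9str` type and `str_unmarshal()` name are simplified stand-ins for V9fsString and the unmarshal path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct v9str { size_t size; char *data; };

/* Unmarshal a string of size 0 as a real, NUL-terminated empty string
 * rather than a NULL pointer, so callers can hand .data straight to
 * strcmp() and friends without a NULL check. */
void str_unmarshal(struct v9str *s, const char *src, size_t size)
{
    s->size = size;
    s->data = malloc(size + 1);   /* always allocate, even for size 0 */
    if (s->data) {
        memcpy(s->data, src, size);
        s->data[size] = '\0';
    }
}
```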
Gerd Hoffmann
8e5cea1968 xhci: limit the number of link trbs we are willing to process
Needed to avoid running in circles forever in case the guest builds
an endless loop with link trbs.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Tested-by: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1476096382-7981-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 05f43d44e4)
[BR: CVE-2016-8576 BSC#1003878]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:43:16 -07:00
Li Qiang
9d2c9efdb4 usb: ehci: fix memory leak in ehci_process_itd
While processing isochronous transfer descriptors (iTD), if the page
select (PG) field value is out of bounds, the function returns. In this
situation the ehci's sg list is not freed, thus leading to a memory
leak. This patch avoids this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit b16c129daf)
[BR: CVE-2016-7995 BSC#1003612]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:42:43 -07:00
Prasad J Pandit
60f6f3204d net: mcf: limit buffer descriptor count
The ColdFire Fast Ethernet Controller uses buffer descriptors to manage
data flow to and from the receive and transmit queues. While
transmitting packets, it could continue to read buffer descriptors
if a descriptor has a length of zero and crafted values in bd.flags.
Set an upper limit on the number of buffer descriptors.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 070c4b92b8)
[BR: CVE-2016-7908 BSC#1002550]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:42:08 -07:00
Prasad J Pandit
db87d12d0e virtio: add check for descriptor's mapped address
The virtio back end uses a set of buffers to facilitate I/O operations.
If their size is too large, 'cpu_physical_memory_map' could return
a null address. This would result in a null dereference while
un-mapping descriptors. Add a check to avoid it.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
(cherry picked from commit 973e7170dd)
[BR: CVE-2016-7422 BSC#1000346]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:41:44 -07:00
Prasad J Pandit
a6cfc94b9a scsi: pvscsi: limit process IO loop to ring size
The Vmware Paravirtual SCSI emulator, while processing IO requests,
could run into an infinite loop if 'pvscsi_ring_pop_req_descr'
always returned a positive value. Limit the IO loop to the ring size.

Cc: qemu-stable@nongnu.org
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1473845952-30785-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d251157ac1)
[BR: CVE-2016-7421 BSC#999661]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:40:38 -07:00
Li Qiang
9115b36311 scsi: mptsas: use g_new0 to allocate MPTSASRequest object
When processing an IO request in mptsas, it uses g_new to allocate
a 'req' object. If an error occurs before 'req->sreq' is
allocated, it could lead to an OOB write in the mptsas_free_request
function. Use g_new0 to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1473684251-17476-1-git-send-email-ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 670e56d3ed)
[BR: CVE-2016-7423 BSC#1000397]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:37:00 -07:00
Li Qiang
c559aa3037 usb:xhci:fix memory leak in usb_xhci_exit
If the xhci uses msix, it doesn't free the corresponding
memory, thus leading to a memory leak. This patch avoids this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 57d7d2e0.d4301c0a.d13e9.9a55@mx.google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit b53dd4495c)
[BR: CVE-2016-7466 BSC#1000345]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:34:03 -07:00
Prasad J Pandit
c08b11cce7 scsi: pvscsi: limit loop to fetch SG list
In PVSCSI paravirtual SCSI bus, pvscsi_convert_sglist can take a very
long time or go into an infinite loop due to two different bugs:

1) the request descriptor data length is defined to be 64 bit. While
building the SG list from a request descriptor, it gets truncated to
32 bit in routine 'pvscsi_convert_sglist'. This could lead to an
infinite loop for large 'dataLen' values, when data_length is cast to
uint32_t and chunk_size becomes always zero.  Fix this by removing the
incorrect cast.

2) pvscsi_get_next_sg_elem can be called arbitrarily many times if the
element has a zero length.  Get out of the loop early when this happens,
by introducing an upper limit on the number of SG list elements.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1473108643-12983-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 49adc5d3f8)
[BR: CVE-2016-7156 BSC#997859]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:32:55 -07:00
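The truncation bug in point 1 is worth seeing in isolation. The sketch below, with invented helper names, contrasts the buggy cast with the fixed 64-bit computation: for a length of exactly 2^32 the cast yields 0, so the chunk size is 0 and the remaining length never shrinks.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: casting the 64-bit length to uint32_t first can make
 * the per-iteration chunk zero, so the loop never makes progress. */
uint32_t chunk_size_buggy(uint64_t data_len, uint32_t max_chunk)
{
    uint32_t len = (uint32_t)data_len;        /* truncation: 2^32 -> 0 */
    return len < max_chunk ? len : max_chunk;
}

/* Fixed shape: keep the comparison in 64 bits, as the commit does by
 * removing the incorrect cast. */
uint64_t chunk_size_fixed(uint64_t data_len, uint32_t max_chunk)
{
    return data_len < max_chunk ? data_len : max_chunk;
}
```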
Paolo Bonzini
3e3bf236d5 scsi: mptconfig: fix misuse of MPTSAS_CONFIG_PACK
These issues cause respectively a QEMU crash and a leak of 2 bytes of
stack.  They were discovered by VictorV of 360 Marvel Team.

Reported-by: Tom Victor <i-tangtianwen@360.cm>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 65a8e1f641)
[BR: CVE-2016-7157 BSC#997860]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:32:48 -07:00
Prasad J Pandit
eccd42e2e9 scsi: mptconfig: fix an assert expression
When LSI SAS1068 Host Bus emulator builds configuration page
headers, mptsas_config_pack() should assert that the size
fits in a byte.  However, the size is expressed in 32-bit
units, so up to 1020 bytes fit.  The assertion was only
allowing replies up to 252 bytes, so fix it.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1472645167-30765-2-git-send-email-ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cf2bce203a)
[BR: CVE-2016-7157 BSC#997860]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:32:39 -07:00
Prasad J Pandit
fd5aa800d1 vmsvga: correct bitmap and pixmap size checks
When processing the svga command DEFINE_CURSOR in vmsvga_fifo_run,
the computed BITMAP and PIXMAP sizes are checked against the
'cursor.mask[]' and 'cursor.image[]' array sizes in bytes.
Correct these checks to avoid OOB memory access.

Reported-by: Qinghao Tang <luodalongde@gmail.com>
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1473338754-15430-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 167d97a3de)
[BR: CVE-2016-7170 BSC#998516]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2016-11-10 13:32:36 -07:00
e2e103eaa7 linux-user: remove all traces of qemu from /proc/self/cmdline
Instead of post-processing the real contents use the remembered target
argv.  That removes all traces of qemu, including command line options,
and handles QEMU_ARGV0.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-11-07 17:11:36 +01:00
Benjamin Herrenschmidt
803968c258 Fix tlb_vaddr_to_host with CONFIG_USER_ONLY
We use the wrong argument name for the g2h() macro !

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-22 17:02:24 +02:00
7c9a134065 linux-user: properly test for infinite timeout in poll (#8)
After "linux-user: use target_ulong" the poll syscall was no longer
handling infinite timeout.

/home/abuild/rpmbuild/BUILD/qemu-2.7.0-rc5/linux-user/syscall.c:9773:26: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
                 if (arg3 >= 0) {
                          ^~

Signed-off-by: Andreas Schwab <schwab@suse.de>
2016-09-21 15:55:13 +02:00
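The compiler warning quoted above is the whole bug: once the argument became unsigned, `arg3 >= 0` is always true and the "negative timeout means wait forever" convention silently broke. A minimal sketch of the repaired test, with a hypothetical helper name and assuming `target_ulong` is `unsigned long`:

```c
#include <assert.h>

typedef unsigned long target_ulong;   /* as in linux-user after the change */

/* With an unsigned argument, (arg >= 0) is always true.  Cast back to
 * the signed width before testing, so -1 from the guest is still seen
 * as "infinite timeout". */
int timeout_is_infinite(target_ulong arg)
{
    return (long)arg < 0;
}
```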
markkp
0b135a5863 configure: Fix detection of seccomp on s390x
Signed-off-by: Mark Post <mpost@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-21 15:55:13 +02:00
3a45e30cfe qemu-binfmt-conf: use qemu-ARCH-binfmt
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-21 15:55:13 +02:00
Bruce Rogers
5219d096e1 qemu-bridge-helper: reduce security profile
Change from using glib alloc and free routines to those
from libc. Also perform the safety measure of dropping privileges
to the user if configured with no-caps.

[BR: BOO#988279]
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-21 15:55:13 +02:00
Olaf Hering
21e9a3360b xen: SUSE xenlinux unplug for emulated PCI
Implement SUSE specific unplug protocol for emulated PCI devices
in PVonHVM guests
(bsc#953339, bsc#953362, bsc#953518, bsc#984981)

Signed-off-by: Olaf Hering <ohering@suse.de>
2016-09-21 15:55:13 +02:00
Bruce Rogers
d464395f48 xen_disk: Add suse specific flush disable handling and map to QEMU equiv
Add code to read the suse specific suse-diskcache-disable-flush flag out
of xenstore, and set the equivalent flag within QEMU.

Patch taken from Xen's patch queue, Olaf Hering being the original author.
[bsc#879425]

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
aacebb4ff8 dictzip: Fix on big endian systems
The dictzip code in SLE11 received some treatment over time to support
running on big endian hosts. Somewhere in the transition to SLE12 this
support got lost. Add it back in again from the SLE11 code base.

Furthermore while at it, fix up the debug prints to not emit warnings.

[AG: BSC#937572]
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
4aa17b7cf5 AIO: Reduce number of threads for 32bit hosts
On hosts with limited virtual address space (32bit pointers), we can very
easily run out of virtual memory with big thread pools.

Instead, we should limit ourselves to small pools to keep memory footprint
low on those systems.

This patch fixes random VM stalls like

  (process:25114): GLib-ERROR **: gmem.c:103: failed to allocate 1048576 bytes

on 32bit ARM systems for me.

Signed-off-by: Alexander Graf <agraf@suse.de>
2016-09-10 11:25:03 +02:00
Dinar Valeev
4203277655 configure: Enable PIE for ppc and ppc64 hosts
Signed-off-by: Dinar Valeev <dvaleev@suse.com>
[AF: Rebased for v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Bruce Rogers
0e73e519a0 virtfs-proxy-helper: Provide __u64 for broken sys/capability.h
Fixes the build on SLE 11 SP2.

[AF: Extend to ppc64]
2016-09-10 11:25:03 +02:00
Alexander Graf
fd4fc533fb linux-user: lseek: explicitly cast non-set offsets to signed
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.

When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.

Signed-off-by: Alexander Graf <agraf@suse.de>
2016-09-10 11:25:03 +02:00
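The widening rule this commit describes can be sketched as below. This is an illustration with an invented helper name, not the linux-user code: a SEEK_SET offset is absolute and must be zero-extended, while SEEK_CUR/SEEK_END offsets are deltas and must be sign-extended so that a guest's -1 stays -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Widen a 32-bit guest lseek offset to 64 bits according to whence. */
int64_t widen_offset(uint32_t guest_off, int whence)
{
    if (whence == SEEK_SET) {
        return (int64_t)(uint64_t)guest_off;  /* absolute: zero-extend */
    }
    return (int64_t)(int32_t)guest_off;       /* delta: sign-extend */
}
```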
Alexander Graf
a6a54eb0ce Make char muxer more robust wrt small FIFOs
Virtio-Console can only process one character at a time. Using it on S390
gave me strage "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.

While the stdio driver calls a poll function that just processes its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.

To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.

This patch fixes input when using -nographic on s390 for me.

[AF: Rebased for v2.7.0-rc2]
2016-09-10 11:25:03 +02:00
Alexander Graf
b00ff88b97 console: add question-mark escape operator
Some termcaps (found using SLES11SP1) use [? sequences. According to man
console_codes (http://linux.die.net/man/4/console_codes) the question mark
is a nop and should simply be ignored.

This patch does exactly that, rendering screen output readable when
outputting guest serial consoles to the graphical console emulator.

Signed-off-by: Alexander Graf <agraf@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
674ccdfa8c Legacy Patch kvm-qemu-preXX-dictzip3.patch 2016-09-10 11:25:03 +02:00
Alexander Graf
7c81e618f5 block: Add tar container format
Tar is a very widely used format to store data in. Sometimes people even put
virtual machine images in there.

So it makes sense for qemu to be able to read from tar files. I implemented
a reader written from scratch that also knows about the GNU sparse format,
which is what pigz creates.

This version checks for filenames that end on well-known extensions. The logic
could be changed to search for filenames given on the command line, but that
would require changes to more parts of qemu.

The tar reader in conjunction with dictzip gives us the chance to download
tar'ed up virtual machine images (even via http) and instantly make use of
them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: bdrv_file_open got an Error **errp argument, bdrv_delete -> brd_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
     bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
     BlockDriverCompletionFunc -> BlockCompletionFunc,
     qemu_aio_release() -> qemu_aio_unref(),
     drop tar_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
[AF: Drop bdrv_open() drv parameter for 2.5]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Changed bdrv_open() bs parameter and return value for v2.7.0-rc2,
     for bdrv_pread() and bdrv_aio_readv() s/s->hd/s->hd->file/]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
975ac12982 block: Add support for DictZip enabled gzip files
DictZip is an extension to the gzip format that allows random seeks in gzip
compressed files by cutting the file into pieces and storing the piece offsets
in the "extra" header of the gzip format.

Thanks to that extension, we can use gzip compressed files as block backend,
though only in read mode.

This makes a lot of sense when stacked with tar files that can then be shipped
to VM users. If a VM image is inside a tar file that is inside a DictZip
enabled gzip file, the user can run the tar.gz file as is without having to
extract the image first.

Tar patch follows.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: Error **errp added for bdrv_file_open, bdrv_delete -> bdrv_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
     bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
     BlockDriverCompletionFunc -> BlockCompletionFunc,
     qemu_aio_release() -> qemu_aio_unref(),
     drop dictzip_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
[AF: Drop bdrv_open() drv parameter for 2.5]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Drop bdrv_open() bs parameter and change return value for v2.7.0-rc2,
     for bdrv_pread() and bdrv_aio_readv() do s/s->hd/s->hd->file/]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
d84e1f7cb1 linux-user: use target_ulong
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise, sign-extending a 32-bit size parameter into a host 64-bit
variable can turn it into a negative number, breaking lseek for example.

Pass syscall arguments as ulong always.

Signed-off-by: Alexander Graf <agraf@suse.de>
2016-09-10 11:25:03 +02:00
Andreas Färber
5f1f3f0769 vnc: password-file= and incoming-connections=
TBD (from SUSE Studio team)
2016-09-10 11:25:03 +02:00
Andreas Färber
4f30787729 slirp: -nooutgoing
TBD (from SUSE Studio team)
2016-09-10 11:25:03 +02:00
Alexander Graf
dbab3749b2 linux-user: XXX disable fiemap
agraf: fiemap breaks in libarchive. Disable it for now.
2016-09-10 11:25:03 +02:00
Alexander Graf
4d8d32bbd3 linux-user: implement FS_IOC_SETFLAGS ioctl
Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2

  - use TYPE_LONG instead of TYPE_INT
2016-09-10 11:25:03 +02:00
Alexander Graf
d6a5cfe7d3 linux-user: implement FS_IOC_GETFLAGS ioctl
Signed-off-by: Alexander Graf <agraf@suse.de>

---

v1 -> v2:

  - use TYPE_LONG instead of TYPE_INT
2016-09-10 11:25:03 +02:00
Alexander Graf
a5a2c84614 linux-user: Fake /proc/cpuinfo
Fedora 17 for ARM reads /proc/cpuinfo and fails if it doesn't contain
ARM related contents. This patch implements a quick hack to expose real
/proc/cpuinfo data taken from a real world machine.

The real fix would be to generate at least the flags automatically based
on the selected CPU. Please do not submit this patch upstream until this
has happened.

Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v1.6 and v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
0f2a2996a0 linux-user: lock tb flushing too
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto exec.c/translate-all.c split for 1.4]
[AF: Rebased onto tb_alloc() changes for v2.5.0-rc0]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
394f7f1470 linux-user: Run multi-threaded code on a single core
Running multi-threaded code can easily expose some of the fundamental
breakages in QEMU's design. It's just not a well supported scenario.

So if we pin the whole process to a single host CPU, we guarantee that
we will never have concurrent memory access actually happen. We can still
get scheduled away at any time, so it's not a complete guarantee, but apparently
it reduces the odds well enough to get my test cases to pass.

This gets Java 1.7 working for me again on my test box.

Signed-off-by: Alexander Graf <agraf@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
9d58ff5695 linux-user: lock tcg
The tcg code generator is not thread safe. Lock its generation between
different threads.

Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto exec.c/translate-all.c split for 1.4]
[AF: Rebased for v2.1.0-rc0]
[AF: Rebased onto tcg_gen_code_common() drop for v2.5.0-rc0]
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
25dd5db5e0 linux-user: binfmt: support host binaries
When we have a working host binary equivalent for the guest binary we're
trying to run, let's just use that instead as it will be a lot faster.

Signed-off-by: Alexander Graf <agraf@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
73678412d1 linux-user: fix segfault deadlock
When entering the guest we take a lock to ensure that nobody else messes
with our TB chaining while we're doing it. If we get a segfault inside that
code, we manage to work on, but will not unlock the lock.

This patch forces unlocking of that lock in the segv handler. I'm not sure
this is the right approach though. Maybe we should rather make sure we don't
segfault in the code? I would greatly appreciate someone more intelligible
than me to look at this :).

Example code to trigger this is at: http://csgraf.de/tmp/conftest.c

Reported-by: Fabio Erculiani <lxnay@sabayon.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Drop spinlock_safe_unlock() and switch to tb_lock_reset() (bonzini)]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
9f443d183c PPC: KVM: Disable mmu notifier check
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers, which cannot be implemented properly on 970.

So disable the MMU notifier check for PowerPC guests, making KVM
guests work there, even if possibly racy in some odd circumstances.
2016-09-10 11:25:03 +02:00
Alexander Graf
5a101ff0b5 linux-user: add binfmt wrapper for argv[0] handling
When using qemu's linux-user binaries through binfmt, argv[0] gets lost
along the way because qemu is only passed the full file name of the
executable, while argv[0] can be something completely different.

This breaks in some subtle situations, such as the grep and make test
suites.

This patch adds a wrapper binary called qemu-$TARGET-binfmt that can be
used with binfmt's P flag which passes the full path _and_ argv[0] to
the binfmt handler.

Ideally the wrapper would be a single versatile binary that exists only
once in the system, deriving the qemu binary path names from its own
argv[0]. However, that did not fit the make system too well, so we
currently create a new binary for each target architecture.

CC: Reinhard Max <max@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto new Makefile infrastructure, twice]
[AF: Updated for aarch64 for v2.0.0-rc1]
[AF: Rebased onto Makefile changes for v2.1.0-rc0]
[AF: Rebased onto script rewrite for v2.7.0-rc2 - to be fixed]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
c0baf4a943 qemu-cvs-ioctl_nodirection
The direction given in the ioctl should be correct, so we can assume the
communication is uni-directional. The ALSA developers did not like this
concept though, and declared ioctls IOC_R and IOC_W even though they
were IOC_RW.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
382d3ca372 qemu-cvs-ioctl_debug
Extends unsupported ioctl debug output.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2016-09-10 11:25:03 +02:00
Ulrich Hecht
4259605f8b qemu-cvs-gettimeofday
No clue what this is for.
2016-09-10 11:25:03 +02:00
Alexander Graf
b62c901c47 qemu-cvs-alsa_mmap
Hack to prevent ALSA from using mmap() interface to simplify emulation.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
219067ccab qemu-cvs-alsa_ioctl
Implements ALSA ioctls on PPC hosts.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
3861f88d6d qemu-cvs-alsa_bitfield
Implements TYPE_INTBITFIELD partially. (required for ALSA support)

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2016-09-10 11:25:03 +02:00
Andreas Färber
92a7da2889 qemu-binfmt-conf: Modify default path
Change QEMU_PATH from /usr/local/bin to /usr/bin prefix.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2016-09-10 11:25:03 +02:00
Alexander Graf
69e1d0ef9e XXX dont dump core on sigabort 2016-09-10 11:25:03 +02:00
2683 changed files with 93811 additions and 215903 deletions


@@ -1,8 +0,0 @@
# GDB may have ./.gdbinit loading disabled by default. In that case you can
# follow the instructions it prints. They boil down to adding the following to
# your home directory's ~/.gdbinit file:
#
# add-auto-load-safe-path /path/to/qemu/.gdbinit
# Load QEMU-specific sub-commands and settings
source scripts/qemu-gdb.py

.gitignore

@@ -6,12 +6,18 @@
/config.status /config.status
/config-temp /config-temp
/trace-events-all /trace-events-all
/trace/generated-tracers.h
/trace/generated-tracers.c
/trace/generated-tracers-dtrace.h
/trace/generated-tracers.dtrace
/trace/generated-events.h /trace/generated-events.h
/trace/generated-events.c /trace/generated-events.c
/trace/generated-helpers-wrappers.h /trace/generated-helpers-wrappers.h
/trace/generated-helpers.h /trace/generated-helpers.h
/trace/generated-helpers.c /trace/generated-helpers.c
/trace/generated-tcg-tracers.h /trace/generated-tcg-tracers.h
/trace/generated-ust-provider.h
/trace/generated-ust.c
/ui/shader/texture-blit-frag.h /ui/shader/texture-blit-frag.h
/ui/shader/texture-blit-vert.h /ui/shader/texture-blit-vert.h
*-timestamp *-timestamp
@@ -33,8 +39,9 @@
/qmp-introspect.[ch] /qmp-introspect.[ch]
/qmp-marshal.c /qmp-marshal.c
/qemu-doc.html /qemu-doc.html
/qemu-tech.html
/qemu-doc.info /qemu-doc.info
/qemu-doc.txt /qemu-tech.info
/qemu-img /qemu-img
/qemu-nbd /qemu-nbd
/qemu-options.def /qemu-options.def
@@ -46,16 +53,14 @@
/qemu-bridge-helper /qemu-bridge-helper
/qemu-monitor.texi /qemu-monitor.texi
/qemu-monitor-info.texi /qemu-monitor-info.texi
/qemu-version.h /qmp-commands.txt
/qemu-version.h.tmp
/module_block.h
/vscclient /vscclient
/vhost-user-scsi
/fsdev/virtfs-proxy-helper /fsdev/virtfs-proxy-helper
*.[1-9] *.[1-9]
*.a *.a
*.aux *.aux
*.cp *.cp
*.dvi
*.exe *.exe
*.msi *.msi
*.dll *.dll
@@ -77,6 +82,10 @@
*.d *.d
!/scripts/qemu-guest-agent/fsfreeze-hook.d !/scripts/qemu-guest-agent/fsfreeze-hook.d
*.o *.o
*.lo
*.la
*.pc
.libs
.sdk .sdk
*.gcda *.gcda
*.gcno *.gcno
@@ -100,35 +109,9 @@
/pc-bios/optionrom/kvmvapic.img /pc-bios/optionrom/kvmvapic.img
/pc-bios/s390-ccw/s390-ccw.elf /pc-bios/s390-ccw/s390-ccw.elf
/pc-bios/s390-ccw/s390-ccw.img /pc-bios/s390-ccw/s390-ccw.img
/docs/interop/qemu-ga-qapi.texi
/docs/interop/qemu-ga-ref.html
/docs/interop/qemu-ga-ref.info*
/docs/interop/qemu-ga-ref.txt
/docs/interop/qemu-qmp-qapi.texi
/docs/interop/qemu-qmp-ref.html
/docs/interop/qemu-qmp-ref.info*
/docs/interop/qemu-qmp-ref.txt
/docs/version.texi
*.tps
.stgit-* .stgit-*
cscope.* cscope.*
tags tags
TAGS TAGS
docker-src.* docker-src.*
*~ *~
trace.h
trace.c
trace-ust.h
trace-ust.h
trace-dtrace.h
trace-dtrace.dtrace
trace-root.h
trace-root.c
trace-ust-root.h
trace-ust-root.h
trace-ust-all.h
trace-ust-all.c
trace-dtrace-root.h
trace-dtrace-root.dtrace
trace-ust-all.h
trace-ust-all.c

.gitmodules

@@ -31,9 +31,3 @@
[submodule "roms/u-boot"] [submodule "roms/u-boot"]
path = roms/u-boot path = roms/u-boot
url = git://git.qemu-project.org/u-boot.git url = git://git.qemu-project.org/u-boot.git
[submodule "roms/skiboot"]
path = roms/skiboot
url = git://git.qemu.org/skiboot.git
[submodule "roms/QemuMacDrivers"]
path = roms/QemuMacDrivers
url = git://git.qemu.org/QemuMacDrivers.git


@@ -1,21 +0,0 @@
language: c
env:
matrix:
- IMAGE=debian-armhf-cross
TARGET_LIST=arm-softmmu,arm-linux-user
- IMAGE=debian-arm64-cross
TARGET_LIST=aarch64-softmmu,aarch64-linux-user
- IMAGE=debian-s390x-cross
TARGET_LIST=s390x-softmmu,s390x-linux-user
build:
pre_ci:
- make docker-image-${IMAGE}
pre_ci_boot:
image_name: qemu
image_tag: ${IMAGE}
pull: false
options: "-e HOME=/root"
ci:
- unset CC
- ./configure ${QEMU_CONFIGURE_OPTS} --target-list=${TARGET_LIST}
- make -j2


@@ -4,11 +4,11 @@ python:
- "2.4" - "2.4"
compiler: compiler:
- gcc - gcc
- clang
cache: ccache cache: ccache
addons: addons:
apt: apt:
packages: packages:
# Build dependencies
- libaio-dev - libaio-dev
- libattr1-dev - libattr1-dev
- libbrlapi-dev - libbrlapi-dev
@@ -67,9 +67,6 @@ script:
- make -j3 && ${TEST_CMD} - make -j3 && ${TEST_CMD}
matrix: matrix:
include: include:
# Test with CLang for compile portability
- env: CONFIG=""
compiler: clang
# gprof/gcov are GCC features # gprof/gcov are GCC features
- env: CONFIG="--enable-gprof --enable-gcov --disable-pie" - env: CONFIG="--enable-gprof --enable-gcov --disable-pie"
compiler: gcc compiler: gcc
@@ -86,11 +83,13 @@ matrix:
- env: CONFIG="--enable-trace-backends=ust" - env: CONFIG="--enable-trace-backends=ust"
TEST_CMD="" TEST_CMD=""
compiler: gcc compiler: gcc
- env: CONFIG="--with-coroutine=gthread"
TEST_CMD=""
compiler: gcc
- env: CONFIG="" - env: CONFIG=""
os: osx os: osx
compiler: clang compiler: clang
# Plain Trusty System Build - env: CONFIG=""
- env: CONFIG="--disable-linux-user"
sudo: required sudo: required
addons: addons:
dist: trusty dist: trusty
@@ -100,95 +99,3 @@ matrix:
- sudo apt-get build-dep -qq qemu - sudo apt-get build-dep -qq qemu
- wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ - wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ
- git submodule update --init --recursive - git submodule update --init --recursive
# Plain Trusty Linux User Build
- env: CONFIG="--disable-system"
sudo: required
addons:
dist: trusty
compiler: gcc
before_install:
- sudo apt-get update -qq
- sudo apt-get build-dep -qq qemu
- wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ
- git submodule update --init --recursive
# Trusty System build with latest stable clang
- sudo: required
addons:
dist: trusty
language: generic
compiler: none
env:
- COMPILER_NAME=clang CXX=clang++-3.9 CC=clang-3.9
- CONFIG="--disable-linux-user --cc=clang-3.9 --cxx=clang++-3.9"
before_install:
- wget -nv -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
- sudo apt-add-repository -y 'deb http://llvm.org/apt/trusty llvm-toolchain-trusty-3.9 main'
- sudo apt-get update -qq
- sudo apt-get install -qq -y clang-3.9
- sudo apt-get build-dep -qq qemu
- wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ
- git submodule update --init --recursive
before_script:
- ./configure ${CONFIG} || cat config.log
# Trusty Linux User build with latest stable clang
- sudo: required
addons:
dist: trusty
language: generic
compiler: none
env:
- COMPILER_NAME=clang CXX=clang++-3.9 CC=clang-3.9
- CONFIG="--disable-system --cc=clang-3.9 --cxx=clang++-3.9"
before_install:
- wget -nv -O - http://llvm.org/apt/llvm-snapshot.gpg.key | sudo apt-key add -
- sudo apt-add-repository -y 'deb http://llvm.org/apt/trusty llvm-toolchain-trusty-3.9 main'
- sudo apt-get update -qq
- sudo apt-get install -qq -y clang-3.9
- sudo apt-get build-dep -qq qemu
- wget -O - http://people.linaro.org/~alex.bennee/qemu-submodule-git-seed.tar.xz | tar -xvJ
- git submodule update --init --recursive
before_script:
- ./configure ${CONFIG} || cat config.log
# Using newer GCC with sanitizers
- addons:
apt:
sources:
# PPAs for newer toolchains
- ubuntu-toolchain-r-test
packages:
# Extra toolchains
- gcc-5
- g++-5
# Build dependencies
- libaio-dev
- libattr1-dev
- libbrlapi-dev
- libcap-ng-dev
- libgnutls-dev
- libgtk-3-dev
- libiscsi-dev
- liblttng-ust-dev
- libnfs-dev
- libncurses5-dev
- libnss3-dev
- libpixman-1-dev
- libpng12-dev
- librados-dev
- libsdl1.2-dev
- libseccomp-dev
- libspice-protocol-dev
- libspice-server-dev
- libssh2-1-dev
- liburcu-dev
- libusb-1.0-0-dev
- libvte-2.90-dev
- sparse
- uuid-dev
language: generic
compiler: none
env:
- COMPILER_NAME=gcc CXX=g++-5 CC=gcc-5
- CONFIG="--cc=gcc-5 --cxx=g++-5 --disable-pie --disable-linux-user"
- TEST_CMD=""
before_script:
- ./configure ${CONFIG} --extra-cflags="-g3 -O0 -fsanitize=thread -fuse-ld=gold" || cat config.log


@@ -9,7 +9,7 @@ patches before submitting.
Of course, the most important aspect in any coding style is whitespace. Of course, the most important aspect in any coding style is whitespace.
Crusty old coders who have trouble spotting the glasses on their noses Crusty old coders who have trouble spotting the glasses on their noses
can tell the difference between a tab and eight spaces from a distance can tell the difference between a tab and eight spaces from a distance
of approximately fifteen parsecs. Many a flamewar has been fought and of approximately fifteen parsecs. Many a flamewar have been fought and
lost on this issue. lost on this issue.
QEMU indents are four spaces. Tabs are never used, except in Makefiles QEMU indents are four spaces. Tabs are never used, except in Makefiles
@@ -116,10 +116,3 @@ if (a == 1) {
Rationale: Yoda conditions (as in 'if (1 == a)') are awkward to read. Rationale: Yoda conditions (as in 'if (1 == a)') are awkward to read.
Besides, good compilers already warn users when '==' is mis-typed as '=', Besides, good compilers already warn users when '==' is mis-typed as '=',
even when the constant is on the right. even when the constant is on the right.
7. Comment style
We use traditional C-style /* */ comments and avoid // comments.
Rationale: The // form is valid in C99, so this is purely a matter of
consistency of style. The checkpatch script will warn you about this.

HACKING

@@ -1,28 +1,10 @@
1. Preprocessor 1. Preprocessor
1.1. Variadic macros
For variadic macros, stick with this C99-like syntax: For variadic macros, stick with this C99-like syntax:
#define DPRINTF(fmt, ...) \ #define DPRINTF(fmt, ...) \
do { printf("IRQ: " fmt, ## __VA_ARGS__); } while (0) do { printf("IRQ: " fmt, ## __VA_ARGS__); } while (0)
1.2. Include directives
Order include directives as follows:
#include "qemu/osdep.h" /* Always first... */
#include <...> /* then system headers... */
#include "..." /* and finally QEMU headers. */
The "qemu/osdep.h" header contains preprocessor macros that affect the behavior
of core system headers like <stdint.h>. It must be the first include so that
core system headers included by external libraries get the preprocessor macros
that QEMU depends on.
Do not include "qemu/osdep.h" from header files since the .c file will have
already included it.
2. C types 2. C types
It should be common sense to use the right type, but we have collected It should be common sense to use the right type, but we have collected

File diff suppressed because it is too large

Makefile

@@ -26,7 +26,6 @@ endif
CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y) CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y)
CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y) CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y)
CONFIG_XEN := $(CONFIG_XEN_BACKEND)
CONFIG_ALL=y CONFIG_ALL=y
-include config-all-devices.mak -include config-all-devices.mak
-include config-all-disas.mak -include config-all-disas.mak
@@ -51,153 +50,39 @@ endif
include $(SRC_PATH)/rules.mak include $(SRC_PATH)/rules.mak
GENERATED_FILES = qemu-version.h config-host.h qemu-options.def GENERATED_HEADERS = qemu-version.h config-host.h qemu-options.def
GENERATED_FILES += qmp-commands.h qapi-types.h qapi-visit.h qapi-event.h GENERATED_HEADERS += qmp-commands.h qapi-types.h qapi-visit.h qapi-event.h
GENERATED_FILES += qmp-marshal.c qapi-types.c qapi-visit.c qapi-event.c GENERATED_SOURCES += qmp-marshal.c qapi-types.c qapi-visit.c qapi-event.c
GENERATED_FILES += qmp-introspect.h GENERATED_HEADERS += qmp-introspect.h
GENERATED_FILES += qmp-introspect.c GENERATED_SOURCES += qmp-introspect.c
GENERATED_FILES += trace/generated-tcg-tracers.h GENERATED_HEADERS += trace/generated-events.h
GENERATED_SOURCES += trace/generated-events.c
GENERATED_FILES += trace/generated-helpers-wrappers.h GENERATED_HEADERS += trace/generated-tracers.h
GENERATED_FILES += trace/generated-helpers.h ifeq ($(findstring dtrace,$(TRACE_BACKENDS)),dtrace)
GENERATED_FILES += trace/generated-helpers.c GENERATED_HEADERS += trace/generated-tracers-dtrace.h
ifdef CONFIG_TRACE_UST
GENERATED_FILES += trace-ust-all.h
GENERATED_FILES += trace-ust-all.c
endif endif
GENERATED_SOURCES += trace/generated-tracers.c
GENERATED_FILES += module_block.h GENERATED_HEADERS += trace/generated-tcg-tracers.h
TRACE_HEADERS = trace-root.h $(trace-events-subdirs:%=%/trace.h) GENERATED_HEADERS += trace/generated-helpers-wrappers.h
TRACE_SOURCES = trace-root.c $(trace-events-subdirs:%=%/trace.c) GENERATED_HEADERS += trace/generated-helpers.h
TRACE_DTRACE = GENERATED_SOURCES += trace/generated-helpers.c
ifdef CONFIG_TRACE_DTRACE
TRACE_HEADERS += trace-dtrace-root.h $(trace-events-subdirs:%=%/trace-dtrace.h) ifeq ($(findstring ust,$(TRACE_BACKENDS)),ust)
TRACE_DTRACE += trace-dtrace-root.dtrace $(trace-events-subdirs:%=%/trace-dtrace.dtrace) GENERATED_HEADERS += trace/generated-ust-provider.h
GENERATED_SOURCES += trace/generated-ust.c
endif endif
ifdef CONFIG_TRACE_UST
TRACE_HEADERS += trace-ust-root.h $(trace-events-subdirs:%=%/trace-ust.h)
endif
GENERATED_FILES += $(TRACE_HEADERS)
GENERATED_FILES += $(TRACE_SOURCES)
GENERATED_FILES += $(BUILD_DIR)/trace-events-all
trace-group-name = $(shell dirname $1 | sed -e 's/[^a-zA-Z0-9]/_/g')
tracetool-y = $(SRC_PATH)/scripts/tracetool.py
tracetool-y += $(shell find $(SRC_PATH)/scripts/tracetool -name "*.py")
%/trace.h: %/trace.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
%/trace.h-timestamp: $(SRC_PATH)/%/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=$(call trace-group-name,$@) \
--format=h \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
%/trace.c: %/trace.c-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
%/trace.c-timestamp: $(SRC_PATH)/%/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=$(call trace-group-name,$@) \
--format=c \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
%/trace-ust.h: %/trace-ust.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
%/trace-ust.h-timestamp: $(SRC_PATH)/%/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=$(call trace-group-name,$@) \
--format=ust-events-h \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
%/trace-dtrace.dtrace: %/trace-dtrace.dtrace-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
%/trace-dtrace.dtrace-timestamp: $(SRC_PATH)/%/trace-events $(BUILD_DIR)/config-host.mak $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=$(call trace-group-name,$@) \
--format=d \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
%/trace-dtrace.h: %/trace-dtrace.dtrace $(tracetool-y)
$(call quiet-command,dtrace -o $@ -h -s $<, "GEN","$@")
%/trace-dtrace.o: %/trace-dtrace.dtrace $(tracetool-y)
trace-root.h: trace-root.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
trace-root.h-timestamp: $(SRC_PATH)/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=root \
--format=h \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
trace-root.c: trace-root.c-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
trace-root.c-timestamp: $(SRC_PATH)/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=root \
--format=c \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
trace-ust-root.h: trace-ust-root.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
trace-ust-root.h-timestamp: $(SRC_PATH)/trace-events $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=root \
--format=ust-events-h \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
trace-ust-all.h: trace-ust-all.h-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
trace-ust-all.h-timestamp: $(trace-events-files) $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=all \
--format=ust-events-h \
--backends=$(TRACE_BACKENDS) \
$(trace-events-files) > $@,"GEN","$(@:%-timestamp=%)")
trace-ust-all.c: trace-ust-all.c-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
trace-ust-all.c-timestamp: $(trace-events-files) $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=all \
--format=ust-events-c \
--backends=$(TRACE_BACKENDS) \
$(trace-events-files) > $@,"GEN","$(@:%-timestamp=%)")
trace-dtrace-root.dtrace: trace-dtrace-root.dtrace-timestamp
@cmp $< $@ >/dev/null 2>&1 || cp $< $@
trace-dtrace-root.dtrace-timestamp: $(SRC_PATH)/trace-events $(BUILD_DIR)/config-host.mak $(tracetool-y)
$(call quiet-command,$(TRACETOOL) \
--group=root \
--format=d \
--backends=$(TRACE_BACKENDS) \
$< > $@,"GEN","$(@:%-timestamp=%)")
trace-dtrace-root.h: trace-dtrace-root.dtrace
$(call quiet-command,dtrace -o $@ -h -s $<, "GEN","$@")
trace-dtrace-root.o: trace-dtrace-root.dtrace
# Don't try to regenerate Makefile or configure # Don't try to regenerate Makefile or configure
# We don't generate any of them # We don't generate any of them
Makefile: ; Makefile: ;
configure: ; configure: ;
.PHONY: all clean cscope distclean html info install install-doc \ .PHONY: all clean cscope distclean dvi html info install install-doc \
pdf txt recurse-all speed test dist msi FORCE pdf recurse-all speed test dist msi FORCE
$(call set-vpath, $(SRC_PATH)) $(call set-vpath, $(SRC_PATH))
@@ -206,9 +91,8 @@ LIBS+=-lz $(LIBS_TOOLS)
HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF) HELPERS-$(CONFIG_LINUX) = qemu-bridge-helper$(EXESUF)
ifdef BUILD_DOCS ifdef BUILD_DOCS
DOCS=qemu-doc.html qemu-doc.txt qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8 DOCS=qemu-doc.html qemu-tech.html qemu.1 qemu-img.1 qemu-nbd.8 qemu-ga.8
DOCS+=docs/interop/qemu-qmp-ref.html docs/interop/qemu-qmp-ref.txt docs/interop/qemu-qmp-ref.7 DOCS+=qmp-commands.txt
DOCS+=docs/interop/qemu-ga-ref.html docs/interop/qemu-ga-ref.txt docs/interop/qemu-ga-ref.7
ifdef CONFIG_VIRTFS ifdef CONFIG_VIRTFS
DOCS+=fsdev/virtfs-proxy-helper.1 DOCS+=fsdev/virtfs-proxy-helper.1
endif endif
@@ -222,20 +106,20 @@ SUBDIR_DEVICES_MAK_DEP=$(patsubst %, %-config-devices.mak.d, $(TARGET_DIRS))
ifeq ($(SUBDIR_DEVICES_MAK),) ifeq ($(SUBDIR_DEVICES_MAK),)
config-all-devices.mak: config-all-devices.mak:
$(call quiet-command,echo '# no devices' > $@,"GEN","$@") $(call quiet-command,echo '# no devices' > $@," GEN $@")
else else
config-all-devices.mak: $(SUBDIR_DEVICES_MAK) config-all-devices.mak: $(SUBDIR_DEVICES_MAK)
$(call quiet-command, sed -n \ $(call quiet-command, sed -n \
's|^\([^=]*\)=\(.*\)$$|\1:=$$(findstring y,$$(\1)\2)|p' \ 's|^\([^=]*\)=\(.*\)$$|\1:=$$(findstring y,$$(\1)\2)|p' \
$(SUBDIR_DEVICES_MAK) | sort -u > $@, \ $(SUBDIR_DEVICES_MAK) | sort -u > $@, \
"GEN","$@") " GEN $@")
endif endif
-include $(SUBDIR_DEVICES_MAK_DEP) -include $(SUBDIR_DEVICES_MAK_DEP)
%/config-devices.mak: default-configs/%.mak $(SRC_PATH)/scripts/make_device_config.sh %/config-devices.mak: default-configs/%.mak $(SRC_PATH)/scripts/make_device_config.sh
$(call quiet-command, \ $(call quiet-command, \
$(SHELL) $(SRC_PATH)/scripts/make_device_config.sh $< $*-config-devices.mak.d $@ > $@.tmp,"GEN","$@.tmp") $(SHELL) $(SRC_PATH)/scripts/make_device_config.sh $< $*-config-devices.mak.d $@ > $@.tmp, " GEN $@.tmp")
$(call quiet-command, if test -f $@; then \ $(call quiet-command, if test -f $@; then \
if cmp -s $@.old $@; then \ if cmp -s $@.old $@; then \
mv $@.tmp $@; \ mv $@.tmp $@; \
@@ -252,7 +136,7 @@ endif
else \ else \
mv $@.tmp $@; \ mv $@.tmp $@; \
cp -p $@ $@.old; \ cp -p $@ $@.old; \
fi,"GEN","$@"); fi, " GEN $@");
defconfig: defconfig:
rm -f config-all-devices.mak $(SUBDIR_DEVICES_MAK) rm -f config-all-devices.mak $(SUBDIR_DEVICES_MAK)
@@ -263,13 +147,10 @@ endif
dummy := $(call unnest-vars,, \ dummy := $(call unnest-vars,, \
stub-obj-y \ stub-obj-y \
chardev-obj-y \
util-obj-y \ util-obj-y \
qga-obj-y \ qga-obj-y \
ivshmem-client-obj-y \ ivshmem-client-obj-y \
ivshmem-server-obj-y \ ivshmem-server-obj-y \
libvhost-user-obj-y \
vhost-user-scsi-obj-y \
qga-vss-dll-obj-y \ qga-vss-dll-obj-y \
block-obj-y \ block-obj-y \
block-obj-m \ block-obj-m \
@@ -278,8 +159,7 @@ dummy := $(call unnest-vars,, \
qom-obj-y \ qom-obj-y \
io-obj-y \ io-obj-y \
common-obj-y \ common-obj-y \
common-obj-m \ common-obj-m)
trace-obj-y)
ifneq ($(wildcard config-host.mak),) ifneq ($(wildcard config-host.mak),)
include $(SRC_PATH)/tests/Makefile.include include $(SRC_PATH)/tests/Makefile.include
@@ -305,16 +185,12 @@ qemu-version.h: FORCE
printf '""\n'; \ printf '""\n'; \
fi; \ fi; \
fi) > $@.tmp) fi) > $@.tmp)
$(call quiet-command, if ! cmp -s $@ $@.tmp; then \ $(call quiet-command, cmp -s $@ $@.tmp || mv $@.tmp $@)
mv $@.tmp $@; \
else \
rm $@.tmp; \
fi)
config-host.h: config-host.h-timestamp config-host.h: config-host.h-timestamp
config-host.h-timestamp: config-host.mak config-host.h-timestamp: config-host.mak
qemu-options.def: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool qemu-options.def: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$@") $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@," GEN $@")
SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_DIRS)) SUBDIR_RULES=$(patsubst %,subdir-%, $(TARGET_DIRS))
SOFTMMU_SUBDIR_RULES=$(filter %-softmmu,$(SUBDIR_RULES)) SOFTMMU_SUBDIR_RULES=$(filter %-softmmu,$(SUBDIR_RULES))
@@ -346,8 +222,7 @@ subdir-dtc:dtc/libfdt dtc/tests
dtc/%: dtc/%:
mkdir -p $@ mkdir -p $@
$(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(chardev-obj-y) \ $(SUBDIR_RULES): libqemuutil.a libqemustub.a $(common-obj-y) $(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY))
$(qom-obj-y) $(crypto-aes-obj-$(CONFIG_USER_ONLY))
ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS)) ROMSUBDIR_RULES=$(patsubst %,romsubdir-%, $(ROMS))
# Only keep -O and -g cflags # Only keep -O and -g cflags
@@ -358,34 +233,37 @@ ALL_SUBDIRS=$(TARGET_DIRS) $(patsubst %,pc-bios/%, $(ROMS))
recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR_RULES) recurse-all: $(SUBDIR_RULES) $(ROMSUBDIR_RULES)
$(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc config-host.h $(BUILD_DIR)/version.o: $(SRC_PATH)/version.rc config-host.h | $(BUILD_DIR)/version.lo
$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<,"RC","version.o") $(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<," RC version.o")
$(BUILD_DIR)/version.lo: $(SRC_PATH)/version.rc config-host.h
$(call quiet-command,$(WINDRES) -I$(BUILD_DIR) -o $@ $<," RC version.lo")
Makefile: $(version-obj-y) Makefile: $(version-obj-y) $(version-lobj-y)
###################################################################### ######################################################################
# Build libraries # Build libraries
libqemustub.a: $(stub-obj-y) libqemustub.a: $(stub-obj-y)
libqemuutil.a: $(util-obj-y) $(trace-obj-y) libqemuutil.a: $(util-obj-y)
block-modules = $(foreach o,$(block-obj-m),"$(basename $(subst /,-,$o))",) NULL
util/module.o-cflags = -D'CONFIG_BLOCK_MODULES=$(block-modules)'
###################################################################### ######################################################################
COMMON_LDADDS = libqemuutil.a libqemustub.a
qemu-img.o: qemu-img-cmds.h qemu-img.o: qemu-img-cmds.h
qemu-img$(EXESUF): qemu-img.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS) qemu-img$(EXESUF): qemu-img.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) libqemuutil.a libqemustub.a
qemu-nbd$(EXESUF): qemu-nbd.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS) qemu-nbd$(EXESUF): qemu-nbd.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) libqemuutil.a libqemustub.a
qemu-io$(EXESUF): qemu-io.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS) qemu-io$(EXESUF): qemu-io.o $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) libqemuutil.a libqemustub.a
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS) qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o libqemuutil.a libqemustub.a
fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o fsdev/9p-marshal.o fsdev/9p-iov-marshal.o $(COMMON_LDADDS) fsdev/virtfs-proxy-helper$(EXESUF): fsdev/virtfs-proxy-helper.o fsdev/9p-marshal.o fsdev/9p-iov-marshal.o libqemuutil.a libqemustub.a
fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap fsdev/virtfs-proxy-helper$(EXESUF): LIBS += -lcap
qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$@") $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@," GEN $@")
qemu-ga$(EXESUF): LIBS = $(LIBS_QGA) qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated qemu-ga$(EXESUF): QEMU_CFLAGS += -I qga/qapi-generated
@@ -398,17 +276,17 @@ qga/qapi-generated/qga-qapi-types.c qga/qapi-generated/qga-qapi-types.h :\
$(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py) $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \ $(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
"GEN","$@") " GEN $@")
qga/qapi-generated/qga-qapi-visit.c qga/qapi-generated/qga-qapi-visit.h :\ qga/qapi-generated/qga-qapi-visit.c qga/qapi-generated/qga-qapi-visit.h :\
$(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py) $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \
$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \ $(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
"GEN","$@") " GEN $@")
qga/qapi-generated/qga-qmp-commands.h qga/qapi-generated/qga-qmp-marshal.c :\ qga/qapi-generated/qga-qmp-commands.h qga/qapi-generated/qga-qmp-marshal.c :\
$(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py) $(SRC_PATH)/qga/qapi-schema.json $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \
$(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \ $(gen-out-type) -o qga/qapi-generated -p "qga-" $<, \
"GEN","$@") " GEN $@")
qapi-modules = $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/qapi/common.json \ qapi-modules = $(SRC_PATH)/qapi-schema.json $(SRC_PATH)/qapi/common.json \
$(SRC_PATH)/qapi/block.json $(SRC_PATH)/qapi/block-core.json \ $(SRC_PATH)/qapi/block.json $(SRC_PATH)/qapi/block-core.json \
@@ -420,32 +298,32 @@ qapi-types.c qapi-types.h :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-types.py $(qapi-py) $(qapi-modules) $(SRC_PATH)/scripts/qapi-types.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-types.py \
$(gen-out-type) -o "." -b $<, \ $(gen-out-type) -o "." -b $<, \
"GEN","$@") " GEN $@")
qapi-visit.c qapi-visit.h :\ qapi-visit.c qapi-visit.h :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py) $(qapi-modules) $(SRC_PATH)/scripts/qapi-visit.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-visit.py \
$(gen-out-type) -o "." -b $<, \ $(gen-out-type) -o "." -b $<, \
"GEN","$@") " GEN $@")
qapi-event.c qapi-event.h :\ qapi-event.c qapi-event.h :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-event.py $(qapi-py) $(qapi-modules) $(SRC_PATH)/scripts/qapi-event.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-event.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-event.py \
$(gen-out-type) -o "." $<, \ $(gen-out-type) -o "." $<, \
"GEN","$@") " GEN $@")
qmp-commands.h qmp-marshal.c :\ qmp-commands.h qmp-marshal.c :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py) $(qapi-modules) $(SRC_PATH)/scripts/qapi-commands.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-commands.py \
$(gen-out-type) -o "." $<, \ $(gen-out-type) -o "." -m $<, \
"GEN","$@") " GEN $@")
qmp-introspect.h qmp-introspect.c :\ qmp-introspect.h qmp-introspect.c :\
$(qapi-modules) $(SRC_PATH)/scripts/qapi-introspect.py $(qapi-py) $(qapi-modules) $(SRC_PATH)/scripts/qapi-introspect.py $(qapi-py)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-introspect.py \ $(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi-introspect.py \
$(gen-out-type) -o "." $<, \ $(gen-out-type) -o "." $<, \
"GEN","$@") " GEN $@")
QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h) QGALIB_GEN=$(addprefix qga/qapi-generated/, qga-qapi-types.h qga-qapi-visit.h qga-qmp-commands.h)
$(qga-obj-y) qemu-ga.o: $(QGALIB_GEN) $(qga-obj-y) qemu-ga.o: $(QGALIB_GEN)
qemu-ga$(EXESUF): $(qga-obj-y) $(COMMON_LDADDS) qemu-ga$(EXESUF): $(qga-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^) $(call LINK, $^)
ifdef QEMU_GA_MSI_ENABLED ifdef QEMU_GA_MSI_ENABLED
@@ -459,7 +337,7 @@ $(QEMU_GA_MSI): config-host.mak
$(QEMU_GA_MSI): $(SRC_PATH)/qga/installer/qemu-ga.wxs $(QEMU_GA_MSI): $(SRC_PATH)/qga/installer/qemu-ga.wxs
$(call quiet-command,QEMU_GA_VERSION="$(QEMU_GA_VERSION)" QEMU_GA_MANUFACTURER="$(QEMU_GA_MANUFACTURER)" QEMU_GA_DISTRO="$(QEMU_GA_DISTRO)" BUILD_DIR="$(BUILD_DIR)" \ $(call quiet-command,QEMU_GA_VERSION="$(QEMU_GA_VERSION)" QEMU_GA_MANUFACTURER="$(QEMU_GA_MANUFACTURER)" QEMU_GA_DISTRO="$(QEMU_GA_DISTRO)" BUILD_DIR="$(BUILD_DIR)" \
wixl -o $@ $(QEMU_GA_MSI_ARCH) $(QEMU_GA_MSI_WITH_VSS) $(QEMU_GA_MSI_MINGW_DLL_PATH) $<,"WIXL","$@") wixl -o $@ $(QEMU_GA_MSI_ARCH) $(QEMU_GA_MSI_WITH_VSS) $(QEMU_GA_MSI_MINGW_DLL_PATH) $<, " WIXL $@")
else else
msi: msi:
@echo "MSI build not configured or dependency resolution failed (reconfigure with --enable-guest-agent-msi option)" @echo "MSI build not configured or dependency resolution failed (reconfigure with --enable-guest-agent-msi option)"
@@ -470,32 +348,27 @@ ifneq ($(EXESUF),)
qemu-ga: qemu-ga$(EXESUF) $(QGA_VSS_PROVIDER) $(QEMU_GA_MSI) qemu-ga: qemu-ga$(EXESUF) $(QGA_VSS_PROVIDER) $(QEMU_GA_MSI)
endif endif
ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) $(COMMON_LDADDS) ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^) $(call LINK, $^)
ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS) ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) libqemuutil.a libqemustub.a
$(call LINK, $^) $(call LINK, $^)
vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y)
$(call LINK, $^)
module_block.h: $(SRC_PATH)/scripts/modules/module_block.py config-host.mak
$(call quiet-command,$(PYTHON) $< $@ \
$(addprefix $(SRC_PATH)/,$(patsubst %.mo,%.c,$(block-obj-m))), \
"GEN","$@")
clean: clean:
# avoid old build problems by removing potentially incorrect old files # avoid old build problems by removing potentially incorrect old files
rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h rm -f config.mak op-i386.h opc-i386.h gen-op-i386.h op-arm.h opc-arm.h gen-op-arm.h
rm -f qemu-options.def rm -f qemu-options.def
rm -f *.msi rm -f *.msi
find . \( -name '*.so' -o -name '*.dll' -o -name '*.mo' -o -name '*.[oda]' \) -type f -exec rm {} + find . \( -name '*.l[oa]' -o -name '*.so' -o -name '*.dll' -o -name '*.mo' -o -name '*.[oda]' \) -type f -exec rm {} +
rm -f $(filter-out %.tlb,$(TOOLS)) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~ rm -f $(filter-out %.tlb,$(TOOLS)) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~
rm -f fsdev/*.pod rm -f fsdev/*.pod
rm -rf .libs */.libs
rm -f qemu-img-cmds.h rm -f qemu-img-cmds.h
rm -f ui/shader/*-vert.h ui/shader/*-frag.h rm -f ui/shader/*-vert.h ui/shader/*-frag.h
@# May not be present in GENERATED_FILES @# May not be present in GENERATED_HEADERS
rm -f trace/generated-tracers-dtrace.dtrace* rm -f trace/generated-tracers-dtrace.dtrace*
rm -f trace/generated-tracers-dtrace.h* rm -f trace/generated-tracers-dtrace.h*
rm -f $(foreach f,$(GENERATED_FILES),$(f) $(f)-timestamp) rm -f $(foreach f,$(GENERATED_HEADERS),$(f) $(f)-timestamp)
rm -f $(foreach f,$(GENERATED_SOURCES),$(f) $(f)-timestamp)
rm -rf qapi-generated rm -rf qapi-generated
rm -rf qga/qapi-generated rm -rf qga/qapi-generated
for d in $(ALL_SUBDIRS); do \ for d in $(ALL_SUBDIRS); do \
@@ -516,18 +389,13 @@ distclean: clean
rm -f config-all-devices.mak config-all-disas.mak config.status rm -f config-all-devices.mak config-all-disas.mak config.status
rm -f po/*.mo tests/qemu-iotests/common.env rm -f po/*.mo tests/qemu-iotests/common.env
rm -f roms/seabios/config.mak roms/vgabios/config.mak rm -f roms/seabios/config.mak roms/vgabios/config.mak
rm -f qemu-doc.info qemu-doc.aux qemu-doc.cp qemu-doc.cps rm -f qemu-doc.info qemu-doc.aux qemu-doc.cp qemu-doc.cps qemu-doc.dvi
rm -f qemu-doc.fn qemu-doc.fns qemu-doc.info qemu-doc.ky qemu-doc.kys rm -f qemu-doc.fn qemu-doc.fns qemu-doc.info qemu-doc.ky qemu-doc.kys
rm -f qemu-doc.log qemu-doc.pdf qemu-doc.pg qemu-doc.toc qemu-doc.tp rm -f qemu-doc.log qemu-doc.pdf qemu-doc.pg qemu-doc.toc qemu-doc.tp
rm -f qemu-doc.vr qemu-doc.txt rm -f qemu-doc.vr
rm -f config.log rm -f config.log
rm -f linux-headers/asm rm -f linux-headers/asm
rm -f docs/version.texi rm -f qemu-tech.info qemu-tech.aux qemu-tech.cp qemu-tech.dvi qemu-tech.fn qemu-tech.info qemu-tech.ky qemu-tech.log qemu-tech.pdf qemu-tech.pg qemu-tech.toc qemu-tech.tp qemu-tech.vr
rm -f docs/interop/qemu-ga-qapi.texi docs/interop/qemu-qmp-qapi.texi
rm -f docs/interop/qemu-qmp-ref.7 docs/interop/qemu-ga-ref.7
rm -f docs/interop/qemu-qmp-ref.txt docs/interop/qemu-ga-ref.txt
rm -f docs/interop/qemu-qmp-ref.pdf docs/interop/qemu-ga-ref.pdf
rm -f docs/interop/qemu-qmp-ref.html docs/interop/qemu-ga-ref.html
for d in $(TARGET_DIRS); do \ for d in $(TARGET_DIRS); do \
rm -rf $$d || exit 1 ; \ rm -rf $$d || exit 1 ; \
done done
@@ -554,25 +422,20 @@ qemu-icon.bmp qemu_logo_no_text.svg \
bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \ bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin \ multiboot.bin linuxboot.bin linuxboot_dma.bin kvmvapic.bin \
s390-ccw.img \ s390-ccw.img \
spapr-rtas.bin slof.bin skiboot.lid \ spapr-rtas.bin slof.bin \
palcode-clipper \ palcode-clipper \
u-boot.e500 \ u-boot.e500
qemu_vga.ndrv
else else
BLOBS= BLOBS=
endif endif
install-doc: $(DOCS) install-doc: $(DOCS)
$(INSTALL_DIR) "$(DESTDIR)$(qemu_docdir)" $(INSTALL_DIR) "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) qemu-doc.html "$(DESTDIR)$(qemu_docdir)" $(INSTALL_DATA) qemu-doc.html qemu-tech.html "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) qemu-doc.txt "$(DESTDIR)$(qemu_docdir)" $(INSTALL_DATA) qmp-commands.txt "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) docs/interop/qemu-qmp-ref.html "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) docs/interop/qemu-qmp-ref.txt "$(DESTDIR)$(qemu_docdir)"
ifdef CONFIG_POSIX ifdef CONFIG_POSIX
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1" $(INSTALL_DIR) "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DATA) qemu.1 "$(DESTDIR)$(mandir)/man1" $(INSTALL_DATA) qemu.1 "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man7"
$(INSTALL_DATA) docs/interop/qemu-qmp-ref.7 "$(DESTDIR)$(mandir)/man7"
ifneq ($(TOOLS),) ifneq ($(TOOLS),)
$(INSTALL_DATA) qemu-img.1 "$(DESTDIR)$(mandir)/man1" $(INSTALL_DATA) qemu-img.1 "$(DESTDIR)$(mandir)/man1"
$(INSTALL_DIR) "$(DESTDIR)$(mandir)/man8" $(INSTALL_DIR) "$(DESTDIR)$(mandir)/man8"
@@ -580,9 +443,6 @@ ifneq ($(TOOLS),)
endif endif
ifneq (,$(findstring qemu-ga,$(TOOLS))) ifneq (,$(findstring qemu-ga,$(TOOLS)))
$(INSTALL_DATA) qemu-ga.8 "$(DESTDIR)$(mandir)/man8" $(INSTALL_DATA) qemu-ga.8 "$(DESTDIR)$(mandir)/man8"
$(INSTALL_DATA) docs/interop/qemu-ga-ref.html "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) docs/interop/qemu-ga-ref.txt "$(DESTDIR)$(qemu_docdir)"
$(INSTALL_DATA) docs/interop/qemu-ga-ref.7 "$(DESTDIR)$(mandir)/man7"
endif endif
endif endif
ifdef CONFIG_VIRTFS ifdef CONFIG_VIRTFS
@@ -601,7 +461,8 @@ endif
endif endif
install: all $(if $(BUILD_DOCS),install-doc) install-datadir install-localstatedir install: all $(if $(BUILD_DOCS),install-doc) \
install-datadir install-localstatedir
ifneq ($(TOOLS),) ifneq ($(TOOLS),)
$(call install-prog,$(subst qemu-ga,qemu-ga$(EXESUF),$(TOOLS)),$(DESTDIR)$(bindir)) $(call install-prog,$(subst qemu-ga,qemu-ga$(EXESUF),$(TOOLS)),$(DESTDIR)$(bindir))
endif endif
@@ -657,89 +518,90 @@ ui/shader/%-vert.h: $(SRC_PATH)/ui/shader/%.vert $(SRC_PATH)/scripts/shaderinclu
@mkdir -p $(dir $@) @mkdir -p $(dir $@)
$(call quiet-command,\ $(call quiet-command,\
perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\ perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\
"VERT","$@") " VERT $@")
ui/shader/%-frag.h: $(SRC_PATH)/ui/shader/%.frag $(SRC_PATH)/scripts/shaderinclude.pl ui/shader/%-frag.h: $(SRC_PATH)/ui/shader/%.frag $(SRC_PATH)/scripts/shaderinclude.pl
@mkdir -p $(dir $@) @mkdir -p $(dir $@)
$(call quiet-command,\ $(call quiet-command,\
perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\ perl $(SRC_PATH)/scripts/shaderinclude.pl $< > $@,\
"FRAG","$@") " FRAG $@")
ui/console-gl.o: $(SRC_PATH)/ui/console-gl.c \ ui/console-gl.o: $(SRC_PATH)/ui/console-gl.c \
ui/shader/texture-blit-vert.h ui/shader/texture-blit-frag.h ui/shader/texture-blit-vert.h ui/shader/texture-blit-frag.h
# documentation # documentation
MAKEINFO=makeinfo MAKEINFO=makeinfo
MAKEINFOINCLUDES= -I docs -I $(<D) -I $(@D) MAKEINFOFLAGS=--no-headers --no-split --number-sections
MAKEINFOFLAGS=--no-split --number-sections $(MAKEINFOINCLUDES) TEXIFLAG=$(if $(V),,--quiet)
TEXI2PODFLAGS=$(MAKEINFOINCLUDES) "-DVERSION=$(VERSION)" %.dvi: %.texi
TEXI2PDFFLAGS=$(if $(V),,--quiet) -I $(SRC_PATH) $(MAKEINFOINCLUDES) $(call quiet-command,texi2dvi $(TEXIFLAG) -I . $<," GEN $@")
docs/version.texi: $(SRC_PATH)/VERSION %.html: %.texi
$(call quiet-command,echo "@set VERSION $(VERSION)" > $@,"GEN","$@") $(call quiet-command,LC_ALL=C $(MAKEINFO) $(MAKEINFOFLAGS) --html $< -o $@, \
" GEN $@")
%.html: %.texi docs/version.texi %.info: %.texi
$(call quiet-command,LC_ALL=C $(MAKEINFO) $(MAKEINFOFLAGS) --no-headers \ $(call quiet-command,$(MAKEINFO) $< -o $@," GEN $@")
--html $< -o $@,"GEN","$@")
%.info: %.texi docs/version.texi %.pdf: %.texi
$(call quiet-command,$(MAKEINFO) $(MAKEINFOFLAGS) $< -o $@,"GEN","$@") $(call quiet-command,texi2pdf $(TEXIFLAG) -I . $<," GEN $@")
%.txt: %.texi docs/version.texi
$(call quiet-command,LC_ALL=C $(MAKEINFO) $(MAKEINFOFLAGS) --no-headers \
--plaintext $< -o $@,"GEN","$@")
%.pdf: %.texi docs/version.texi
$(call quiet-command,texi2pdf $(TEXI2PDFFLAGS) $< -o $@,"GEN","$@")
qemu-options.texi: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool qemu-options.texi: $(SRC_PATH)/qemu-options.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@") $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@," GEN $@")
qemu-monitor.texi: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool qemu-monitor.texi: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@") $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@," GEN $@")
qemu-monitor-info.texi: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool qemu-monitor-info.texi: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@") $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@," GEN $@")
qmp-commands.txt: $(SRC_PATH)/qmp-commands.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -q < $< > $@," GEN $@")
qemu-img-cmds.texi: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool qemu-img-cmds.texi: $(SRC_PATH)/qemu-img-cmds.hx $(SRC_PATH)/scripts/hxtool
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"GEN","$@") $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@," GEN $@")
docs/interop/qemu-qmp-qapi.texi docs/interop/qemu-ga-qapi.texi: $(SRC_PATH)/scripts/qapi2texi.py $(qapi-py)
docs/interop/qemu-qmp-qapi.texi: $(qapi-modules)
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi2texi.py $< > $@,"GEN","$@")
docs/interop/qemu-ga-qapi.texi: $(SRC_PATH)/qga/qapi-schema.json
$(call quiet-command,$(PYTHON) $(SRC_PATH)/scripts/qapi2texi.py $< > $@,"GEN","$@")
qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi qemu-monitor-info.texi qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi qemu-monitor-info.texi
$(call quiet-command, \
perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu.pod && \
$(POD2MAN) --section=1 --center=" " --release=" " qemu.pod > $@, \
" GEN $@")
qemu.1: qemu-option-trace.texi qemu.1: qemu-option-trace.texi
qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi qemu-img.1: qemu-img.texi qemu-option-trace.texi qemu-img-cmds.texi
$(call quiet-command, \
perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-img.pod && \
$(POD2MAN) --section=1 --center=" " --release=" " qemu-img.pod > $@, \
" GEN $@")
fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi fsdev/virtfs-proxy-helper.1: fsdev/virtfs-proxy-helper.texi
$(call quiet-command, \
perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< fsdev/virtfs-proxy-helper.pod && \
$(POD2MAN) --section=1 --center=" " --release=" " fsdev/virtfs-proxy-helper.pod > $@, \
" GEN $@")
qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi qemu-nbd.8: qemu-nbd.texi qemu-option-trace.texi
$(call quiet-command, \
perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-nbd.pod && \
$(POD2MAN) --section=8 --center=" " --release=" " qemu-nbd.pod > $@, \
" GEN $@")
qemu-ga.8: qemu-ga.texi qemu-ga.8: qemu-ga.texi
$(call quiet-command, \
perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu-ga.pod && \
$(POD2MAN) --section=8 --center=" " --release=" " qemu-ga.pod > $@, \
" GEN $@")
html: qemu-doc.html docs/interop/qemu-qmp-ref.html docs/interop/qemu-ga-ref.html dvi: qemu-doc.dvi qemu-tech.dvi
info: qemu-doc.info docs/interop/qemu-qmp-ref.info docs/interop/qemu-ga-ref.info html: qemu-doc.html qemu-tech.html
pdf: qemu-doc.pdf docs/interop/qemu-qmp-ref.pdf docs/interop/qemu-ga-ref.pdf info: qemu-doc.info qemu-tech.info
txt: qemu-doc.txt docs/interop/qemu-qmp-ref.txt docs/interop/qemu-ga-ref.txt pdf: qemu-doc.pdf qemu-tech.pdf
qemu-doc.html qemu-doc.info qemu-doc.pdf qemu-doc.txt: \ qemu-doc.dvi qemu-doc.html qemu-doc.info qemu-doc.pdf: \
qemu-img.texi qemu-nbd.texi qemu-options.texi qemu-option-trace.texi \ qemu-img.texi qemu-nbd.texi qemu-options.texi qemu-option-trace.texi \
qemu-monitor.texi qemu-img-cmds.texi qemu-ga.texi \ qemu-monitor.texi qemu-img-cmds.texi qemu-ga.texi \
qemu-monitor-info.texi qemu-monitor-info.texi
docs/interop/qemu-ga-ref.dvi docs/interop/qemu-ga-ref.html \
docs/interop/qemu-ga-ref.info docs/interop/qemu-ga-ref.pdf \
docs/interop/qemu-ga-ref.txt docs/interop/qemu-ga-ref.7: \
docs/interop/qemu-ga-ref.texi docs/interop/qemu-ga-qapi.texi
docs/interop/qemu-qmp-ref.dvi docs/interop/qemu-qmp-ref.html \
docs/interop/qemu-qmp-ref.info docs/interop/qemu-qmp-ref.pdf \
docs/interop/qemu-qmp-ref.txt docs/interop/qemu-qmp-ref.7: \
docs/interop/qemu-qmp-ref.texi docs/interop/qemu-qmp-qapi.texi
ifdef CONFIG_WIN32 ifdef CONFIG_WIN32
INSTALLER = qemu-setup-$(VERSION)$(EXESUF) INSTALLER = qemu-setup-$(VERSION)$(EXESUF)
@@ -798,55 +660,12 @@ endif # CONFIG_WIN
# Add a dependency on the generated files, so that they are always # Add a dependency on the generated files, so that they are always
# rebuilt before other object files # rebuilt before other object files
ifneq ($(wildcard config-host.mak),)
ifneq ($(filter-out $(UNCHECKED_GOALS),$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail)) ifneq ($(filter-out $(UNCHECKED_GOALS),$(MAKECMDGOALS)),$(if $(MAKECMDGOALS),,fail))
Makefile: $(GENERATED_FILES) Makefile: $(GENERATED_HEADERS)
endif endif
endif
.SECONDARY: $(TRACE_HEADERS) $(TRACE_HEADERS:%=%-timestamp) \
$(TRACE_SOURCES) $(TRACE_SOURCES:%=%-timestamp) \
$(TRACE_DTRACE) $(TRACE_DTRACE:%=%-timestamp)
# Include automatically generated dependency files # Include automatically generated dependency files
# Dependencies in Makefile.objs files come from our recursive subdir rules # Dependencies in Makefile.objs files come from our recursive subdir rules
-include $(wildcard *.d tests/*.d) -include $(wildcard *.d tests/*.d)
include $(SRC_PATH)/tests/docker/Makefile.include include $(SRC_PATH)/tests/docker/Makefile.include
.PHONY: help
help:
@echo 'Generic targets:'
@echo ' all - Build all'
@echo ' dir/file.o - Build specified target only'
@echo ' install - Install QEMU, documentation and tools'
@echo ' ctags/TAGS - Generate tags file for editors'
@echo ' cscope - Generate cscope index'
@echo ''
@$(if $(TARGET_DIRS), \
echo 'Architecture specific targets:'; \
$(foreach t, $(TARGET_DIRS), \
printf " %-30s - Build for %s\\n" $(patsubst %,subdir-%,$(t)) $(t);) \
echo '')
@echo 'Cleaning targets:'
@echo ' clean - Remove most generated files but keep the config'
@echo ' distclean - Remove all generated files'
@echo ' dist - Build a distributable tarball'
@echo ''
@echo 'Test targets:'
@echo ' check - Run all tests (check-help for details)'
@echo ' docker - Help about targets running tests inside Docker containers'
@echo ''
@echo 'Documentation targets:'
@echo ' html info pdf txt'
@echo ' - Build documentation in specified format'
@echo ''
ifdef CONFIG_WIN32
@echo 'Windows targets:'
@echo ' installer - Build NSIS-based installer for QEMU'
ifdef QEMU_GA_MSI_ENABLED
@echo ' msi - Build MSI-based installer for qemu-ga'
endif
@echo ''
endif
@echo ' make V=0|1 [targets] 0 => quiet build (default), 1 => verbose build'
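
A change that recurs throughout this file's hunks is the `quiet-command` output convention moving from a single pre-formatted string (`" GEN $@"`) to separate tag and target arguments (`"GEN","$@"`). A minimal sketch of how a macro with that two-argument form can work; this is an illustrative assumption, not QEMU's exact `rules.mak` definition:

```makefile
# Hypothetical sketch of the quiet-command convention:
# $1 = command, $2 = tag (e.g. "GEN"), $3 = target (e.g. "$@").
# Prints "  TAG     target" unless V=1, in which case the full command echoes.
quiet-command = $(if $(V),$1,$(if $2,@printf "  %-7s %s\n" $2 $3 && $1,@$1))

# Example use, matching the call sites in the diff:
qapi-types.c: qapi-schema.json
	$(call quiet-command,touch $@,"GEN","$@")
```

With `make` this prints `  GEN     qapi-types.c`; with `make V=1` it echoes the underlying `touch` command instead.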


@@ -4,16 +4,17 @@ stub-obj-y = stubs/ crypto/
util-obj-y = util/ qobject/ qapi/ util-obj-y = util/ qobject/ qapi/
util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o util-obj-y += qmp-introspect.o qapi-types.o qapi-visit.o qapi-event.o
chardev-obj-y = chardev/
####################################################################### #######################################################################
# block-obj-y is code used by both qemu system emulation and qemu-img # block-obj-y is code used by both qemu system emulation and qemu-img
block-obj-y = async.o thread-pool.o
block-obj-y += nbd/ block-obj-y += nbd/
block-obj-y += block.o blockjob.o block-obj-y += block.o blockjob.o
block-obj-y += main-loop.o iohandler.o qemu-timer.o
block-obj-$(CONFIG_POSIX) += aio-posix.o
block-obj-$(CONFIG_WIN32) += aio-win32.o
block-obj-y += block/ block-obj-y += block/
block-obj-y += qemu-io-cmds.o block-obj-y += qemu-io-cmds.o
block-obj-$(CONFIG_REPLICATION) += replication.o
block-obj-m = block/ block-obj-m = block/
@@ -49,9 +50,14 @@ common-obj-$(CONFIG_POSIX) += os-posix.o
common-obj-$(CONFIG_LINUX) += fsdev/ common-obj-$(CONFIG_LINUX) += fsdev/
common-obj-y += migration/ common-obj-y += migration/
common-obj-y += qemu-char.o #aio.o
common-obj-y += page_cache.o
common-obj-$(CONFIG_SPICE) += spice-qemu-char.o
common-obj-y += audio/ common-obj-y += audio/
common-obj-y += hw/ common-obj-y += hw/
common-obj-y += accel.o
common-obj-y += replay/ common-obj-y += replay/
@@ -67,7 +73,6 @@ common-obj-y += tpm.o
common-obj-$(CONFIG_SLIRP) += slirp/ common-obj-$(CONFIG_SLIRP) += slirp/
common-obj-y += backends/ common-obj-y += backends/
common-obj-y += chardev/
common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o common-obj-$(CONFIG_SECCOMP) += qemu-seccomp.o
@@ -83,7 +88,7 @@ endif
####################################################################### #######################################################################
# Target-independent parts used in system and user emulation # Target-independent parts used in system and user emulation
common-obj-y += cpus-common.o common-obj-y += tcg-runtime.o
common-obj-y += hw/ common-obj-y += hw/
common-obj-y += qom/ common-obj-y += qom/
common-obj-y += disas/ common-obj-y += disas/
@@ -91,6 +96,7 @@ common-obj-y += disas/
###################################################################### ######################################################################
# Resource file for Windows executables # Resource file for Windows executables
version-obj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.o version-obj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.o
version-lobj-$(CONFIG_WIN32) += $(BUILD_DIR)/version.lo
###################################################################### ######################################################################
# tracing # tracing
@@ -109,70 +115,47 @@ qga-vss-dll-obj-y = qga/
# contrib # contrib
ivshmem-client-obj-y = contrib/ivshmem-client/ ivshmem-client-obj-y = contrib/ivshmem-client/
ivshmem-server-obj-y = contrib/ivshmem-server/ ivshmem-server-obj-y = contrib/ivshmem-server/
libvhost-user-obj-y = contrib/libvhost-user/
vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
vhost-user-scsi-obj-y = contrib/vhost-user-scsi/
vhost-user-scsi-obj-y += contrib/libvhost-user/libvhost-user.o
###################################################################### ######################################################################
trace-events-subdirs = trace-events-y = trace-events
trace-events-subdirs += util trace-events-y += util/trace-events
trace-events-subdirs += crypto trace-events-y += crypto/trace-events
trace-events-subdirs += io trace-events-y += io/trace-events
trace-events-subdirs += migration trace-events-y += migration/trace-events
trace-events-subdirs += block trace-events-y += block/trace-events
trace-events-subdirs += backends trace-events-y += hw/block/trace-events
trace-events-subdirs += chardev trace-events-y += hw/char/trace-events
trace-events-subdirs += hw/block trace-events-y += hw/intc/trace-events
trace-events-subdirs += hw/block/dataplane trace-events-y += hw/net/trace-events
trace-events-subdirs += hw/char trace-events-y += hw/virtio/trace-events
trace-events-subdirs += hw/intc trace-events-y += hw/audio/trace-events
trace-events-subdirs += hw/net trace-events-y += hw/misc/trace-events
trace-events-subdirs += hw/virtio trace-events-y += hw/usb/trace-events
trace-events-subdirs += hw/audio trace-events-y += hw/scsi/trace-events
trace-events-subdirs += hw/misc trace-events-y += hw/nvram/trace-events
trace-events-subdirs += hw/usb trace-events-y += hw/display/trace-events
trace-events-subdirs += hw/scsi trace-events-y += hw/input/trace-events
trace-events-subdirs += hw/nvram trace-events-y += hw/timer/trace-events
trace-events-subdirs += hw/display trace-events-y += hw/dma/trace-events
trace-events-subdirs += hw/input trace-events-y += hw/sparc/trace-events
trace-events-subdirs += hw/timer trace-events-y += hw/sd/trace-events
trace-events-subdirs += hw/dma trace-events-y += hw/isa/trace-events
trace-events-subdirs += hw/sparc trace-events-y += hw/i386/trace-events
trace-events-subdirs += hw/sd trace-events-y += hw/9pfs/trace-events
trace-events-subdirs += hw/isa trace-events-y += hw/ppc/trace-events
trace-events-subdirs += hw/mem trace-events-y += hw/pci/trace-events
trace-events-subdirs += hw/i386 trace-events-y += hw/s390x/trace-events
trace-events-subdirs += hw/i386/xen trace-events-y += hw/vfio/trace-events
trace-events-subdirs += hw/9pfs trace-events-y += hw/acpi/trace-events
trace-events-subdirs += hw/ppc trace-events-y += hw/arm/trace-events
trace-events-subdirs += hw/pci trace-events-y += hw/alpha/trace-events
trace-events-subdirs += hw/s390x trace-events-y += ui/trace-events
trace-events-subdirs += hw/vfio trace-events-y += audio/trace-events
trace-events-subdirs += hw/acpi trace-events-y += net/trace-events
trace-events-subdirs += hw/arm trace-events-y += target-i386/trace-events
trace-events-subdirs += hw/alpha trace-events-y += target-sparc/trace-events
trace-events-subdirs += hw/xen trace-events-y += target-s390x/trace-events
trace-events-subdirs += ui trace-events-y += target-ppc/trace-events
trace-events-subdirs += audio trace-events-y += qom/trace-events
trace-events-subdirs += net trace-events-y += linux-user/trace-events
trace-events-subdirs += target/arm
trace-events-subdirs += target/i386
trace-events-subdirs += target/mips
trace-events-subdirs += target/sparc
trace-events-subdirs += target/s390x
trace-events-subdirs += target/ppc
trace-events-subdirs += qom
trace-events-subdirs += linux-user
trace-events-subdirs += qapi
trace-events-subdirs += accel/tcg
trace-events-subdirs += accel/kvm
trace-events-files = $(SRC_PATH)/trace-events $(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)
trace-obj-y = trace-root.o
trace-obj-y += $(trace-events-subdirs:%=%/trace.o)
trace-obj-$(CONFIG_TRACE_UST) += trace-ust-all.o
trace-obj-$(CONFIG_TRACE_DTRACE) += trace-dtrace-root.o
trace-obj-$(CONFIG_TRACE_DTRACE) += $(trace-events-subdirs:%=%/trace-dtrace.o)
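
One side of this diff derives the trace-events file list from a plain directory list (`trace-events-subdirs`) using a Make substitution reference, instead of listing every `trace-events` file by hand. How that `trace-events-files` pattern expands, with a hypothetical `SRC_PATH` and a shortened subdir list:

```makefile
# Illustration only: hypothetical SRC_PATH and a trimmed subdir list.
SRC_PATH = /src
trace-events-subdirs = util crypto io

# $(var:%=pattern) rewrites each word of the list through the pattern:
trace-events-files = $(SRC_PATH)/trace-events $(trace-events-subdirs:%=$(SRC_PATH)/%/trace-events)
# expands to:
#   /src/trace-events /src/util/trace-events /src/crypto/trace-events /src/io/trace-events
```

This keeps the per-directory object lists (`%/trace.o`, `%/trace-dtrace.o`) derivable from the same single list.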


@@ -11,7 +11,7 @@ $(call set-vpath, $(SRC_PATH):$(BUILD_DIR))
ifdef CONFIG_LINUX ifdef CONFIG_LINUX
QEMU_CFLAGS += -I../linux-headers QEMU_CFLAGS += -I../linux-headers
endif endif
QEMU_CFLAGS += -I.. -I$(SRC_PATH)/target/$(TARGET_BASE_ARCH) -DNEED_CPU_H QEMU_CFLAGS += -I.. -I$(SRC_PATH)/target-$(TARGET_BASE_ARCH) -DNEED_CPU_H
QEMU_CFLAGS+=-I$(SRC_PATH)/include QEMU_CFLAGS+=-I$(SRC_PATH)/include
@@ -26,7 +26,7 @@ ifneq (,$(findstring -mwindows,$(libs_softmmu)))
# Terminate program name with a 'w' because the linker builds a windows executable. # Terminate program name with a 'w' because the linker builds a windows executable.
QEMU_PROGW=qemu-system-$(TARGET_NAME)w$(EXESUF) QEMU_PROGW=qemu-system-$(TARGET_NAME)w$(EXESUF)
$(QEMU_PROG): $(QEMU_PROGW) $(QEMU_PROG): $(QEMU_PROGW)
$(call quiet-command,$(OBJCOPY) --subsystem console $(QEMU_PROGW) $(QEMU_PROG),"GEN","$(TARGET_DIR)$(QEMU_PROG)") $(call quiet-command,$(OBJCOPY) --subsystem console $(QEMU_PROGW) $(QEMU_PROG)," GEN $(TARGET_DIR)$(QEMU_PROG)")
QEMU_PROG_BUILD = $(QEMU_PROGW) QEMU_PROG_BUILD = $(QEMU_PROGW)
else else
QEMU_PROG_BUILD = $(QEMU_PROG) QEMU_PROG_BUILD = $(QEMU_PROG)
@@ -36,6 +36,10 @@ endif
PROGS=$(QEMU_PROG) $(QEMU_PROGW) PROGS=$(QEMU_PROG) $(QEMU_PROGW)
STPFILES= STPFILES=
ifdef CONFIG_LINUX_USER
PROGS+=$(QEMU_PROG)-binfmt
endif
config-target.h: config-target.h-timestamp config-target.h: config-target.h-timestamp
config-target.h-timestamp: config-target.mak config-target.h-timestamp: config-target.mak
@@ -50,36 +54,32 @@ endif
$(QEMU_PROG).stp-installed: $(BUILD_DIR)/trace-events-all $(QEMU_PROG).stp-installed: $(BUILD_DIR)/trace-events-all
$(call quiet-command,$(TRACETOOL) \ $(call quiet-command,$(TRACETOOL) \
--group=all \
--format=stap \ --format=stap \
--backends=$(TRACE_BACKENDS) \ --backends=$(TRACE_BACKENDS) \
--binary=$(bindir)/$(QEMU_PROG) \ --binary=$(bindir)/$(QEMU_PROG) \
--target-name=$(TARGET_NAME) \ --target-name=$(TARGET_NAME) \
--target-type=$(TARGET_TYPE) \ --target-type=$(TARGET_TYPE) \
$< > $@,"GEN","$(TARGET_DIR)$(QEMU_PROG).stp-installed") < $< > $@," GEN $(TARGET_DIR)$(QEMU_PROG).stp-installed")
$(QEMU_PROG).stp: $(BUILD_DIR)/trace-events-all $(QEMU_PROG).stp: $(BUILD_DIR)/trace-events-all
$(call quiet-command,$(TRACETOOL) \ $(call quiet-command,$(TRACETOOL) \
--group=all \
--format=stap \ --format=stap \
--backends=$(TRACE_BACKENDS) \ --backends=$(TRACE_BACKENDS) \
--binary=$(realpath .)/$(QEMU_PROG) \ --binary=$(realpath .)/$(QEMU_PROG) \
--target-name=$(TARGET_NAME) \ --target-name=$(TARGET_NAME) \
--target-type=$(TARGET_TYPE) \ --target-type=$(TARGET_TYPE) \
$< > $@,"GEN","$(TARGET_DIR)$(QEMU_PROG).stp") < $< > $@," GEN $(TARGET_DIR)$(QEMU_PROG).stp")
$(QEMU_PROG)-simpletrace.stp: $(BUILD_DIR)/trace-events-all $(QEMU_PROG)-simpletrace.stp: $(BUILD_DIR)/trace-events-all
$(call quiet-command,$(TRACETOOL) \ $(call quiet-command,$(TRACETOOL) \
--group=all \
--format=simpletrace-stap \ --format=simpletrace-stap \
--backends=$(TRACE_BACKENDS) \ --backends=$(TRACE_BACKENDS) \
--probe-prefix=qemu.$(TARGET_TYPE).$(TARGET_NAME) \ --probe-prefix=qemu.$(TARGET_TYPE).$(TARGET_NAME) \
$< > $@,"GEN","$(TARGET_DIR)$(QEMU_PROG)-simpletrace.stp") < $< > $@," GEN $(TARGET_DIR)$(QEMU_PROG)-simpletrace.stp")
else else
stap: stap:
endif endif
.PHONY: stap
all: $(PROGS) stap all: $(PROGS) stap
@@ -88,17 +88,18 @@ all: $(PROGS) stap
######################################################### #########################################################
# cpu emulator library # cpu emulator library
obj-y += exec.o obj-y = exec.o translate-all.o cpu-exec.o
obj-y += accel/ obj-y += translate-common.o
obj-y += cpu-exec-common.o
obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o obj-y += tcg/tcg.o tcg/tcg-op.o tcg/optimize.o
obj-y += tcg/tcg-common.o tcg/tcg-runtime.o obj-$(CONFIG_TCG_INTERPRETER) += tci.o
obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o obj-y += tcg/tcg-common.o
obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
obj-y += fpu/softfloat.o obj-y += fpu/softfloat.o
obj-y += target/$(TARGET_BASE_ARCH)/ obj-y += target-$(TARGET_BASE_ARCH)/
obj-y += disas.o obj-y += disas.o
obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decContext.o obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decContext.o
obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decNumber.o obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decNumber.o
@@ -116,7 +117,9 @@ QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) \
-I$(SRC_PATH)/linux-user -I$(SRC_PATH)/linux-user
obj-y += linux-user/ obj-y += linux-user/
obj-y += gdbstub.o thunk.o user-exec.o user-exec-stub.o obj-y += gdbstub.o thunk.o user-exec.o
obj-binfmt-y += linux-user/
endif #CONFIG_LINUX_USER endif #CONFIG_LINUX_USER
@@ -129,7 +132,7 @@ QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ABI_DIR) \
-I$(SRC_PATH)/bsd-user/$(HOST_VARIANT_DIR) -I$(SRC_PATH)/bsd-user/$(HOST_VARIANT_DIR)
obj-y += bsd-user/ obj-y += bsd-user/
obj-y += gdbstub.o user-exec.o user-exec-stub.o obj-y += gdbstub.o user-exec.o
endif #CONFIG_BSD_USER endif #CONFIG_BSD_USER
@@ -139,12 +142,19 @@ ifdef CONFIG_SOFTMMU
obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o
obj-y += qtest.o bootdevice.o obj-y += qtest.o bootdevice.o
obj-y += hw/ obj-y += hw/
obj-y += memory.o obj-$(CONFIG_KVM) += kvm-all.o
obj-y += memory.o cputlb.o
obj-y += memory_mapping.o obj-y += memory_mapping.o
obj-y += dump.o obj-y += dump.o
obj-y += migration/ram.o obj-y += migration/ram.o migration/savevm.o
LIBS := $(libs_softmmu) $(LIBS) LIBS := $(libs_softmmu) $(LIBS)
# xen support
obj-$(CONFIG_XEN) += xen-common.o
obj-$(CONFIG_XEN_I386) += xen-hvm.o xen-mapcache.o
obj-$(call lnot,$(CONFIG_XEN)) += xen-common-stub.o
obj-$(call lnot,$(CONFIG_XEN_I386)) += xen-hvm-stub.o
# Hardware support # Hardware support
ifeq ($(TARGET_NAME), sparc64) ifeq ($(TARGET_NAME), sparc64)
obj-y += hw/sparc64/ obj-y += hw/sparc64/
@@ -152,27 +162,29 @@ else
 obj-y += hw/$(TARGET_BASE_ARCH)/
 endif
-GENERATED_FILES += hmp-commands.h hmp-commands-info.h
+GENERATED_HEADERS += hmp-commands.h hmp-commands-info.h qmp-commands-old.h
 endif # CONFIG_SOFTMMU
 # Workaround for http://gcc.gnu.org/PR55489, see configure.
 %/translate.o: QEMU_CFLAGS += $(TRANSLATE_OPT_CFLAGS)
-ifdef CONFIG_LINUX_USER
-dummy := $(call unnest-vars,,obj-y obj-binfmt-y)
-else
 dummy := $(call unnest-vars,,obj-y)
-endif
 all-obj-y := $(obj-y)
 target-obj-y :=
 block-obj-y :=
 common-obj-y :=
-chardev-obj-y :=
 include $(SRC_PATH)/Makefile.objs
 dummy := $(call unnest-vars,,target-obj-y)
 target-obj-y-save := $(target-obj-y)
 dummy := $(call unnest-vars,.., \
 block-obj-y \
 block-obj-m \
-chardev-obj-y \
 crypto-obj-y \
 crypto-aes-obj-y \
 qom-obj-y \
@@ -183,36 +195,40 @@ target-obj-y := $(target-obj-y-save)
 all-obj-y += $(common-obj-y)
 all-obj-y += $(target-obj-y)
 all-obj-y += $(qom-obj-y)
-all-obj-$(CONFIG_SOFTMMU) += $(block-obj-y) $(chardev-obj-y)
+all-obj-$(CONFIG_SOFTMMU) += $(block-obj-y)
 all-obj-$(CONFIG_USER_ONLY) += $(crypto-aes-obj-y)
 all-obj-$(CONFIG_SOFTMMU) += $(crypto-obj-y)
 all-obj-$(CONFIG_SOFTMMU) += $(io-obj-y)
 $(QEMU_PROG_BUILD): config-devices.mak
-COMMON_LDADDS = ../libqemuutil.a ../libqemustub.a
 # build either PROG or PROGW
-$(QEMU_PROG_BUILD): $(all-obj-y) $(COMMON_LDADDS)
+$(QEMU_PROG_BUILD): $(all-obj-y) ../libqemuutil.a ../libqemustub.a
 $(call LINK, $(filter-out %.mak, $^))
 ifdef CONFIG_DARWIN
-$(call quiet-command,Rez -append $(SRC_PATH)/pc-bios/qemu.rsrc -o $@,"REZ","$(TARGET_DIR)$@")
+$(call quiet-command,Rez -append $(SRC_PATH)/pc-bios/qemu.rsrc -o $@," REZ $(TARGET_DIR)$@")
-$(call quiet-command,SetFile -a C $@,"SETFILE","$(TARGET_DIR)$@")
+$(call quiet-command,SetFile -a C $@," SETFILE $(TARGET_DIR)$@")
 endif
-$(QEMU_PROG)-binfmt: $(obj-binfmt-y)
-$(call LINK,$^)
 gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/scripts/feature_to_c.sh
-$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES),"GEN","$(TARGET_DIR)$@")
+$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES)," GEN $(TARGET_DIR)$@")
 hmp-commands.h: $(SRC_PATH)/hmp-commands.hx $(SRC_PATH)/scripts/hxtool
-$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$(TARGET_DIR)$@")
+$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@," GEN $(TARGET_DIR)$@")
 hmp-commands-info.h: $(SRC_PATH)/hmp-commands-info.hx $(SRC_PATH)/scripts/hxtool
-$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"GEN","$(TARGET_DIR)$@")
+$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@," GEN $(TARGET_DIR)$@")
-clean: clean-target
+qmp-commands-old.h: $(SRC_PATH)/qmp-commands.hx $(SRC_PATH)/scripts/hxtool
+$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@," GEN $(TARGET_DIR)$@")
+clean:
 rm -f *.a *~ $(PROGS)
 rm -f $(shell find . -name '*.[od]')
-rm -f hmp-commands.h gdbstub-xml.c
+rm -f hmp-commands.h qmp-commands-old.h gdbstub-xml.c
 ifdef CONFIG_TRACE_SYSTEMTAP
 rm -f *.stp
 endif
@@ -227,5 +243,5 @@ ifdef CONFIG_TRACE_SYSTEMTAP
 $(INSTALL_DATA) $(QEMU_PROG)-simpletrace.stp "$(DESTDIR)$(qemu_datadir)/../systemtap/tapset/$(QEMU_PROG)-simpletrace.stp"
 endif
-GENERATED_FILES += config-target.h
+GENERATED_HEADERS += config-target.h
-Makefile: $(GENERATED_FILES)
+Makefile: $(GENERATED_HEADERS)

README

@@ -42,10 +42,11 @@ of other UNIX targets. The simple steps to build QEMU are:
 ../configure
 make
+Complete details of the process for building and configuring QEMU for
+all supported host platforms can be found in the qemu-tech.html file.
 Additional information can also be found online via the QEMU website:
 http://qemu-project.org/Hosts/Linux
+http://qemu-project.org/Hosts/Mac
 http://qemu-project.org/Hosts/W32


@@ -1 +1 @@
-2.9.50
+2.7.0


@@ -33,6 +33,16 @@
 #include "sysemu/qtest.h"
 #include "hw/xen/xen.h"
 #include "qom/object.h"
+#include "hw/boards.h"
+int tcg_tb_size;
+static bool tcg_allowed = true;
+static int tcg_init(MachineState *ms)
+{
+    tcg_exec_init(tcg_tb_size * 1024 * 1024);
+    return 0;
+}
 static const TypeInfo accel_type = {
     .name = TYPE_ACCEL,
@@ -120,9 +130,27 @@ void configure_accelerator(MachineState *ms)
     }
 }
+static void tcg_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "tcg";
+    ac->init_machine = tcg_init;
+    ac->allowed = &tcg_allowed;
+}
+#define TYPE_TCG_ACCEL ACCEL_CLASS_NAME("tcg")
+static const TypeInfo tcg_accel_type = {
+    .name = TYPE_TCG_ACCEL,
+    .parent = TYPE_ACCEL,
+    .class_init = tcg_accel_class_init,
+};
 static void register_accel_types(void)
 {
     type_register_static(&accel_type);
+    type_register_static(&tcg_accel_type);
 }
 type_init(register_accel_types);


@@ -1,4 +0,0 @@
obj-$(CONFIG_SOFTMMU) += accel.o
obj-y += kvm/
obj-y += tcg/
obj-y += stubs/


@@ -1 +0,0 @@
obj-$(CONFIG_KVM) += kvm-all.o


@@ -1,15 +0,0 @@
# Trace events for debugging and performance instrumentation
# kvm-all.c
kvm_ioctl(int type, void *arg) "type 0x%x, arg %p"
kvm_vm_ioctl(int type, void *arg) "type 0x%x, arg %p"
kvm_vcpu_ioctl(int cpu_index, int type, void *arg) "cpu_index %d, type 0x%x, arg %p"
kvm_run_exit(int cpu_index, uint32_t reason) "cpu_index %d, reason %d"
kvm_device_ioctl(int fd, int type, void *arg) "dev fd %d, type 0x%x, arg %p"
kvm_failed_reg_get(uint64_t id, const char *msg) "Warning: Unable to retrieve ONEREG %" PRIu64 " from KVM: %s"
kvm_failed_reg_set(uint64_t id, const char *msg) "Warning: Unable to set ONEREG %" PRIu64 " to KVM: %s"
kvm_irqchip_commit_routes(void) ""
kvm_irqchip_add_msi_route(char *name, int vector, int virq) "dev %s vector %d virq %d"
kvm_irqchip_update_msi_route(int virq) "Updating MSI route virq=%d"
kvm_irqchip_release_virq(int virq) "virq %d"


@@ -1 +0,0 @@
obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o


@@ -1,3 +0,0 @@
obj-$(CONFIG_SOFTMMU) += tcg-all.o
obj-$(CONFIG_SOFTMMU) += cputlb.o
obj-y += cpu-exec.o cpu-exec-common.o translate-all.o translate-common.o

File diff suppressed because it is too large


@@ -1,61 +0,0 @@
/*
* QEMU System Emulator, accelerator interfaces
*
* Copyright (c) 2003-2008 Fabrice Bellard
* Copyright (c) 2014 Red Hat Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "sysemu/accel.h"
#include "sysemu/sysemu.h"
#include "qom/object.h"
int tcg_tb_size;
static bool tcg_allowed = true;
static int tcg_init(MachineState *ms)
{
tcg_exec_init(tcg_tb_size * 1024 * 1024);
return 0;
}
static void tcg_accel_class_init(ObjectClass *oc, void *data)
{
AccelClass *ac = ACCEL_CLASS(oc);
ac->name = "tcg";
ac->init_machine = tcg_init;
ac->allowed = &tcg_allowed;
}
#define TYPE_TCG_ACCEL ACCEL_CLASS_NAME("tcg")
static const TypeInfo tcg_accel_type = {
.name = TYPE_TCG_ACCEL,
.parent = TYPE_ACCEL,
.class_init = tcg_accel_class_init,
};
static void register_accel_types(void)
{
type_register_static(&tcg_accel_type);
}
type_init(register_accel_types);


@@ -1,10 +0,0 @@
# Trace events for debugging and performance instrumentation
# TCG related tracing (mostly disabled by default)
# cpu-exec.c
disable exec_tb(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
disable exec_tb_nocache(void *tb, uintptr_t pc) "tb:%p pc=0x%"PRIxPTR
disable exec_tb_exit(void *last_tb, unsigned int flags) "tb:%p flags=%x"
# translate-all.c
translate_block(void *tb, uintptr_t pc, uint8_t *tb_code) "tb:%p, pc:0x%"PRIxPTR", tb_code:%p"


@@ -16,10 +16,8 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "block/block.h"
-#include "qemu/rcu_queue.h"
+#include "qemu/queue.h"
 #include "qemu/sockets.h"
-#include "qemu/cutils.h"
-#include "trace.h"
 #ifdef CONFIG_EPOLL_CREATE1
 #include <sys/epoll.h>
 #endif
@@ -29,9 +27,6 @@ struct AioHandler
     GPollFD pfd;
     IOHandler *io_read;
     IOHandler *io_write;
-    AioPollFn *io_poll;
-    IOHandler *io_poll_begin;
-    IOHandler *io_poll_end;
     int deleted;
     void *opaque;
     bool is_external;
@@ -66,7 +61,7 @@ static bool aio_epoll_try_enable(AioContext *ctx)
     AioHandler *node;
     struct epoll_event event;
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         int r;
         if (node->deleted || !node->pfd.events) {
             continue;
@@ -86,23 +81,30 @@ static void aio_epoll_update(AioContext *ctx, AioHandler *node, bool is_new)
 {
     struct epoll_event event;
     int r;
-    int ctl;
     if (!ctx->epoll_enabled) {
         return;
     }
     if (!node->pfd.events) {
-        ctl = EPOLL_CTL_DEL;
+        r = epoll_ctl(ctx->epollfd, EPOLL_CTL_DEL, node->pfd.fd, &event);
+        if (r) {
+            aio_epoll_disable(ctx);
+        }
     } else {
         event.data.ptr = node;
         event.events = epoll_events_from_pfd(node->pfd.events);
-        ctl = is_new ? EPOLL_CTL_ADD : EPOLL_CTL_MOD;
-    }
-    r = epoll_ctl(ctx->epollfd, ctl, node->pfd.fd, &event);
-    if (r) {
-        aio_epoll_disable(ctx);
+        if (is_new) {
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_ADD, node->pfd.fd, &event);
+            if (r) {
+                aio_epoll_disable(ctx);
+            }
+        } else {
+            r = epoll_ctl(ctx->epollfd, EPOLL_CTL_MOD, node->pfd.fd, &event);
+            if (r) {
+                aio_epoll_disable(ctx);
+            }
+        }
     }
 }
 static int aio_epoll(AioContext *ctx, GPollFD *pfds,
@@ -205,61 +207,45 @@ void aio_set_fd_handler(AioContext *ctx,
                         bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
-                        AioPollFn *io_poll,
                         void *opaque)
 {
     AioHandler *node;
     bool is_new = false;
     bool deleted = false;
-    qemu_lockcnt_lock(&ctx->list_lock);
     node = find_aio_handler(ctx, fd);
     /* Are we deleting the fd handler? */
-    if (!io_read && !io_write && !io_poll) {
-        if (node == NULL) {
-            qemu_lockcnt_unlock(&ctx->list_lock);
-            return;
-        }
+    if (!io_read && !io_write) {
+        if (node) {
             g_source_remove_poll(&ctx->source, &node->pfd);
             /* If the lock is held, just mark the node as deleted */
-            if (qemu_lockcnt_count(&ctx->list_lock)) {
+            if (ctx->walking_handlers) {
                 node->deleted = 1;
                 node->pfd.revents = 0;
             } else {
                 /* Otherwise, delete it for real.  We can't just mark it as
-                 * deleted because deleted nodes are only cleaned up while
-                 * no one is walking the handlers list.
+                 * deleted because deleted nodes are only cleaned up after
+                 * releasing the walking_handlers lock.
                  */
                 QLIST_REMOVE(node, node);
                 deleted = true;
             }
-        if (!node->io_poll) {
-            ctx->poll_disable_cnt--;
-        }
         }
     } else {
         if (node == NULL) {
             /* Alloc and insert if it's not already there */
             node = g_new0(AioHandler, 1);
             node->pfd.fd = fd;
-            QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
+            QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
             g_source_add_poll(&ctx->source, &node->pfd);
             is_new = true;
-            ctx->poll_disable_cnt += !io_poll;
-        } else {
-            ctx->poll_disable_cnt += !io_poll - !node->io_poll;
         }
         /* Update handler with latest information */
         node->io_read = io_read;
         node->io_write = io_write;
-        node->io_poll = io_poll;
         node->opaque = opaque;
         node->is_external = is_external;
@@ -268,127 +254,72 @@ void aio_set_fd_handler(AioContext *ctx,
     }
     aio_epoll_update(ctx, node, is_new);
-    qemu_lockcnt_unlock(&ctx->list_lock);
     aio_notify(ctx);
     if (deleted) {
         g_free(node);
     }
 }
-void aio_set_fd_poll(AioContext *ctx, int fd,
-                     IOHandler *io_poll_begin,
-                     IOHandler *io_poll_end)
-{
-    AioHandler *node = find_aio_handler(ctx, fd);
-    if (!node) {
-        return;
-    }
-    node->io_poll_begin = io_poll_begin;
-    node->io_poll_end = io_poll_end;
-}
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *notifier,
                             bool is_external,
-                            EventNotifierHandler *io_read,
-                            AioPollFn *io_poll)
+                            EventNotifierHandler *io_read)
 {
-    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external,
-                       (IOHandler *)io_read, NULL, io_poll, notifier);
+    aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
+                       is_external, (IOHandler *)io_read, NULL, notifier);
 }
-void aio_set_event_notifier_poll(AioContext *ctx,
-                                 EventNotifier *notifier,
-                                 EventNotifierHandler *io_poll_begin,
-                                 EventNotifierHandler *io_poll_end)
-{
-    aio_set_fd_poll(ctx, event_notifier_get_fd(notifier),
-                    (IOHandler *)io_poll_begin,
-                    (IOHandler *)io_poll_end);
-}
-static void poll_set_started(AioContext *ctx, bool started)
-{
-    AioHandler *node;
-    if (started == ctx->poll_started) {
-        return;
-    }
-    ctx->poll_started = started;
-    qemu_lockcnt_inc(&ctx->list_lock);
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
-        IOHandler *fn;
-        if (node->deleted) {
-            continue;
-        }
-        if (started) {
-            fn = node->io_poll_begin;
-        } else {
-            fn = node->io_poll_end;
-        }
-        if (fn) {
-            fn(node->opaque);
-        }
-    }
-    qemu_lockcnt_dec(&ctx->list_lock);
-}
 bool aio_prepare(AioContext *ctx)
 {
-    /* Poll mode cannot be used with glib's event loop, disable it. */
-    poll_set_started(ctx, false);
     return false;
 }
 bool aio_pending(AioContext *ctx)
 {
     AioHandler *node;
-    bool result = false;
-    /*
-     * We have to walk very carefully in case aio_set_fd_handler is
-     * called while we're walking.
-     */
-    qemu_lockcnt_inc(&ctx->list_lock);
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         int revents;
         revents = node->pfd.revents & node->pfd.events;
         if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read &&
             aio_node_check(ctx, node->is_external)) {
-            result = true;
-            break;
+            return true;
         }
         if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write &&
             aio_node_check(ctx, node->is_external)) {
-            result = true;
-            break;
+            return true;
         }
     }
-    qemu_lockcnt_dec(&ctx->list_lock);
-    return result;
-}
-static bool aio_dispatch_handlers(AioContext *ctx)
+    return false;
+}
+bool aio_dispatch(AioContext *ctx)
 {
-    AioHandler *node, *tmp;
+    AioHandler *node;
     bool progress = false;
-    QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
+    /*
+     * If there are callbacks left that have been queued, we need to call them.
+     * Do not call select in this case, because it is possible that the caller
+     * does not need a complete flush (as is the case for aio_poll loops).
+     */
+    if (aio_bh_poll(ctx)) {
+        progress = true;
+    }
+    /*
+     * We have to walk very carefully in case aio_set_fd_handler is
+     * called while we're walking.
+     */
+    node = QLIST_FIRST(&ctx->aio_handlers);
+    while (node) {
+        AioHandler *tmp;
         int revents;
+        ctx->walking_handlers++;
         revents = node->pfd.revents & node->pfd.events;
         node->pfd.revents = 0;
@@ -411,28 +342,23 @@ static bool aio_dispatch_handlers(AioContext *ctx)
             progress = true;
         }
-        if (node->deleted) {
-            if (qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
-                QLIST_REMOVE(node, node);
-                g_free(node);
-                qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
-            }
+        tmp = node;
+        node = QLIST_NEXT(node, node);
+        ctx->walking_handlers--;
+        if (!ctx->walking_handlers && tmp->deleted) {
+            QLIST_REMOVE(tmp, node);
+            g_free(tmp);
         }
     }
+    /* Run our timers */
+    progress |= timerlistgroup_run_timers(&ctx->tlg);
     return progress;
 }
-void aio_dispatch(AioContext *ctx)
-{
-    qemu_lockcnt_inc(&ctx->list_lock);
-    aio_bh_poll(ctx);
-    aio_dispatch_handlers(ctx);
-    qemu_lockcnt_dec(&ctx->list_lock);
-    timerlistgroup_run_timers(&ctx->tlg);
-}
 /* These thread-local variables are used only in a small part of aio_poll
  * around the call to the poll() system call.  In particular they are not
  * used while aio_poll is performing callbacks, which makes it much easier
@@ -479,101 +405,15 @@ static void add_pollfd(AioHandler *node)
     npfd++;
 }
static bool run_poll_handlers_once(AioContext *ctx)
{
bool progress = false;
AioHandler *node;
QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
if (!node->deleted && node->io_poll &&
aio_node_check(ctx, node->is_external) &&
node->io_poll(node->opaque)) {
progress = true;
}
/* Caller handles freeing deleted nodes. Don't do it here. */
}
return progress;
}
/* run_poll_handlers:
* @ctx: the AioContext
* @max_ns: maximum time to poll for, in nanoseconds
*
* Polls for a given time.
*
* Note that ctx->notify_me must be non-zero so this function can detect
* aio_notify().
*
* Note that the caller must have incremented ctx->list_lock.
*
* Returns: true if progress was made, false otherwise
*/
static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
{
bool progress;
int64_t end_time;
assert(ctx->notify_me);
assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
assert(ctx->poll_disable_cnt == 0);
trace_run_poll_handlers_begin(ctx, max_ns);
end_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + max_ns;
do {
progress = run_poll_handlers_once(ctx);
} while (!progress && qemu_clock_get_ns(QEMU_CLOCK_REALTIME) < end_time);
trace_run_poll_handlers_end(ctx, progress);
return progress;
}
/* try_poll_mode:
* @ctx: the AioContext
* @blocking: busy polling is only attempted when blocking is true
*
* ctx->notify_me must be non-zero so this function can detect aio_notify().
*
* Note that the caller must have incremented ctx->list_lock.
*
* Returns: true if progress was made, false otherwise
*/
static bool try_poll_mode(AioContext *ctx, bool blocking)
{
if (blocking && ctx->poll_max_ns && ctx->poll_disable_cnt == 0) {
/* See qemu_soonest_timeout() uint64_t hack */
int64_t max_ns = MIN((uint64_t)aio_compute_timeout(ctx),
(uint64_t)ctx->poll_ns);
if (max_ns) {
poll_set_started(ctx, true);
if (run_poll_handlers(ctx, max_ns)) {
return true;
}
}
}
poll_set_started(ctx, false);
/* Even if we don't run busy polling, try polling once in case it can make
* progress and the caller will be able to avoid ppoll(2)/epoll_wait(2).
*/
return run_poll_handlers_once(ctx);
}
 bool aio_poll(AioContext *ctx, bool blocking)
 {
     AioHandler *node;
-    int i;
-    int ret = 0;
+    int i, ret;
     bool progress;
     int64_t timeout;
-    int64_t start = 0;
+    aio_context_acquire(ctx);
+    progress = false;
     /* aio_notify can avoid the expensive event_notifier_set if
      * everything (file descriptors, bottom halves, timers) will
@@ -586,30 +426,25 @@ bool aio_poll(AioContext *ctx, bool blocking)
         atomic_add(&ctx->notify_me, 2);
     }
-    qemu_lockcnt_inc(&ctx->list_lock);
-    if (ctx->poll_max_ns) {
-        start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-    }
-    progress = try_poll_mode(ctx, blocking);
-    if (!progress) {
+    ctx->walking_handlers++;
     assert(npfd == 0);
     /* fill pollfds */
-    if (!aio_epoll_enabled(ctx)) {
-        QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->pfd.events
+            && !aio_epoll_enabled(ctx)
             && aio_node_check(ctx, node->is_external)) {
             add_pollfd(node);
         }
     }
-    }
     timeout = blocking ? aio_compute_timeout(ctx) : 0;
     /* wait until next event */
+    if (timeout) {
+        aio_context_release(ctx);
+    }
     if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
         AioHandler epoll_handler;
@@ -621,51 +456,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
     } else {
         ret = qemu_poll_ns(pollfds, npfd, timeout);
     }
-    }
     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
     }
-    /* Adjust polling time */
-    if (ctx->poll_max_ns) {
-        int64_t block_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start;
-        if (block_ns <= ctx->poll_ns) {
-            /* This is the sweet spot, no adjustment needed */
-        } else if (block_ns > ctx->poll_max_ns) {
-            /* We'd have to poll for too long, poll less */
-            int64_t old = ctx->poll_ns;
-            if (ctx->poll_shrink) {
-                ctx->poll_ns /= ctx->poll_shrink;
-            } else {
-                ctx->poll_ns = 0;
-            }
-            trace_poll_shrink(ctx, old, ctx->poll_ns);
-        } else if (ctx->poll_ns < ctx->poll_max_ns &&
-                   block_ns < ctx->poll_max_ns) {
-            /* There is room to grow, poll longer */
-            int64_t old = ctx->poll_ns;
-            int64_t grow = ctx->poll_grow;
-            if (grow == 0) {
-                grow = 2;
-            }
-            if (ctx->poll_ns) {
-                ctx->poll_ns *= grow;
-            } else {
-                ctx->poll_ns = 4000; /* start polling at 4 microseconds */
-            }
-            if (ctx->poll_ns > ctx->poll_max_ns) {
-                ctx->poll_ns = ctx->poll_max_ns;
-            }
-            trace_poll_grow(ctx, old, ctx->poll_ns);
-        }
-    }
+    if (timeout) {
+        aio_context_acquire(ctx);
+    }
     aio_notify_accept(ctx);
@@ -678,29 +473,20 @@ bool aio_poll(AioContext *ctx, bool blocking)
     }
     npfd = 0;
-    progress |= aio_bh_poll(ctx);
-    if (ret > 0) {
-        progress |= aio_dispatch_handlers(ctx);
-    }
-    qemu_lockcnt_dec(&ctx->list_lock);
-    progress |= timerlistgroup_run_timers(&ctx->tlg);
+    ctx->walking_handlers--;
+    /* Run dispatch even if there were no readable fds to run timers */
+    if (aio_dispatch(ctx)) {
+        progress = true;
+    }
+    aio_context_release(ctx);
     return progress;
 }
 void aio_context_setup(AioContext *ctx)
 {
-    /* TODO remove this in final patch submission */
-    if (getenv("QEMU_AIO_POLL_MAX_NS")) {
-        fprintf(stderr, "The QEMU_AIO_POLL_MAX_NS environment variable has "
-                "been replaced with -object iothread,poll-max-ns=NUM\n");
-        exit(1);
-    }
 #ifdef CONFIG_EPOLL_CREATE1
     assert(!ctx->epollfd);
     ctx->epollfd = epoll_create1(EPOLL_CLOEXEC);
@@ -712,17 +498,3 @@ void aio_context_setup(AioContext *ctx)
     }
 #endif
 }
-void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
-                                 int64_t grow, int64_t shrink, Error **errp)
-{
-    /* No thread synchronization here, it doesn't matter if an incorrect value
-     * is used once.
-     */
-    ctx->poll_max_ns = max_ns;
-    ctx->poll_ns = 0;
-    ctx->poll_grow = grow;
-    ctx->poll_shrink = shrink;
-    aio_notify(ctx);
-}


@@ -20,8 +20,6 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
-#include "qapi/error.h"
-#include "qemu/rcu_queue.h"
 struct AioHandler {
     EventNotifier *e;
@@ -40,13 +38,11 @@ void aio_set_fd_handler(AioContext *ctx,
                         bool is_external,
                         IOHandler *io_read,
                         IOHandler *io_write,
-                        AioPollFn *io_poll,
                         void *opaque)
 {
     /* fd is a SOCKET in our case */
     AioHandler *node;
-    qemu_lockcnt_lock(&ctx->list_lock);
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (node->pfd.fd == fd && !node->deleted) {
             break;
@@ -56,14 +52,14 @@ void aio_set_fd_handler(AioContext *ctx,
     /* Are we deleting the fd handler? */
     if (!io_read && !io_write) {
         if (node) {
-            /* If aio_poll is in progress, just mark the node as deleted */
-            if (qemu_lockcnt_count(&ctx->list_lock)) {
+            /* If the lock is held, just mark the node as deleted */
+            if (ctx->walking_handlers) {
                 node->deleted = 1;
                 node->pfd.revents = 0;
             } else {
                 /* Otherwise, delete it for real.  We can't just mark it as
                  * deleted because deleted nodes are only cleaned up after
-                 * releasing the list_lock.
+                 * releasing the walking_handlers lock.
                  */
                 QLIST_REMOVE(node, node);
                 g_free(node);
@@ -76,7 +72,7 @@ void aio_set_fd_handler(AioContext *ctx,
             /* Alloc and insert if it's not already there */
             node = g_new0(AioHandler, 1);
             node->pfd.fd = fd;
-            QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
+            QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
         }
         node->pfd.events = 0;
@@ -101,26 +97,16 @@ void aio_set_fd_handler(AioContext *ctx,
                        FD_CONNECT | FD_WRITE | FD_OOB);
     }
-    qemu_lockcnt_unlock(&ctx->list_lock);
     aio_notify(ctx);
 }
-void aio_set_fd_poll(AioContext *ctx, int fd,
-                     IOHandler *io_poll_begin,
-                     IOHandler *io_poll_end)
-{
-    /* Not implemented */
-}
 void aio_set_event_notifier(AioContext *ctx,
                             EventNotifier *e,
                             bool is_external,
-                            EventNotifierHandler *io_notify,
-                            AioPollFn *io_poll)
+                            EventNotifierHandler *io_notify)
 {
     AioHandler *node;
-    qemu_lockcnt_lock(&ctx->list_lock);
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (node->e == e && !node->deleted) {
             break;
@@ -132,14 +118,14 @@ void aio_set_event_notifier(AioContext *ctx,
     if (node) {
         g_source_remove_poll(&ctx->source, &node->pfd);
-        /* aio_poll is in progress, just mark the node as deleted */
-        if (qemu_lockcnt_count(&ctx->list_lock)) {
+        /* If the lock is held, just mark the node as deleted */
+        if (ctx->walking_handlers) {
             node->deleted = 1;
             node->pfd.revents = 0;
         } else {
             /* Otherwise, delete it for real.  We can't just mark it as
              * deleted because deleted nodes are only cleaned up after
-             * releasing the list_lock.
+             * releasing the walking_handlers lock.
              */
             QLIST_REMOVE(node, node);
             g_free(node);
@@ -153,7 +139,7 @@ void aio_set_event_notifier(AioContext *ctx,
         node->pfd.fd = (uintptr_t)event_notifier_get_handle(e);
         node->pfd.events = G_IO_IN;
         node->is_external = is_external;
-        QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node);
+        QLIST_INSERT_HEAD(&ctx->aio_handlers, node, node);
         g_source_add_poll(&ctx->source, &node->pfd);
     }
@@ -161,18 +147,9 @@ void aio_set_event_notifier(AioContext *ctx,
         node->io_notify = io_notify;
     }
-    qemu_lockcnt_unlock(&ctx->list_lock);
     aio_notify(ctx);
 }
-void aio_set_event_notifier_poll(AioContext *ctx,
-                                 EventNotifier *notifier,
-                                 EventNotifierHandler *io_poll_begin,
-                                 EventNotifierHandler *io_poll_end)
-{
-    /* Not implemented */
-}
 bool aio_prepare(AioContext *ctx)
 {
     static struct timeval tv0;
@@ -180,16 +157,10 @@ bool aio_prepare(AioContext *ctx)
     bool have_select_revents = false;
     fd_set rfds, wfds;
-    /*
-     * We have to walk very carefully in case aio_set_fd_handler is
-     * called while we're walking.
-     */
-    qemu_lockcnt_inc(&ctx->list_lock);
     /* fill fd sets */
     FD_ZERO(&rfds);
     FD_ZERO(&wfds);
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (node->io_read) {
             FD_SET ((SOCKET)node->pfd.fd, &rfds);
         }
@@ -199,7 +170,7 @@ bool aio_prepare(AioContext *ctx)
     }
     if (select(0, &rfds, &wfds, NULL, &tv0) > 0) {
-        QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+        QLIST_FOREACH(node, &ctx->aio_handlers, node) {
             node->pfd.revents = 0;
             if (FD_ISSET(node->pfd.fd, &rfds)) {
                 node->pfd.revents |= G_IO_IN;
@@ -213,53 +184,45 @@ bool aio_prepare(AioContext *ctx)
         }
     }
-    qemu_lockcnt_dec(&ctx->list_lock);
     return have_select_revents;
 }
 bool aio_pending(AioContext *ctx)
 {
     AioHandler *node;
-    bool result = false;
-    /*
-     * We have to walk very carefully in case aio_set_fd_handler is
-     * called while we're walking.
-     */
-    qemu_lockcnt_inc(&ctx->list_lock);
-    QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
+    QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (node->pfd.revents && node->io_notify) {
-            result = true;
-            break;
+            return true;
         }
         if ((node->pfd.revents & G_IO_IN) && node->io_read) {
-            result = true;
-            break;
+            return true;
         }
         if ((node->pfd.revents & G_IO_OUT) && node->io_write) {
-            result = true;
-            break;
+            return true;
         }
     }
-    qemu_lockcnt_dec(&ctx->list_lock);
-    return result;
+    return false;
 }
 static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
 {
     AioHandler *node;
     bool progress = false;
-    AioHandler *tmp;
     /*
      * We have to walk very carefully in case aio_set_fd_handler is
      * called while we're walking.
     */
-    QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
+    node = QLIST_FIRST(&ctx->aio_handlers);
+    while (node) {
+        AioHandler *tmp;
         int revents = node->pfd.revents;
+        ctx->walking_handlers++;
         if (!node->deleted &&
             (revents || event_notifier_get_handle(node->e) == event) &&
             node->io_notify) {
@@ -294,25 +257,28 @@ static bool aio_dispatch_handlers(AioContext *ctx, HANDLE event)
} }
} }
if (node->deleted) { tmp = node;
if (qemu_lockcnt_dec_if_lock(&ctx->list_lock)) { node = QLIST_NEXT(node, node);
QLIST_REMOVE(node, node);
g_free(node); ctx->walking_handlers--;
qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
} if (!ctx->walking_handlers && tmp->deleted) {
QLIST_REMOVE(tmp, node);
g_free(tmp);
} }
} }
return progress; return progress;
} }
void aio_dispatch(AioContext *ctx) bool aio_dispatch(AioContext *ctx)
{ {
qemu_lockcnt_inc(&ctx->list_lock); bool progress;
aio_bh_poll(ctx);
aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE); progress = aio_bh_poll(ctx);
qemu_lockcnt_dec(&ctx->list_lock); progress |= aio_dispatch_handlers(ctx, INVALID_HANDLE_VALUE);
timerlistgroup_run_timers(&ctx->tlg); progress |= timerlistgroup_run_timers(&ctx->tlg);
return progress;
} }
bool aio_poll(AioContext *ctx, bool blocking) bool aio_poll(AioContext *ctx, bool blocking)
@@ -323,6 +289,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
int count; int count;
int timeout; int timeout;
aio_context_acquire(ctx);
progress = false; progress = false;
/* aio_notify can avoid the expensive event_notifier_set if /* aio_notify can avoid the expensive event_notifier_set if
@@ -336,18 +303,20 @@ bool aio_poll(AioContext *ctx, bool blocking)
atomic_add(&ctx->notify_me, 2); atomic_add(&ctx->notify_me, 2);
} }
qemu_lockcnt_inc(&ctx->list_lock);
have_select_revents = aio_prepare(ctx); have_select_revents = aio_prepare(ctx);
ctx->walking_handlers++;
/* fill fd sets */ /* fill fd sets */
count = 0; count = 0;
QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) { QLIST_FOREACH(node, &ctx->aio_handlers, node) {
if (!node->deleted && node->io_notify if (!node->deleted && node->io_notify
&& aio_node_check(ctx, node->is_external)) { && aio_node_check(ctx, node->is_external)) {
events[count++] = event_notifier_get_handle(node->e); events[count++] = event_notifier_get_handle(node->e);
} }
} }
ctx->walking_handlers--;
first = true; first = true;
/* ctx->notifier is always registered. */ /* ctx->notifier is always registered. */
@@ -363,11 +332,17 @@ bool aio_poll(AioContext *ctx, bool blocking)
timeout = blocking && !have_select_revents timeout = blocking && !have_select_revents
? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0; ? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;
if (timeout) {
aio_context_release(ctx);
}
ret = WaitForMultipleObjects(count, events, FALSE, timeout); ret = WaitForMultipleObjects(count, events, FALSE, timeout);
if (blocking) { if (blocking) {
assert(first); assert(first);
atomic_sub(&ctx->notify_me, 2); atomic_sub(&ctx->notify_me, 2);
} }
if (timeout) {
aio_context_acquire(ctx);
}
if (first) { if (first) {
aio_notify_accept(ctx); aio_notify_accept(ctx);
@@ -390,18 +365,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
progress |= aio_dispatch_handlers(ctx, event); progress |= aio_dispatch_handlers(ctx, event);
} while (count > 0); } while (count > 0);
qemu_lockcnt_dec(&ctx->list_lock);
progress |= timerlistgroup_run_timers(&ctx->tlg); progress |= timerlistgroup_run_timers(&ctx->tlg);
aio_context_release(ctx);
return progress; return progress;
} }
void aio_context_setup(AioContext *ctx) void aio_context_setup(AioContext *ctx)
{ {
} }
void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
int64_t grow, int64_t shrink, Error **errp)
{
error_setg(errp, "AioContext polling is not implemented on Windows");
}


@@ -27,7 +27,8 @@
 #include "sysemu/sysemu.h"
 #include "sysemu/arch_init.h"
 #include "hw/pci/pci.h"
-#include "hw/audio/soundhw.h"
+#include "hw/audio/audio.h"
+#include "hw/smbios/smbios.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
 #include "qmp-commands.h"
@@ -63,8 +64,6 @@ int graphic_depth = 32;
 #define QEMU_ARCH QEMU_ARCH_MIPS
 #elif defined(TARGET_MOXIE)
 #define QEMU_ARCH QEMU_ARCH_MOXIE
-#elif defined(TARGET_NIOS2)
-#define QEMU_ARCH QEMU_ARCH_NIOS2
 #elif defined(TARGET_OPENRISC)
 #define QEMU_ARCH QEMU_ARCH_OPENRISC
 #elif defined(TARGET_PPC)
@@ -85,6 +84,196 @@ int graphic_depth = 32;
 const uint32_t arch_type = QEMU_ARCH;
static struct defconfig_file {
const char *filename;
/* Indicates it is an user config file (disabled by -no-user-config) */
bool userconfig;
} default_config_files[] = {
{ CONFIG_QEMU_CONFDIR "/qemu.conf", true },
{ NULL }, /* end of list */
};
int qemu_read_default_config_files(bool userconfig)
{
int ret;
struct defconfig_file *f;
for (f = default_config_files; f->filename; f++) {
if (!userconfig && f->userconfig) {
continue;
}
ret = qemu_read_config_file(f->filename);
if (ret < 0 && ret != -ENOENT) {
return ret;
}
}
return 0;
}
struct soundhw {
const char *name;
const char *descr;
int enabled;
int isa;
union {
int (*init_isa) (ISABus *bus);
int (*init_pci) (PCIBus *bus);
} init;
};
static struct soundhw soundhw[9];
static int soundhw_count;
void isa_register_soundhw(const char *name, const char *descr,
int (*init_isa)(ISABus *bus))
{
assert(soundhw_count < ARRAY_SIZE(soundhw) - 1);
soundhw[soundhw_count].name = name;
soundhw[soundhw_count].descr = descr;
soundhw[soundhw_count].isa = 1;
soundhw[soundhw_count].init.init_isa = init_isa;
soundhw_count++;
}
void pci_register_soundhw(const char *name, const char *descr,
int (*init_pci)(PCIBus *bus))
{
assert(soundhw_count < ARRAY_SIZE(soundhw) - 1);
soundhw[soundhw_count].name = name;
soundhw[soundhw_count].descr = descr;
soundhw[soundhw_count].isa = 0;
soundhw[soundhw_count].init.init_pci = init_pci;
soundhw_count++;
}
void select_soundhw(const char *optarg)
{
struct soundhw *c;
if (is_help_option(optarg)) {
show_valid_cards:
if (soundhw_count) {
printf("Valid sound card names (comma separated):\n");
for (c = soundhw; c->name; ++c) {
printf ("%-11s %s\n", c->name, c->descr);
}
printf("\n-soundhw all will enable all of the above\n");
} else {
printf("Machine has no user-selectable audio hardware "
"(it may or may not have always-present audio hardware).\n");
}
exit(!is_help_option(optarg));
}
else {
size_t l;
const char *p;
char *e;
int bad_card = 0;
if (!strcmp(optarg, "all")) {
for (c = soundhw; c->name; ++c) {
c->enabled = 1;
}
return;
}
p = optarg;
while (*p) {
e = strchr(p, ',');
l = !e ? strlen(p) : (size_t) (e - p);
for (c = soundhw; c->name; ++c) {
if (!strncmp(c->name, p, l) && !c->name[l]) {
c->enabled = 1;
break;
}
}
if (!c->name) {
if (l > 80) {
error_report("Unknown sound card name (too big to show)");
}
else {
error_report("Unknown sound card name `%.*s'",
(int) l, p);
}
bad_card = 1;
}
p += l + (e != NULL);
}
if (bad_card) {
goto show_valid_cards;
}
}
}
void audio_init(void)
{
struct soundhw *c;
ISABus *isa_bus = (ISABus *) object_resolve_path_type("", TYPE_ISA_BUS, NULL);
PCIBus *pci_bus = (PCIBus *) object_resolve_path_type("", TYPE_PCI_BUS, NULL);
for (c = soundhw; c->name; ++c) {
if (c->enabled) {
if (c->isa) {
if (!isa_bus) {
error_report("ISA bus not available for %s", c->name);
exit(1);
}
c->init.init_isa(isa_bus);
} else {
if (!pci_bus) {
error_report("PCI bus not available for %s", c->name);
exit(1);
}
c->init.init_pci(pci_bus);
}
}
}
}
int qemu_uuid_parse(const char *str, uint8_t *uuid)
{
int ret;
if (strlen(str) != 36) {
return -1;
}
ret = sscanf(str, UUID_FMT, &uuid[0], &uuid[1], &uuid[2], &uuid[3],
&uuid[4], &uuid[5], &uuid[6], &uuid[7], &uuid[8], &uuid[9],
&uuid[10], &uuid[11], &uuid[12], &uuid[13], &uuid[14],
&uuid[15]);
if (ret != 16) {
return -1;
}
return 0;
}
void do_acpitable_option(const QemuOpts *opts)
{
#ifdef TARGET_I386
Error *err = NULL;
acpi_table_add(opts, &err);
if (err) {
error_reportf_err(err, "Wrong acpi table provided: ");
exit(1);
}
#endif
}
void do_smbios_option(QemuOpts *opts)
{
#ifdef TARGET_I386
smbios_entry_add(opts);
#endif
}
 int kvm_available(void)
 {
 #ifdef CONFIG_KVM


@@ -1,8 +1,7 @@
 /*
- * Data plane event loop
+ * QEMU System Emulator
  *
  * Copyright (c) 2003-2008 Fabrice Bellard
- * Copyright (c) 2009-2017 QEMU contributors
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
@@ -31,8 +30,6 @@
 #include "qemu/main-loop.h"
 #include "qemu/atomic.h"
 #include "block/raw-aio.h"
-#include "qemu/coroutine_int.h"
-#include "trace.h"

 /***********************************************************/
 /* bottom halves (can be seen as timers which expire ASAP) */
@@ -47,26 +44,6 @@ struct QEMUBH {
     bool deleted;
 };

-void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
-{
-    QEMUBH *bh;
-    bh = g_new(QEMUBH, 1);
-    *bh = (QEMUBH){
-        .ctx = ctx,
-        .cb = cb,
-        .opaque = opaque,
-    };
-    qemu_lockcnt_lock(&ctx->list_lock);
-    bh->next = ctx->first_bh;
-    bh->scheduled = 1;
-    bh->deleted = 1;
-    /* Make sure that the members are ready before putting bh into list */
-    smp_wmb();
-    ctx->first_bh = bh;
-    qemu_lockcnt_unlock(&ctx->list_lock);
-    aio_notify(ctx);
-}
-
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 {
     QEMUBH *bh;
@@ -76,12 +53,12 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
         .cb = cb,
         .opaque = opaque,
     };
-    qemu_lockcnt_lock(&ctx->list_lock);
+    qemu_mutex_lock(&ctx->bh_lock);
     bh->next = ctx->first_bh;
     /* Make sure that the members are ready before putting bh into list */
     smp_wmb();
     ctx->first_bh = bh;
-    qemu_lockcnt_unlock(&ctx->list_lock);
+    qemu_mutex_unlock(&ctx->bh_lock);
     return bh;
 }

@@ -90,56 +67,53 @@ void aio_bh_call(QEMUBH *bh)
     bh->cb(bh->opaque);
 }

-/* Multiple occurrences of aio_bh_poll cannot be called concurrently.
- * The count in ctx->list_lock is incremented before the call, and is
- * not affected by the call.
- */
+/* Multiple occurrences of aio_bh_poll cannot be called concurrently */
 int aio_bh_poll(AioContext *ctx)
 {
     QEMUBH *bh, **bhp, *next;
     int ret;
-    bool deleted = false;
+
+    ctx->walking_bh++;

     ret = 0;
-    for (bh = atomic_rcu_read(&ctx->first_bh); bh; bh = next) {
-        next = atomic_rcu_read(&bh->next);
+    for (bh = ctx->first_bh; bh; bh = next) {
+        /* Make sure that fetching bh happens before accessing its members */
+        smp_read_barrier_depends();
+        next = bh->next;
         /* The atomic_xchg is paired with the one in qemu_bh_schedule.  The
          * implicit memory barrier ensures that the callback sees all writes
          * done by the scheduling thread.  It also ensures that the scheduling
          * thread sees the zero before bh->cb has run, and thus will call
          * aio_notify again if necessary.
          */
-        if (atomic_xchg(&bh->scheduled, 0)) {
-            /* Idle BHs don't count as progress */
-            if (!bh->idle) {
+        if (!bh->deleted && atomic_xchg(&bh->scheduled, 0)) {
+            /* Idle BHs and the notify BH don't count as progress */
+            if (!bh->idle && bh != ctx->notify_dummy_bh) {
                 ret = 1;
             }
             bh->idle = 0;
             aio_bh_call(bh);
         }
-        if (bh->deleted) {
-            deleted = true;
-        }
     }

+    ctx->walking_bh--;
+
     /* remove deleted bhs */
-    if (!deleted) {
-        return ret;
-    }
-
-    if (qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
+    if (!ctx->walking_bh) {
+        qemu_mutex_lock(&ctx->bh_lock);
         bhp = &ctx->first_bh;
         while (*bhp) {
             bh = *bhp;
-            if (bh->deleted && !bh->scheduled) {
+            if (bh->deleted) {
                 *bhp = bh->next;
                 g_free(bh);
             } else {
                 bhp = &bh->next;
             }
         }
-        qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
+        qemu_mutex_unlock(&ctx->bh_lock);
     }

     return ret;
 }

@@ -193,9 +167,8 @@ aio_compute_timeout(AioContext *ctx)
     int timeout = -1;
     QEMUBH *bh;

-    for (bh = atomic_rcu_read(&ctx->first_bh); bh;
-         bh = atomic_rcu_read(&bh->next)) {
-        if (bh->scheduled) {
+    for (bh = ctx->first_bh; bh; bh = bh->next) {
+        if (!bh->deleted && bh->scheduled) {
             if (bh->idle) {
                 /* idle bottom halves will be polled at least
                  * every 10ms */
@@ -243,7 +216,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);

     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (bh->scheduled) {
+        if (!bh->deleted && bh->scheduled) {
             return true;
         }
     }
@@ -267,6 +240,7 @@ aio_ctx_finalize(GSource *source)
 {
     AioContext *ctx = (AioContext *) source;

+    qemu_bh_delete(ctx->notify_dummy_bh);
     thread_pool_free(ctx->thread_pool);

 #ifdef CONFIG_LINUX_AIO
@@ -277,11 +251,7 @@ aio_ctx_finalize(GSource *source)
     }
 #endif

-    assert(QSLIST_EMPTY(&ctx->scheduled_coroutines));
-    qemu_bh_delete(ctx->co_schedule_bh);
-
-    qemu_lockcnt_lock(&ctx->list_lock);
-    assert(!qemu_lockcnt_count(&ctx->list_lock));
+    qemu_mutex_lock(&ctx->bh_lock);
     while (ctx->first_bh) {
         QEMUBH *next = ctx->first_bh->next;
@@ -291,12 +261,12 @@ aio_ctx_finalize(GSource *source)
         g_free(ctx->first_bh);
         ctx->first_bh = next;
     }
-    qemu_lockcnt_unlock(&ctx->list_lock);
+    qemu_mutex_unlock(&ctx->bh_lock);

-    aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
+    aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
     event_notifier_cleanup(&ctx->notifier);
-    qemu_rec_mutex_destroy(&ctx->lock);
-    qemu_lockcnt_destroy(&ctx->list_lock);
+    rfifolock_destroy(&ctx->lock);
+    qemu_mutex_destroy(&ctx->bh_lock);
     timerlistgroup_deinit(&ctx->tlg);
 }

@@ -351,46 +321,26 @@ void aio_notify_accept(AioContext *ctx)
     }
 }

-static void aio_timerlist_notify(void *opaque, QEMUClockType type)
+static void aio_timerlist_notify(void *opaque)
 {
     aio_notify(opaque);
 }

-static void event_notifier_dummy_cb(EventNotifier *e)
-{
-}
-
-/* Returns true if aio_notify() was called (e.g. a BH was scheduled) */
-static bool event_notifier_poll(void *opaque)
-{
-    EventNotifier *e = opaque;
-    AioContext *ctx = container_of(e, AioContext, notifier);
-
-    return atomic_read(&ctx->notified);
-}
-
-static void co_schedule_bh_cb(void *opaque)
+static void aio_rfifolock_cb(void *opaque)
 {
     AioContext *ctx = opaque;
-    QSLIST_HEAD(, Coroutine) straight, reversed;
-
-    QSLIST_MOVE_ATOMIC(&reversed, &ctx->scheduled_coroutines);
-    QSLIST_INIT(&straight);
-
-    while (!QSLIST_EMPTY(&reversed)) {
-        Coroutine *co = QSLIST_FIRST(&reversed);
-        QSLIST_REMOVE_HEAD(&reversed, co_scheduled_next);
-        QSLIST_INSERT_HEAD(&straight, co, co_scheduled_next);
-    }
-
-    while (!QSLIST_EMPTY(&straight)) {
-        Coroutine *co = QSLIST_FIRST(&straight);
-        QSLIST_REMOVE_HEAD(&straight, co_scheduled_next);
-        trace_aio_co_schedule_bh_cb(ctx, co);
-        aio_context_acquire(ctx);
-        qemu_coroutine_enter(co);
-        aio_context_release(ctx);
-    }
+
+    /* Kick owner thread in case they are blocked in aio_poll() */
+    qemu_bh_schedule(ctx->notify_dummy_bh);
+}
+
+static void notify_dummy_bh(void *opaque)
+{
+    /* Do nothing, we were invoked just to force the event loop to iterate */
+}
+
+static void event_notifier_dummy_cb(EventNotifier *e)
+{
 }

 AioContext *aio_context_new(Error **errp)
@@ -407,27 +357,19 @@ AioContext *aio_context_new(Error **errp)
         goto fail;
     }
     g_source_set_can_recurse(&ctx->source, true);
-    qemu_lockcnt_init(&ctx->list_lock);
-
-    ctx->co_schedule_bh = aio_bh_new(ctx, co_schedule_bh_cb, ctx);
-    QSLIST_INIT(&ctx->scheduled_coroutines);
-
     aio_set_event_notifier(ctx, &ctx->notifier,
                            false,
                            (EventNotifierHandler *)
-                           event_notifier_dummy_cb,
-                           event_notifier_poll);
+                           event_notifier_dummy_cb);
 #ifdef CONFIG_LINUX_AIO
     ctx->linux_aio = NULL;
 #endif
     ctx->thread_pool = NULL;
-    qemu_rec_mutex_init(&ctx->lock);
+    qemu_mutex_init(&ctx->bh_lock);
+    rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
     timerlistgroup_init(&ctx->tlg, aio_timerlist_notify, ctx);

-    ctx->poll_ns = 0;
-    ctx->poll_max_ns = 0;
-    ctx->poll_grow = 0;
-    ctx->poll_shrink = 0;
+    ctx->notify_dummy_bh = aio_bh_new(ctx, notify_dummy_bh, NULL);

     return ctx;
 fail:
@@ -435,45 +377,6 @@ fail:
     return NULL;
 }

-void aio_co_schedule(AioContext *ctx, Coroutine *co)
-{
-    trace_aio_co_schedule(ctx, co);
-    QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
-                              co, co_scheduled_next);
-    qemu_bh_schedule(ctx->co_schedule_bh);
-}
-
-void aio_co_wake(struct Coroutine *co)
-{
-    AioContext *ctx;
-
-    /* Read coroutine before co->ctx.  Matches smp_wmb in
-     * qemu_coroutine_enter.
-     */
-    smp_read_barrier_depends();
-    ctx = atomic_read(&co->ctx);
-    aio_co_enter(ctx, co);
-}
-
-void aio_co_enter(AioContext *ctx, struct Coroutine *co)
-{
-    if (ctx != qemu_get_current_aio_context()) {
-        aio_co_schedule(ctx, co);
-        return;
-    }
-
-    if (qemu_in_coroutine()) {
-        Coroutine *self = qemu_coroutine_self();
-        assert(self != co);
-        QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
-    } else {
-        aio_context_acquire(ctx);
-        qemu_aio_coroutine_enter(ctx, co);
-        aio_context_release(ctx);
-    }
-}
-
 void aio_context_ref(AioContext *ctx)
 {
     g_source_ref(&ctx->source);
@@ -486,10 +389,10 @@ void aio_context_unref(AioContext *ctx)

 void aio_context_acquire(AioContext *ctx)
 {
-    qemu_rec_mutex_lock(&ctx->lock);
+    rfifolock_lock(&ctx->lock);
 }

 void aio_context_release(AioContext *ctx)
 {
-    qemu_rec_mutex_unlock(&ctx->lock);
+    rfifolock_unlock(&ctx->lock);
 }


@@ -1,215 +0,0 @@
/*
* Atomic helper templates
* Included from tcg-runtime.c and cputlb.c.
*
* Copyright (c) 2016 Red Hat, Inc
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
*/
#if DATA_SIZE == 16
# define SUFFIX o
# define DATA_TYPE Int128
# define BSWAP bswap128
#elif DATA_SIZE == 8
# define SUFFIX q
# define DATA_TYPE uint64_t
# define BSWAP bswap64
#elif DATA_SIZE == 4
# define SUFFIX l
# define DATA_TYPE uint32_t
# define BSWAP bswap32
#elif DATA_SIZE == 2
# define SUFFIX w
# define DATA_TYPE uint16_t
# define BSWAP bswap16
#elif DATA_SIZE == 1
# define SUFFIX b
# define DATA_TYPE uint8_t
# define BSWAP
#else
# error unsupported data size
#endif
#if DATA_SIZE >= 4
# define ABI_TYPE DATA_TYPE
#else
# define ABI_TYPE uint32_t
#endif
/* Define host-endian atomic operations. Note that END is used within
the ATOMIC_NAME macro, and redefined below. */
#if DATA_SIZE == 1
# define END
#elif defined(HOST_WORDS_BIGENDIAN)
# define END _be
#else
# define END _le
#endif
ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
ABI_TYPE cmpv, ABI_TYPE newv EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
return atomic_cmpxchg__nocheck(haddr, cmpv, newv);
}
#if DATA_SIZE >= 16
ABI_TYPE ATOMIC_NAME(ld)(CPUArchState *env, target_ulong addr EXTRA_ARGS)
{
DATA_TYPE val, *haddr = ATOMIC_MMU_LOOKUP;
__atomic_load(haddr, &val, __ATOMIC_RELAXED);
return val;
}
void ATOMIC_NAME(st)(CPUArchState *env, target_ulong addr,
ABI_TYPE val EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
__atomic_store(haddr, &val, __ATOMIC_RELAXED);
}
#else
ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
ABI_TYPE val EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
return atomic_xchg__nocheck(haddr, val);
}
#define GEN_ATOMIC_HELPER(X) \
ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr, \
ABI_TYPE val EXTRA_ARGS) \
{ \
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP; \
return atomic_##X(haddr, val); \
} \
GEN_ATOMIC_HELPER(fetch_add)
GEN_ATOMIC_HELPER(fetch_and)
GEN_ATOMIC_HELPER(fetch_or)
GEN_ATOMIC_HELPER(fetch_xor)
GEN_ATOMIC_HELPER(add_fetch)
GEN_ATOMIC_HELPER(and_fetch)
GEN_ATOMIC_HELPER(or_fetch)
GEN_ATOMIC_HELPER(xor_fetch)
#undef GEN_ATOMIC_HELPER
#endif /* DATA SIZE >= 16 */
#undef END
#if DATA_SIZE > 1
/* Define reverse-host-endian atomic operations. Note that END is used
within the ATOMIC_NAME macro. */
#ifdef HOST_WORDS_BIGENDIAN
# define END _le
#else
# define END _be
#endif
ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
ABI_TYPE cmpv, ABI_TYPE newv EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
return BSWAP(atomic_cmpxchg__nocheck(haddr, BSWAP(cmpv), BSWAP(newv)));
}
#if DATA_SIZE >= 16
ABI_TYPE ATOMIC_NAME(ld)(CPUArchState *env, target_ulong addr EXTRA_ARGS)
{
DATA_TYPE val, *haddr = ATOMIC_MMU_LOOKUP;
__atomic_load(haddr, &val, __ATOMIC_RELAXED);
return BSWAP(val);
}
void ATOMIC_NAME(st)(CPUArchState *env, target_ulong addr,
ABI_TYPE val EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
val = BSWAP(val);
__atomic_store(haddr, &val, __ATOMIC_RELAXED);
}
#else
ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr,
ABI_TYPE val EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
return BSWAP(atomic_xchg__nocheck(haddr, BSWAP(val)));
}
#define GEN_ATOMIC_HELPER(X) \
ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr, \
ABI_TYPE val EXTRA_ARGS) \
{ \
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP; \
return BSWAP(atomic_##X(haddr, BSWAP(val))); \
}
GEN_ATOMIC_HELPER(fetch_and)
GEN_ATOMIC_HELPER(fetch_or)
GEN_ATOMIC_HELPER(fetch_xor)
GEN_ATOMIC_HELPER(and_fetch)
GEN_ATOMIC_HELPER(or_fetch)
GEN_ATOMIC_HELPER(xor_fetch)
#undef GEN_ATOMIC_HELPER
/* Note that for addition, we need to use a separate cmpxchg loop instead
of bswaps for the reverse-host-endian helpers. */
ABI_TYPE ATOMIC_NAME(fetch_add)(CPUArchState *env, target_ulong addr,
ABI_TYPE val EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
DATA_TYPE ldo, ldn, ret, sto;
ldo = atomic_read__nocheck(haddr);
while (1) {
ret = BSWAP(ldo);
sto = BSWAP(ret + val);
ldn = atomic_cmpxchg__nocheck(haddr, ldo, sto);
if (ldn == ldo) {
return ret;
}
ldo = ldn;
}
}
ABI_TYPE ATOMIC_NAME(add_fetch)(CPUArchState *env, target_ulong addr,
ABI_TYPE val EXTRA_ARGS)
{
DATA_TYPE *haddr = ATOMIC_MMU_LOOKUP;
DATA_TYPE ldo, ldn, ret, sto;
ldo = atomic_read__nocheck(haddr);
while (1) {
ret = BSWAP(ldo) + val;
sto = BSWAP(ret);
ldn = atomic_cmpxchg__nocheck(haddr, ldo, sto);
if (ldn == ldo) {
return ret;
}
ldo = ldn;
}
}
#endif /* DATA_SIZE >= 16 */
#undef END
#endif /* DATA_SIZE > 1 */
#undef BSWAP
#undef ABI_TYPE
#undef DATA_TYPE
#undef SUFFIX
#undef DATA_SIZE


@@ -28,7 +28,6 @@
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
 #include "qemu/cutils.h"
-#include "sysemu/replay.h"

 #define AUDIO_CAP "audio"
 #include "audio_int.h"
@@ -1113,7 +1112,7 @@ static int audio_is_timer_needed (void)
 static void audio_reset_timer (AudioState *s)
 {
     if (audio_is_timer_needed ()) {
-        timer_mod_anticipate_ns(s->ts,
+        timer_mod (s->ts,
             qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + conf.period.ticks);
     }
     else {
@@ -1388,7 +1387,6 @@ static void audio_run_out (AudioState *s)
         prev_rpos = hw->rpos;
         played = hw->pcm_ops->run_out (hw, live);
-        replay_audio_out(&played);
         if (audio_bug (AUDIO_FUNC, hw->rpos >= hw->samples)) {
             dolog ("hw->rpos=%d hw->samples=%d played=%d\n",
                    hw->rpos, hw->samples, played);
@@ -1452,12 +1450,9 @@ static void audio_run_in (AudioState *s)
     while ((hw = audio_pcm_hw_find_any_enabled_in (hw))) {
         SWVoiceIn *sw;
-        int captured = 0, min;
+        int captured, min;

-        if (replay_mode != REPLAY_MODE_PLAY) {
-            captured = hw->pcm_ops->run_in (hw);
-        }
-        replay_audio_in(&captured, hw->conv_buf, &hw->wpos, hw->samples);
+        captured = hw->pcm_ops->run_in (hw);

         min = audio_pcm_hw_find_min_in (hw);
         hw->total_samples_captured += captured - min;
@@ -2028,8 +2023,6 @@ void AUD_del_capture (CaptureVoiceOut *cap, void *cb_opaque)
                 sw = sw1;
             }
             QLIST_REMOVE (cap, entries);
-            g_free (cap->hw.mix_buf);
-            g_free (cap->buf);
             g_free (cap);
         }
         return;


@@ -166,9 +166,4 @@ int wav_start_capture (CaptureState *s, const char *path, int freq,
 bool audio_is_cleaning_up(void);
 void audio_cleanup(void);

-void audio_sample_to_uint64(void *samples, int pos,
-                            uint64_t *left, uint64_t *right);
-void audio_sample_from_uint64(void *samples, int pos,
-                            uint64_t left, uint64_t right);
-
 #endif /* QEMU_AUDIO_H */


@@ -25,7 +25,6 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 #include "qemu/bswap.h"
-#include "qemu/error-report.h"
 #include "audio.h"

 #define AUDIO_CAP "mixeng"
@@ -268,37 +267,6 @@ f_sample *mixeng_clip[2][2][2][3] = {
     }
 };

-void audio_sample_to_uint64(void *samples, int pos,
-                            uint64_t *left, uint64_t *right)
-{
-    struct st_sample *sample = samples;
-    sample += pos;
-#ifdef FLOAT_MIXENG
-    error_report(
-        "Coreaudio and floating point samples are not supported by replay yet");
-    abort();
-#else
-    *left = sample->l;
-    *right = sample->r;
-#endif
-}
-
-void audio_sample_from_uint64(void *samples, int pos,
-                                    uint64_t left, uint64_t right)
-{
-    struct st_sample *sample = samples;
-    sample += pos;
-#ifdef FLOAT_MIXENG
-    error_report(
-        "Coreaudio and floating point samples are not supported by replay yet");
-    abort();
-#else
-    sample->l = left;
-    sample->r = right;
-#endif
-}
-
 /*
  * August 21, 1998
  * Copyright 1998 Fabrice Bellard.


@@ -38,14 +38,10 @@
#define AUDIO_CAP "sdl" #define AUDIO_CAP "sdl"
#include "audio_int.h" #include "audio_int.h"
#define USE_SEMAPHORE (SDL_MAJOR_VERSION < 2)
typedef struct SDLVoiceOut { typedef struct SDLVoiceOut {
HWVoiceOut hw; HWVoiceOut hw;
int live; int live;
#if USE_SEMAPHORE
int rpos; int rpos;
#endif
int decr; int decr;
} SDLVoiceOut; } SDLVoiceOut;
@@ -57,10 +53,8 @@ static struct {
static struct SDLAudioState { static struct SDLAudioState {
int exit; int exit;
#if USE_SEMAPHORE
SDL_mutex *mutex; SDL_mutex *mutex;
SDL_sem *sem; SDL_sem *sem;
#endif
int initialized; int initialized;
bool driver_created; bool driver_created;
} glob_sdl; } glob_sdl;
@@ -79,45 +73,31 @@ static void GCC_FMT_ATTR (1, 2) sdl_logerr (const char *fmt, ...)
static int sdl_lock (SDLAudioState *s, const char *forfn)
{
#if USE_SEMAPHORE
if (SDL_LockMutex (s->mutex)) {
sdl_logerr ("SDL_LockMutex for %s failed\n", forfn);
return -1;
}
#else
SDL_LockAudio();
#endif
return 0;
}
static int sdl_unlock (SDLAudioState *s, const char *forfn)
{
#if USE_SEMAPHORE
if (SDL_UnlockMutex (s->mutex)) {
sdl_logerr ("SDL_UnlockMutex for %s failed\n", forfn);
return -1;
}
#else
SDL_UnlockAudio();
#endif
return 0;
}
static int sdl_post (SDLAudioState *s, const char *forfn)
{
#if USE_SEMAPHORE
if (SDL_SemPost (s->sem)) {
sdl_logerr ("SDL_SemPost for %s failed\n", forfn);
return -1;
}
#endif
return 0;
}
#if USE_SEMAPHORE
static int sdl_wait (SDLAudioState *s, const char *forfn)
{
if (SDL_SemWait (s->sem)) {
@@ -126,7 +106,6 @@ static int sdl_wait (SDLAudioState *s, const char *forfn)
}
return 0;
}
#endif
static int sdl_unlock_and_post (SDLAudioState *s, const char *forfn)
{
@@ -267,7 +246,6 @@ static void sdl_callback (void *opaque, Uint8 *buf, int len)
int to_mix, decr;
/* dolog ("in callback samples=%d\n", samples); */
#if USE_SEMAPHORE
sdl_wait (s, "sdl_callback");
if (s->exit) {
return;
@@ -286,11 +264,6 @@ static void sdl_callback (void *opaque, Uint8 *buf, int len)
if (!sdl->live) {
goto again;
}
#else
if (s->exit || !sdl->live) {
break;
}
#endif
/* dolog ("in callback live=%d\n", live); */
to_mix = audio_MIN (samples, sdl->live);
@@ -301,11 +274,7 @@ static void sdl_callback (void *opaque, Uint8 *buf, int len)
/* dolog ("in callback to_mix %d, chunk %d\n", to_mix, chunk); */
hw->clip (buf, src, chunk);
#if USE_SEMAPHORE
sdl->rpos = (sdl->rpos + chunk) % hw->samples;
#else
hw->rpos = (hw->rpos + chunk) % hw->samples;
#endif
to_mix -= chunk;
buf += chunk << hw->info.shift;
}
@@ -313,21 +282,12 @@ static void sdl_callback (void *opaque, Uint8 *buf, int len)
sdl->live -= decr;
sdl->decr += decr;
#if USE_SEMAPHORE
again:
if (sdl_unlock (s, "sdl_callback")) {
return;
}
#endif
}
/* dolog ("done len=%d\n", len); */
#if (SDL_MAJOR_VERSION >= 2)
/* SDL2 does not clear the remaining buffer for us, so do it on our own */
if (samples) {
memset(buf, 0, samples << hw->info.shift);
}
#endif
}
static int sdl_write_out (SWVoiceOut *sw, void *buf, int len)
@@ -355,12 +315,8 @@ static int sdl_run_out (HWVoiceOut *hw, int live)
decr = audio_MIN (sdl->decr, live);
sdl->decr -= decr;
#if USE_SEMAPHORE
sdl->live = live - decr;
hw->rpos = sdl->rpos;
#else
sdl->live = live;
#endif
if (sdl->live > 0) {
sdl_unlock_and_post (s, "sdl_run_out");
@@ -449,7 +405,6 @@ static void *sdl_audio_init (void)
return NULL;
}
#if USE_SEMAPHORE
s->mutex = SDL_CreateMutex ();
if (!s->mutex) {
sdl_logerr ("Failed to create SDL mutex\n");
@@ -464,7 +419,6 @@ static void *sdl_audio_init (void)
SDL_QuitSubSystem (SDL_INIT_AUDIO);
return NULL;
}
#endif
s->driver_created = true;
return s;
@@ -474,10 +428,8 @@ static void sdl_audio_fini (void *opaque)
{
SDLAudioState *s = opaque;
sdl_close (s);
#if USE_SEMAPHORE
SDL_DestroySemaphore (s->sem);
SDL_DestroyMutex (s->mutex);
#endif
SDL_QuitSubSystem (SDL_INIT_AUDIO);
s->driver_created = false;
}


@@ -88,7 +88,6 @@ static void wav_capture_destroy (void *opaque)
WAVState *wav = opaque;
AUD_del_capture (wav->cap, wav);
g_free (wav);
}
static void wav_capture_info (void *opaque)


@@ -1,10 +1,11 @@
common-obj-y += rng.o rng-egd.o
common-obj-$(CONFIG_POSIX) += rng-random.o
common-obj-y += msmouse.o testdev.o
common-obj-$(CONFIG_BRLAPI) += baum.o
baum.o-cflags := $(SDL_CFLAGS)
common-obj-$(CONFIG_TPM) += tpm.o
common-obj-y += hostmem.o hostmem-ram.o
common-obj-$(CONFIG_LINUX) += hostmem-file.o
common-obj-y += cryptodev.o
common-obj-y += cryptodev-builtin.o


@@ -1,7 +1,7 @@
/*
* QEMU Baum Braille Device
*
* Copyright (c) 2008, 2010-2011, 2016 Samuel Thibault
* Copyright (c) 2008 Samuel Thibault
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -24,13 +24,15 @@
#include "qemu/osdep.h" #include "qemu/osdep.h"
#include "qapi/error.h" #include "qapi/error.h"
#include "qemu-common.h" #include "qemu-common.h"
#include "chardev/char.h" #include "sysemu/char.h"
#include "qemu/timer.h" #include "qemu/timer.h"
#include "hw/usb.h" #include "hw/usb.h"
#include "ui/console.h"
#include <brlapi.h> #include <brlapi.h>
#include <brlapi_constants.h> #include <brlapi_constants.h>
#include <brlapi_keycodes.h> #include <brlapi_keycodes.h>
#ifdef CONFIG_SDL
#include <SDL_syswm.h>
#endif
#if 0
#define DPRINTF(fmt, ...) \
@@ -85,12 +87,11 @@
#define BUF_SIZE 256
typedef struct {
Chardev parent;
CharDriverState *chr;
brlapi_handle_t *brlapi;
int brlapi_fd;
unsigned int x, y;
bool deferred_init;
uint8_t in_buf[BUF_SIZE];
uint8_t in_buf_used;
@@ -98,17 +99,11 @@ typedef struct {
uint8_t out_buf_used, out_buf_ptr;
QEMUTimer *cellCount_timer;
} BaumChardev;
} BaumDriverState;
#define TYPE_CHARDEV_BRAILLE "chardev-braille"
#define BAUM_CHARDEV(obj) OBJECT_CHECK(BaumChardev, (obj), TYPE_CHARDEV_BRAILLE)
/* Let's assume NABCC by default */
enum way {
DOTS2ASCII,
ASCII2DOTS
};
static const uint8_t nabcc_translation[2][256] = {
static const uint8_t nabcc_translation[256] = {
[0] = ' ',
#ifndef BRLAPI_DOTS
#define BRLAPI_DOTS(d1,d2,d3,d4,d5,d6,d7,d8) \
((d1?BRLAPI_DOT1:0)|\
@@ -120,145 +115,111 @@ static const uint8_t nabcc_translation[2][256] = {
(d7?BRLAPI_DOT7:0)|\
(d8?BRLAPI_DOT8:0))
#endif
#define DO(dots, ascii) \
[DOTS2ASCII][dots] = ascii, \
[ASCII2DOTS][ascii] = dots
DO(0, ' '),
DO(BRLAPI_DOTS(1, 0, 0, 0, 0, 0, 0, 0), 'a'),
DO(BRLAPI_DOTS(1, 1, 0, 0, 0, 0, 0, 0), 'b'),
DO(BRLAPI_DOTS(1, 0, 0, 1, 0, 0, 0, 0), 'c'),
DO(BRLAPI_DOTS(1, 0, 0, 1, 1, 0, 0, 0), 'd'),
DO(BRLAPI_DOTS(1, 0, 0, 0, 1, 0, 0, 0), 'e'),
DO(BRLAPI_DOTS(1, 1, 0, 1, 0, 0, 0, 0), 'f'),
DO(BRLAPI_DOTS(1, 1, 0, 1, 1, 0, 0, 0), 'g'),
DO(BRLAPI_DOTS(1, 1, 0, 0, 1, 0, 0, 0), 'h'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 0, 0, 0, 0), 'i'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 1, 0, 0, 0), 'j'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 0, 0, 0, 0), 'k'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 0, 0, 0, 0), 'l'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 0, 0, 0, 0), 'm'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 1, 0, 0, 0), 'n'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 1, 0, 0, 0), 'o'),
DO(BRLAPI_DOTS(1, 1, 1, 1, 0, 0, 0, 0), 'p'),
DO(BRLAPI_DOTS(1, 1, 1, 1, 1, 0, 0, 0), 'q'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 1, 0, 0, 0), 'r'),
DO(BRLAPI_DOTS(0, 1, 1, 1, 0, 0, 0, 0), 's'),
DO(BRLAPI_DOTS(0, 1, 1, 1, 1, 0, 0, 0), 't'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 0, 1, 0, 0), 'u'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 0, 1, 0, 0), 'v'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 1, 1, 0, 0), 'w'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 0, 1, 0, 0), 'x'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 1, 1, 0, 0), 'y'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 1, 1, 0, 0), 'z'),
DO(BRLAPI_DOTS(1, 0, 0, 0, 0, 0, 1, 0), 'A'),
DO(BRLAPI_DOTS(1, 1, 0, 0, 0, 0, 1, 0), 'B'),
DO(BRLAPI_DOTS(1, 0, 0, 1, 0, 0, 1, 0), 'C'),
DO(BRLAPI_DOTS(1, 0, 0, 1, 1, 0, 1, 0), 'D'),
DO(BRLAPI_DOTS(1, 0, 0, 0, 1, 0, 1, 0), 'E'),
DO(BRLAPI_DOTS(1, 1, 0, 1, 0, 0, 1, 0), 'F'),
DO(BRLAPI_DOTS(1, 1, 0, 1, 1, 0, 1, 0), 'G'),
DO(BRLAPI_DOTS(1, 1, 0, 0, 1, 0, 1, 0), 'H'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 0, 0, 1, 0), 'I'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 1, 0, 1, 0), 'J'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 0, 0, 1, 0), 'K'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 0, 0, 1, 0), 'L'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 0, 0, 1, 0), 'M'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 1, 0, 1, 0), 'N'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 1, 0, 1, 0), 'O'),
DO(BRLAPI_DOTS(1, 1, 1, 1, 0, 0, 1, 0), 'P'),
DO(BRLAPI_DOTS(1, 1, 1, 1, 1, 0, 1, 0), 'Q'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 1, 0, 1, 0), 'R'),
DO(BRLAPI_DOTS(0, 1, 1, 1, 0, 0, 1, 0), 'S'),
DO(BRLAPI_DOTS(0, 1, 1, 1, 1, 0, 1, 0), 'T'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 0, 1, 1, 0), 'U'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 0, 1, 1, 0), 'V'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 1, 1, 1, 0), 'W'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 0, 1, 1, 0), 'X'),
DO(BRLAPI_DOTS(1, 0, 1, 1, 1, 1, 1, 0), 'Y'),
DO(BRLAPI_DOTS(1, 0, 1, 0, 1, 1, 1, 0), 'Z'),
DO(BRLAPI_DOTS(0, 0, 1, 0, 1, 1, 0, 0), '0'),
DO(BRLAPI_DOTS(0, 1, 0, 0, 0, 0, 0, 0), '1'),
DO(BRLAPI_DOTS(0, 1, 1, 0, 0, 0, 0, 0), '2'),
DO(BRLAPI_DOTS(0, 1, 0, 0, 1, 0, 0, 0), '3'),
DO(BRLAPI_DOTS(0, 1, 0, 0, 1, 1, 0, 0), '4'),
DO(BRLAPI_DOTS(0, 1, 0, 0, 0, 1, 0, 0), '5'),
DO(BRLAPI_DOTS(0, 1, 1, 0, 1, 0, 0, 0), '6'),
DO(BRLAPI_DOTS(0, 1, 1, 0, 1, 1, 0, 0), '7'),
DO(BRLAPI_DOTS(0, 1, 1, 0, 0, 1, 0, 0), '8'),
DO(BRLAPI_DOTS(0, 0, 1, 0, 1, 0, 0, 0), '9'),
DO(BRLAPI_DOTS(0, 0, 0, 1, 0, 1, 0, 0), '.'),
DO(BRLAPI_DOTS(0, 0, 1, 1, 0, 1, 0, 0), '+'),
DO(BRLAPI_DOTS(0, 0, 1, 0, 0, 1, 0, 0), '-'),
DO(BRLAPI_DOTS(1, 0, 0, 0, 0, 1, 0, 0), '*'),
DO(BRLAPI_DOTS(0, 0, 1, 1, 0, 0, 0, 0), '/'),
DO(BRLAPI_DOTS(1, 1, 1, 0, 1, 1, 0, 0), '('),
DO(BRLAPI_DOTS(0, 1, 1, 1, 1, 1, 0, 0), ')'),
DO(BRLAPI_DOTS(1, 1, 1, 1, 0, 1, 0, 0), '&'),
DO(BRLAPI_DOTS(0, 0, 1, 1, 1, 1, 0, 0), '#'),
DO(BRLAPI_DOTS(0, 0, 0, 0, 0, 1, 0, 0), ','),
DO(BRLAPI_DOTS(0, 0, 0, 0, 1, 1, 0, 0), ';'),
DO(BRLAPI_DOTS(1, 0, 0, 0, 1, 1, 0, 0), ':'),
DO(BRLAPI_DOTS(0, 1, 1, 1, 0, 1, 0, 0), '!'),
DO(BRLAPI_DOTS(1, 0, 0, 1, 1, 1, 0, 0), '?'),
DO(BRLAPI_DOTS(0, 0, 0, 0, 1, 0, 0, 0), '"'),
DO(BRLAPI_DOTS(0, 0, 1, 0, 0, 0, 0, 0), '\''),
DO(BRLAPI_DOTS(0, 0, 0, 1, 0, 0, 0, 0), '`'),
DO(BRLAPI_DOTS(0, 0, 0, 1, 1, 0, 1, 0), '^'),
DO(BRLAPI_DOTS(0, 0, 0, 1, 1, 0, 0, 0), '~'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 0, 1, 1, 0), '['),
DO(BRLAPI_DOTS(1, 1, 0, 1, 1, 1, 1, 0), ']'),
DO(BRLAPI_DOTS(0, 1, 0, 1, 0, 1, 0, 0), '{'),
DO(BRLAPI_DOTS(1, 1, 0, 1, 1, 1, 0, 0), '}'),
DO(BRLAPI_DOTS(1, 1, 1, 1, 1, 1, 0, 0), '='),
DO(BRLAPI_DOTS(1, 1, 0, 0, 0, 1, 0, 0), '<'),
DO(BRLAPI_DOTS(0, 0, 1, 1, 1, 0, 0, 0), '>'),
DO(BRLAPI_DOTS(1, 1, 0, 1, 0, 1, 0, 0), '$'),
DO(BRLAPI_DOTS(1, 0, 0, 1, 0, 1, 0, 0), '%'),
DO(BRLAPI_DOTS(0, 0, 0, 1, 0, 0, 1, 0), '@'),
DO(BRLAPI_DOTS(1, 1, 0, 0, 1, 1, 0, 0), '|'),
DO(BRLAPI_DOTS(1, 1, 0, 0, 1, 1, 1, 0), '\\'),
DO(BRLAPI_DOTS(0, 0, 0, 1, 1, 1, 0, 0), '_'),
[BRLAPI_DOTS(1,0,0,0,0,0,0,0)] = 'a',
[BRLAPI_DOTS(1,1,0,0,0,0,0,0)] = 'b',
[BRLAPI_DOTS(1,0,0,1,0,0,0,0)] = 'c',
[BRLAPI_DOTS(1,0,0,1,1,0,0,0)] = 'd',
[BRLAPI_DOTS(1,0,0,0,1,0,0,0)] = 'e',
[BRLAPI_DOTS(1,1,0,1,0,0,0,0)] = 'f',
[BRLAPI_DOTS(1,1,0,1,1,0,0,0)] = 'g',
[BRLAPI_DOTS(1,1,0,0,1,0,0,0)] = 'h',
[BRLAPI_DOTS(0,1,0,1,0,0,0,0)] = 'i',
[BRLAPI_DOTS(0,1,0,1,1,0,0,0)] = 'j',
[BRLAPI_DOTS(1,0,1,0,0,0,0,0)] = 'k',
[BRLAPI_DOTS(1,1,1,0,0,0,0,0)] = 'l',
[BRLAPI_DOTS(1,0,1,1,0,0,0,0)] = 'm',
[BRLAPI_DOTS(1,0,1,1,1,0,0,0)] = 'n',
[BRLAPI_DOTS(1,0,1,0,1,0,0,0)] = 'o',
[BRLAPI_DOTS(1,1,1,1,0,0,0,0)] = 'p',
[BRLAPI_DOTS(1,1,1,1,1,0,0,0)] = 'q',
[BRLAPI_DOTS(1,1,1,0,1,0,0,0)] = 'r',
[BRLAPI_DOTS(0,1,1,1,0,0,0,0)] = 's',
[BRLAPI_DOTS(0,1,1,1,1,0,0,0)] = 't',
[BRLAPI_DOTS(1,0,1,0,0,1,0,0)] = 'u',
[BRLAPI_DOTS(1,1,1,0,0,1,0,0)] = 'v',
[BRLAPI_DOTS(0,1,0,1,1,1,0,0)] = 'w',
[BRLAPI_DOTS(1,0,1,1,0,1,0,0)] = 'x',
[BRLAPI_DOTS(1,0,1,1,1,1,0,0)] = 'y',
[BRLAPI_DOTS(1,0,1,0,1,1,0,0)] = 'z',
[BRLAPI_DOTS(1,0,0,0,0,0,1,0)] = 'A',
[BRLAPI_DOTS(1,1,0,0,0,0,1,0)] = 'B',
[BRLAPI_DOTS(1,0,0,1,0,0,1,0)] = 'C',
[BRLAPI_DOTS(1,0,0,1,1,0,1,0)] = 'D',
[BRLAPI_DOTS(1,0,0,0,1,0,1,0)] = 'E',
[BRLAPI_DOTS(1,1,0,1,0,0,1,0)] = 'F',
[BRLAPI_DOTS(1,1,0,1,1,0,1,0)] = 'G',
[BRLAPI_DOTS(1,1,0,0,1,0,1,0)] = 'H',
[BRLAPI_DOTS(0,1,0,1,0,0,1,0)] = 'I',
[BRLAPI_DOTS(0,1,0,1,1,0,1,0)] = 'J',
[BRLAPI_DOTS(1,0,1,0,0,0,1,0)] = 'K',
[BRLAPI_DOTS(1,1,1,0,0,0,1,0)] = 'L',
[BRLAPI_DOTS(1,0,1,1,0,0,1,0)] = 'M',
[BRLAPI_DOTS(1,0,1,1,1,0,1,0)] = 'N',
[BRLAPI_DOTS(1,0,1,0,1,0,1,0)] = 'O',
[BRLAPI_DOTS(1,1,1,1,0,0,1,0)] = 'P',
[BRLAPI_DOTS(1,1,1,1,1,0,1,0)] = 'Q',
[BRLAPI_DOTS(1,1,1,0,1,0,1,0)] = 'R',
[BRLAPI_DOTS(0,1,1,1,0,0,1,0)] = 'S',
[BRLAPI_DOTS(0,1,1,1,1,0,1,0)] = 'T',
[BRLAPI_DOTS(1,0,1,0,0,1,1,0)] = 'U',
[BRLAPI_DOTS(1,1,1,0,0,1,1,0)] = 'V',
[BRLAPI_DOTS(0,1,0,1,1,1,1,0)] = 'W',
[BRLAPI_DOTS(1,0,1,1,0,1,1,0)] = 'X',
[BRLAPI_DOTS(1,0,1,1,1,1,1,0)] = 'Y',
[BRLAPI_DOTS(1,0,1,0,1,1,1,0)] = 'Z',
[BRLAPI_DOTS(0,0,1,0,1,1,0,0)] = '0',
[BRLAPI_DOTS(0,1,0,0,0,0,0,0)] = '1',
[BRLAPI_DOTS(0,1,1,0,0,0,0,0)] = '2',
[BRLAPI_DOTS(0,1,0,0,1,0,0,0)] = '3',
[BRLAPI_DOTS(0,1,0,0,1,1,0,0)] = '4',
[BRLAPI_DOTS(0,1,0,0,0,1,0,0)] = '5',
[BRLAPI_DOTS(0,1,1,0,1,0,0,0)] = '6',
[BRLAPI_DOTS(0,1,1,0,1,1,0,0)] = '7',
[BRLAPI_DOTS(0,1,1,0,0,1,0,0)] = '8',
[BRLAPI_DOTS(0,0,1,0,1,0,0,0)] = '9',
[BRLAPI_DOTS(0,0,0,1,0,1,0,0)] = '.',
[BRLAPI_DOTS(0,0,1,1,0,1,0,0)] = '+',
[BRLAPI_DOTS(0,0,1,0,0,1,0,0)] = '-',
[BRLAPI_DOTS(1,0,0,0,0,1,0,0)] = '*',
[BRLAPI_DOTS(0,0,1,1,0,0,0,0)] = '/',
[BRLAPI_DOTS(1,1,1,0,1,1,0,0)] = '(',
[BRLAPI_DOTS(0,1,1,1,1,1,0,0)] = ')',
[BRLAPI_DOTS(1,1,1,1,0,1,0,0)] = '&',
[BRLAPI_DOTS(0,0,1,1,1,1,0,0)] = '#',
[BRLAPI_DOTS(0,0,0,0,0,1,0,0)] = ',',
[BRLAPI_DOTS(0,0,0,0,1,1,0,0)] = ';',
[BRLAPI_DOTS(1,0,0,0,1,1,0,0)] = ':',
[BRLAPI_DOTS(0,1,1,1,0,1,0,0)] = '!',
[BRLAPI_DOTS(1,0,0,1,1,1,0,0)] = '?',
[BRLAPI_DOTS(0,0,0,0,1,0,0,0)] = '"',
[BRLAPI_DOTS(0,0,1,0,0,0,0,0)] ='\'',
[BRLAPI_DOTS(0,0,0,1,0,0,0,0)] = '`',
[BRLAPI_DOTS(0,0,0,1,1,0,1,0)] = '^',
[BRLAPI_DOTS(0,0,0,1,1,0,0,0)] = '~',
[BRLAPI_DOTS(0,1,0,1,0,1,1,0)] = '[',
[BRLAPI_DOTS(1,1,0,1,1,1,1,0)] = ']',
[BRLAPI_DOTS(0,1,0,1,0,1,0,0)] = '{',
[BRLAPI_DOTS(1,1,0,1,1,1,0,0)] = '}',
[BRLAPI_DOTS(1,1,1,1,1,1,0,0)] = '=',
[BRLAPI_DOTS(1,1,0,0,0,1,0,0)] = '<',
[BRLAPI_DOTS(0,0,1,1,1,0,0,0)] = '>',
[BRLAPI_DOTS(1,1,0,1,0,1,0,0)] = '$',
[BRLAPI_DOTS(1,0,0,1,0,1,0,0)] = '%',
[BRLAPI_DOTS(0,0,0,1,0,0,1,0)] = '@',
[BRLAPI_DOTS(1,1,0,0,1,1,0,0)] = '|',
[BRLAPI_DOTS(1,1,0,0,1,1,1,0)] ='\\',
[BRLAPI_DOTS(0,0,0,1,1,1,0,0)] = '_',
};
/* The guest OS has started discussing with us, finish initializing BrlAPI */
static int baum_deferred_init(BaumChardev *baum)
{
int tty = BRLAPI_TTY_DEFAULT;
QemuConsole *con;
if (baum->deferred_init) {
return 1;
}
if (brlapi__getDisplaySize(baum->brlapi, &baum->x, &baum->y) == -1) {
brlapi_perror("baum: brlapi__getDisplaySize");
return 0;
}
con = qemu_console_lookup_by_index(0);
if (con && qemu_console_is_graphic(con)) {
tty = qemu_console_get_window_id(con);
if (tty == -1)
tty = BRLAPI_TTY_DEFAULT;
}
if (brlapi__enterTtyMode(baum->brlapi, tty, NULL) == -1) {
brlapi_perror("baum: brlapi__enterTtyMode");
return 0;
}
baum->deferred_init = 1;
return 1;
}
/* The serial port can receive more of our data */
static void baum_chr_accept_input(struct Chardev *chr)
static void baum_accept_input(struct CharDriverState *chr)
{
BaumChardev *baum = BAUM_CHARDEV(chr);
BaumDriverState *baum = chr->opaque;
int room, first;
if (!baum->out_buf_used)
@@ -282,25 +243,24 @@ static void baum_chr_accept_input(struct Chardev *chr)
}
/* We want to send a packet */
static void baum_write_packet(BaumChardev *baum, const uint8_t *buf, int len)
static void baum_write_packet(BaumDriverState *baum, const uint8_t *buf, int len)
{
Chardev *chr = CHARDEV(baum);
uint8_t io_buf[1 + 2 * len], *cur = io_buf;
int room;
*cur++ = ESC;
while (len--)
if ((*cur++ = *buf++) == ESC)
*cur++ = ESC;
room = qemu_chr_be_can_write(chr);
room = qemu_chr_be_can_write(baum->chr);
len = cur - io_buf;
if (len <= room) {
/* Fits */
qemu_chr_be_write(chr, io_buf, len);
qemu_chr_be_write(baum->chr, io_buf, len);
} else {
int first;
uint8_t out;
/* Can't fit all, send what can be, and store the rest. */
qemu_chr_be_write(chr, io_buf, room);
qemu_chr_be_write(baum->chr, io_buf, room);
len -= room;
cur = io_buf + room;
if (len > BUF_SIZE - baum->out_buf_used) {
@@ -325,14 +285,14 @@ static void baum_write_packet(BaumChardev *baum, const uint8_t *buf, int len)
/* Called when the other end seems to have a wrong idea of our display size */
static void baum_cellCount_timer_cb(void *opaque)
{
BaumChardev *baum = BAUM_CHARDEV(opaque);
BaumDriverState *baum = opaque;
uint8_t cell_count[] = { BAUM_RSP_CellCount, baum->x * baum->y };
DPRINTF("Timeout waiting for DisplayData, sending cell count\n");
baum_write_packet(baum, cell_count, sizeof(cell_count));
}
/* Try to interpret a whole incoming packet */
static int baum_eat_packet(BaumChardev *baum, const uint8_t *buf, int len)
static int baum_eat_packet(BaumDriverState *baum, const uint8_t *buf, int len)
{
const uint8_t *cur = buf;
uint8_t req = 0;
@@ -386,10 +346,8 @@ static int baum_eat_packet(BaumChardev *baum, const uint8_t *buf, int len)
cursor = i + 1;
c &= ~(BRLAPI_DOT7|BRLAPI_DOT8);
}
c = nabcc_translation[DOTS2ASCII][c];
if (!(c = nabcc_translation[c]))
if (!c) {
c = '?';
}
text[i] = c;
}
timer_del(baum->cellCount_timer);
@@ -473,17 +431,15 @@ static int baum_eat_packet(BaumChardev *baum, const uint8_t *buf, int len)
}
/* The other end is writing some data. Store it and try to interpret */
static int baum_chr_write(Chardev *chr, const uint8_t *buf, int len)
static int baum_write(CharDriverState *chr, const uint8_t *buf, int len)
{
BaumChardev *baum = BAUM_CHARDEV(chr);
BaumDriverState *baum = chr->opaque;
int tocopy, cur, eaten, orig_len = len;
if (!len)
return 0;
if (!baum->brlapi)
return len;
if (!baum_deferred_init(baum))
return len;
while (len) {
/* Complete our buffer as much as possible */
@@ -514,31 +470,20 @@ static int baum_chr_write(Chardev *chr, const uint8_t *buf, int len)
}
/* Send the key code to the other end */
static void baum_send_key(BaumChardev *baum, uint8_t type, uint8_t value)
static void baum_send_key(BaumDriverState *baum, uint8_t type, uint8_t value) {
{
uint8_t packet[] = { type, value };
DPRINTF("writing key %x %x\n", type, value);
baum_write_packet(baum, packet, sizeof(packet));
}
static void baum_send_key2(BaumChardev *baum, uint8_t type, uint8_t value,
uint8_t value2)
{
uint8_t packet[] = { type, value, value2 };
DPRINTF("writing key %x %x\n", type, value);
baum_write_packet(baum, packet, sizeof(packet));
}
/* We got some data on the BrlAPI socket */
static void baum_chr_read(void *opaque)
{
BaumChardev *baum = BAUM_CHARDEV(opaque);
BaumDriverState *baum = opaque;
brlapi_keyCode_t code;
int ret;
if (!baum->brlapi)
return;
if (!baum_deferred_init(baum))
return;
while ((ret = brlapi__readKey(baum->brlapi, 0, &code)) == 1) {
DPRINTF("got key %"BRLAPI_PRIxKEYCODE"\n", code);
/* Emulate */
@@ -595,19 +540,9 @@ static void baum_chr_read(void *opaque)
}
break;
case BRLAPI_KEY_TYPE_SYM:
{
brlapi_keyCode_t keysym = code & BRLAPI_KEY_CODE_MASK;
if (keysym < 0x100) {
uint8_t dots = nabcc_translation[ASCII2DOTS][keysym];
if (dots) {
baum_send_key2(baum, BAUM_RSP_EntryKeys, 0, dots);
baum_send_key2(baum, BAUM_RSP_EntryKeys, 0, 0);
}
}
break;
}
}
}
if (ret == -1 && (brlapi_errno != BRLAPI_ERROR_LIBCERR || errno != EINTR)) {
brlapi_perror("baum: brlapi_readKey");
brlapi__closeConnection(baum->brlapi);
@@ -616,24 +551,45 @@ static void baum_chr_read(void *opaque)
}
}
static void char_braille_finalize(Object *obj)
static void baum_close(struct CharDriverState *chr)
{
BaumChardev *baum = BAUM_CHARDEV(obj);
BaumDriverState *baum = chr->opaque;
timer_free(baum->cellCount_timer);
if (baum->brlapi) {
brlapi__closeConnection(baum->brlapi);
g_free(baum->brlapi);
}
g_free(baum);
}
static void baum_chr_open(Chardev *chr,
ChardevBackend *backend,
bool *be_opened,
Error **errp)
static CharDriverState *chr_baum_init(const char *id,
ChardevBackend *backend,
ChardevReturn *ret,
Error **errp)
{
BaumChardev *baum = BAUM_CHARDEV(chr);
ChardevCommon *common = backend->u.braille.data;
BaumDriverState *baum;
CharDriverState *chr;
brlapi_handle_t *handle;
#if defined(CONFIG_SDL)
#if SDL_COMPILEDVERSION < SDL_VERSIONNUM(2, 0, 0)
SDL_SysWMinfo info;
#endif
#endif
int tty;
chr = qemu_chr_alloc(common, errp);
if (!chr) {
return NULL;
}
baum = g_malloc0(sizeof(BaumDriverState));
baum->chr = chr;
chr->opaque = baum;
chr->chr_write = baum_write;
chr->chr_accept_input = baum_accept_input;
chr->chr_close = baum_close;
handle = g_malloc0(brlapi_getHandleSize());
baum->brlapi = handle;
@@ -642,36 +598,52 @@ static void baum_chr_open(Chardev *chr,
if (baum->brlapi_fd == -1) {
error_setg(errp, "brlapi__openConnection: %s",
brlapi_strerror(brlapi_error_location()));
g_free(handle);
return;
goto fail_handle;
}
baum->deferred_init = 0;
baum->cellCount_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, baum_cellCount_timer_cb, baum);
if (brlapi__getDisplaySize(handle, &baum->x, &baum->y) == -1) {
error_setg(errp, "brlapi__getDisplaySize: %s",
brlapi_strerror(brlapi_error_location()));
goto fail;
}
#if defined(CONFIG_SDL)
#if SDL_COMPILEDVERSION < SDL_VERSIONNUM(2, 0, 0)
memset(&info, 0, sizeof(info));
SDL_VERSION(&info.version);
if (SDL_GetWMInfo(&info))
tty = info.info.x11.wmwindow;
else
#endif
#endif
tty = BRLAPI_TTY_DEFAULT;
if (brlapi__enterTtyMode(handle, tty, NULL) == -1) {
error_setg(errp, "brlapi__enterTtyMode: %s",
brlapi_strerror(brlapi_error_location()));
goto fail;
}
qemu_set_fd_handler(baum->brlapi_fd, baum_chr_read, NULL, baum);
return chr;
fail:
timer_free(baum->cellCount_timer);
brlapi__closeConnection(handle);
fail_handle:
g_free(handle);
g_free(chr);
g_free(baum);
return NULL;
}
static void char_braille_class_init(ObjectClass *oc, void *data)
{
ChardevClass *cc = CHARDEV_CLASS(oc);
cc->open = baum_chr_open;
cc->chr_write = baum_chr_write;
cc->chr_accept_input = baum_chr_accept_input;
}
static const TypeInfo char_braille_type_info = {
.name = TYPE_CHARDEV_BRAILLE,
.parent = TYPE_CHARDEV,
.instance_size = sizeof(BaumChardev),
.instance_finalize = char_braille_finalize,
.class_init = char_braille_class_init,
};
static void register_types(void)
{
type_register_static(&char_braille_type_info);
register_char_driver("braille", CHARDEV_BACKEND_KIND_BRAILLE, NULL,
chr_baum_init);
}
type_init(register_types);


@@ -1,400 +0,0 @@
/*
* QEMU Cryptodev backend for QEMU cipher APIs
*
* Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
*
* Authors:
* Gonglei <arei.gonglei@huawei.com>
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
*
*/
#include "qemu/osdep.h"
#include "sysemu/cryptodev.h"
#include "hw/boards.h"
#include "qapi/error.h"
#include "standard-headers/linux/virtio_crypto.h"
#include "crypto/cipher.h"
/**
* @TYPE_CRYPTODEV_BACKEND_BUILTIN:
* name of backend that uses QEMU cipher API
*/
#define TYPE_CRYPTODEV_BACKEND_BUILTIN "cryptodev-backend-builtin"
#define CRYPTODEV_BACKEND_BUILTIN(obj) \
OBJECT_CHECK(CryptoDevBackendBuiltin, \
(obj), TYPE_CRYPTODEV_BACKEND_BUILTIN)
typedef struct CryptoDevBackendBuiltin
CryptoDevBackendBuiltin;
typedef struct CryptoDevBackendBuiltinSession {
QCryptoCipher *cipher;
uint8_t direction; /* encryption or decryption */
uint8_t type; /* cipher? hash? aead? */
QTAILQ_ENTRY(CryptoDevBackendBuiltinSession) next;
} CryptoDevBackendBuiltinSession;
/* Max number of symmetric sessions */
#define MAX_NUM_SESSIONS 256
#define CRYPTODEV_BUITLIN_MAX_AUTH_KEY_LEN 512
#define CRYPTODEV_BUITLIN_MAX_CIPHER_KEY_LEN 64
struct CryptoDevBackendBuiltin {
CryptoDevBackend parent_obj;
CryptoDevBackendBuiltinSession *sessions[MAX_NUM_SESSIONS];
};
static void cryptodev_builtin_init(
CryptoDevBackend *backend, Error **errp)
{
/* Only support one queue */
int queues = backend->conf.peers.queues;
CryptoDevBackendClient *cc;
if (queues != 1) {
error_setg(errp,
"Only support one queue in cryptdov-builtin backend");
return;
}
cc = cryptodev_backend_new_client(
"cryptodev-builtin", NULL);
cc->info_str = g_strdup_printf("cryptodev-builtin0");
cc->queue_index = 0;
backend->conf.peers.ccs[0] = cc;
backend->conf.crypto_services =
1u << VIRTIO_CRYPTO_SERVICE_CIPHER |
1u << VIRTIO_CRYPTO_SERVICE_HASH |
1u << VIRTIO_CRYPTO_SERVICE_MAC;
backend->conf.cipher_algo_l = 1u << VIRTIO_CRYPTO_CIPHER_AES_CBC;
backend->conf.hash_algo = 1u << VIRTIO_CRYPTO_HASH_SHA1;
/*
* Set the Maximum length of crypto request.
* Why this value? Just avoid to overflow when
* memory allocation for each crypto request.
*/
backend->conf.max_size = LONG_MAX - sizeof(CryptoDevBackendSymOpInfo);
backend->conf.max_cipher_key_len = CRYPTODEV_BUITLIN_MAX_CIPHER_KEY_LEN;
backend->conf.max_auth_key_len = CRYPTODEV_BUITLIN_MAX_AUTH_KEY_LEN;
cryptodev_backend_set_ready(backend, true);
}
static int
cryptodev_builtin_get_unused_session_index(
CryptoDevBackendBuiltin *builtin)
{
size_t i;
for (i = 0; i < MAX_NUM_SESSIONS; i++) {
if (builtin->sessions[i] == NULL) {
return i;
}
}
return -1;
}
#define AES_KEYSIZE_128 16
#define AES_KEYSIZE_192 24
#define AES_KEYSIZE_256 32
#define AES_KEYSIZE_128_XTS AES_KEYSIZE_256
#define AES_KEYSIZE_256_XTS 64
static int
cryptodev_builtin_get_aes_algo(uint32_t key_len, int mode, Error **errp)
{
int algo;
if (key_len == AES_KEYSIZE_128) {
algo = QCRYPTO_CIPHER_ALG_AES_128;
} else if (key_len == AES_KEYSIZE_192) {
algo = QCRYPTO_CIPHER_ALG_AES_192;
} else if (key_len == AES_KEYSIZE_256) { /* equals AES_KEYSIZE_128_XTS */
if (mode == QCRYPTO_CIPHER_MODE_XTS) {
algo = QCRYPTO_CIPHER_ALG_AES_128;
} else {
algo = QCRYPTO_CIPHER_ALG_AES_256;
}
} else if (key_len == AES_KEYSIZE_256_XTS) {
if (mode == QCRYPTO_CIPHER_MODE_XTS) {
algo = QCRYPTO_CIPHER_ALG_AES_256;
} else {
goto err;
}
} else {
goto err;
}
return algo;
err:
error_setg(errp, "Unsupported key length :%u", key_len);
return -1;
}
static int cryptodev_builtin_create_cipher_session(
CryptoDevBackendBuiltin *builtin,
CryptoDevBackendSymSessionInfo *sess_info,
Error **errp)
{
int algo;
int mode;
QCryptoCipher *cipher;
int index;
CryptoDevBackendBuiltinSession *sess;
if (sess_info->op_type != VIRTIO_CRYPTO_SYM_OP_CIPHER) {
error_setg(errp, "Unsupported optype :%u", sess_info->op_type);
return -1;
}
index = cryptodev_builtin_get_unused_session_index(builtin);
if (index < 0) {
error_setg(errp, "Total number of sessions created exceeds %u",
MAX_NUM_SESSIONS);
return -1;
}
switch (sess_info->cipher_alg) {
case VIRTIO_CRYPTO_CIPHER_AES_ECB:
mode = QCRYPTO_CIPHER_MODE_ECB;
algo = cryptodev_builtin_get_aes_algo(sess_info->key_len,
mode, errp);
if (algo < 0) {
return -1;
}
break;
case VIRTIO_CRYPTO_CIPHER_AES_CBC:
mode = QCRYPTO_CIPHER_MODE_CBC;
algo = cryptodev_builtin_get_aes_algo(sess_info->key_len,
mode, errp);
if (algo < 0) {
return -1;
}
break;
case VIRTIO_CRYPTO_CIPHER_AES_CTR:
mode = QCRYPTO_CIPHER_MODE_CTR;
algo = cryptodev_builtin_get_aes_algo(sess_info->key_len,
mode, errp);
if (algo < 0) {
return -1;
}
break;
case VIRTIO_CRYPTO_CIPHER_AES_XTS:
mode = QCRYPTO_CIPHER_MODE_XTS;
algo = cryptodev_builtin_get_aes_algo(sess_info->key_len,
mode, errp);
if (algo < 0) {
return -1;
}
break;
case VIRTIO_CRYPTO_CIPHER_3DES_ECB:
        mode = QCRYPTO_CIPHER_MODE_ECB;
        algo = QCRYPTO_CIPHER_ALG_3DES;
        break;
    case VIRTIO_CRYPTO_CIPHER_3DES_CBC:
        mode = QCRYPTO_CIPHER_MODE_CBC;
        algo = QCRYPTO_CIPHER_ALG_3DES;
        break;
    case VIRTIO_CRYPTO_CIPHER_3DES_CTR:
        mode = QCRYPTO_CIPHER_MODE_CTR;
        algo = QCRYPTO_CIPHER_ALG_3DES;
        break;
    default:
        error_setg(errp, "Unsupported cipher alg :%u",
                   sess_info->cipher_alg);
        return -1;
    }

    cipher = qcrypto_cipher_new(algo, mode,
                                sess_info->cipher_key,
                                sess_info->key_len,
                                errp);
    if (!cipher) {
        return -1;
    }

    sess = g_new0(CryptoDevBackendBuiltinSession, 1);
    sess->cipher = cipher;
    sess->direction = sess_info->direction;
    sess->type = sess_info->op_type;

    builtin->sessions[index] = sess;

    return index;
}

static int64_t cryptodev_builtin_sym_create_session(
           CryptoDevBackend *backend,
           CryptoDevBackendSymSessionInfo *sess_info,
           uint32_t queue_index, Error **errp)
{
    CryptoDevBackendBuiltin *builtin =
                      CRYPTODEV_BACKEND_BUILTIN(backend);
    int64_t session_id = -1;
    int ret;

    switch (sess_info->op_code) {
    case VIRTIO_CRYPTO_CIPHER_CREATE_SESSION:
        ret = cryptodev_builtin_create_cipher_session(
                           builtin, sess_info, errp);
        if (ret < 0) {
            return ret;
        } else {
            session_id = ret;
        }
        break;
    case VIRTIO_CRYPTO_HASH_CREATE_SESSION:
    case VIRTIO_CRYPTO_MAC_CREATE_SESSION:
    default:
        error_setg(errp, "Unsupported opcode :%" PRIu32 "",
                   sess_info->op_code);
        return -1;
    }

    return session_id;
}

static int cryptodev_builtin_sym_close_session(
           CryptoDevBackend *backend,
           uint64_t session_id,
           uint32_t queue_index, Error **errp)
{
    CryptoDevBackendBuiltin *builtin =
                      CRYPTODEV_BACKEND_BUILTIN(backend);

    if (session_id >= MAX_NUM_SESSIONS ||
              builtin->sessions[session_id] == NULL) {
        error_setg(errp, "Cannot find a valid session id: %" PRIu64 "",
                   session_id);
        return -1;
    }

    qcrypto_cipher_free(builtin->sessions[session_id]->cipher);
    g_free(builtin->sessions[session_id]);
    builtin->sessions[session_id] = NULL;
    return 0;
}

static int cryptodev_builtin_sym_operation(
                 CryptoDevBackend *backend,
                 CryptoDevBackendSymOpInfo *op_info,
                 uint32_t queue_index, Error **errp)
{
    CryptoDevBackendBuiltin *builtin =
                      CRYPTODEV_BACKEND_BUILTIN(backend);
    CryptoDevBackendBuiltinSession *sess;
    int ret;

    if (op_info->session_id >= MAX_NUM_SESSIONS ||
              builtin->sessions[op_info->session_id] == NULL) {
        error_setg(errp, "Cannot find a valid session id: %" PRIu64 "",
                   op_info->session_id);
        return -VIRTIO_CRYPTO_INVSESS;
    }

    if (op_info->op_type == VIRTIO_CRYPTO_SYM_OP_ALGORITHM_CHAINING) {
        error_setg(errp,
               "Algorithm chain is unsupported for cryptodev-builtin");
        return -VIRTIO_CRYPTO_NOTSUPP;
    }

    sess = builtin->sessions[op_info->session_id];

    if (op_info->iv_len > 0) {
        ret = qcrypto_cipher_setiv(sess->cipher, op_info->iv,
                                   op_info->iv_len, errp);
        if (ret < 0) {
            return -VIRTIO_CRYPTO_ERR;
        }
    }

    if (sess->direction == VIRTIO_CRYPTO_OP_ENCRYPT) {
        ret = qcrypto_cipher_encrypt(sess->cipher, op_info->src,
                                     op_info->dst, op_info->src_len, errp);
        if (ret < 0) {
            return -VIRTIO_CRYPTO_ERR;
        }
    } else {
        ret = qcrypto_cipher_decrypt(sess->cipher, op_info->src,
                                     op_info->dst, op_info->src_len, errp);
        if (ret < 0) {
            return -VIRTIO_CRYPTO_ERR;
        }
    }
    return VIRTIO_CRYPTO_OK;
}

static void cryptodev_builtin_cleanup(
             CryptoDevBackend *backend,
             Error **errp)
{
    CryptoDevBackendBuiltin *builtin =
                      CRYPTODEV_BACKEND_BUILTIN(backend);
    size_t i;
    int queues = backend->conf.peers.queues;
    CryptoDevBackendClient *cc;

    for (i = 0; i < MAX_NUM_SESSIONS; i++) {
        if (builtin->sessions[i] != NULL) {
            cryptodev_builtin_sym_close_session(
                    backend, i, 0, errp);
        }
    }

    for (i = 0; i < queues; i++) {
        cc = backend->conf.peers.ccs[i];
        if (cc) {
            cryptodev_backend_free_client(cc);
            backend->conf.peers.ccs[i] = NULL;
        }
    }

    cryptodev_backend_set_ready(backend, false);
}

static void
cryptodev_builtin_class_init(ObjectClass *oc, void *data)
{
    CryptoDevBackendClass *bc = CRYPTODEV_BACKEND_CLASS(oc);

    bc->init = cryptodev_builtin_init;
    bc->cleanup = cryptodev_builtin_cleanup;
    bc->create_session = cryptodev_builtin_sym_create_session;
    bc->close_session = cryptodev_builtin_sym_close_session;
    bc->do_sym_op = cryptodev_builtin_sym_operation;
}

static const TypeInfo cryptodev_builtin_info = {
    .name = TYPE_CRYPTODEV_BACKEND_BUILTIN,
    .parent = TYPE_CRYPTODEV_BACKEND,
    .class_init = cryptodev_builtin_class_init,
    .instance_size = sizeof(CryptoDevBackendBuiltin),
};

static void
cryptodev_builtin_register_types(void)
{
    type_register_static(&cryptodev_builtin_info);
}

type_init(cryptodev_builtin_register_types);


@@ -1,271 +0,0 @@
/*
 * QEMU Crypto Device Implementation
 *
 * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
 *
 * Authors:
 *    Gonglei <arei.gonglei@huawei.com>
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 *
 */

#include "qemu/osdep.h"
#include "sysemu/cryptodev.h"
#include "hw/boards.h"
#include "qapi/error.h"
#include "qapi/visitor.h"
#include "qapi-types.h"
#include "qapi-visit.h"
#include "qemu/config-file.h"
#include "qom/object_interfaces.h"
#include "hw/virtio/virtio-crypto.h"

static QTAILQ_HEAD(, CryptoDevBackendClient) crypto_clients;

CryptoDevBackendClient *
cryptodev_backend_new_client(const char *model,
                             const char *name)
{
    CryptoDevBackendClient *cc;

    cc = g_malloc0(sizeof(CryptoDevBackendClient));
    cc->model = g_strdup(model);
    if (name) {
        cc->name = g_strdup(name);
    }

    QTAILQ_INSERT_TAIL(&crypto_clients, cc, next);

    return cc;
}

void cryptodev_backend_free_client(
                  CryptoDevBackendClient *cc)
{
    QTAILQ_REMOVE(&crypto_clients, cc, next);
    g_free(cc->name);
    g_free(cc->model);
    g_free(cc->info_str);
    g_free(cc);
}

void cryptodev_backend_cleanup(
             CryptoDevBackend *backend,
             Error **errp)
{
    CryptoDevBackendClass *bc =
                  CRYPTODEV_BACKEND_GET_CLASS(backend);

    if (bc->cleanup) {
        bc->cleanup(backend, errp);
    }
}

int64_t cryptodev_backend_sym_create_session(
           CryptoDevBackend *backend,
           CryptoDevBackendSymSessionInfo *sess_info,
           uint32_t queue_index, Error **errp)
{
    CryptoDevBackendClass *bc =
                      CRYPTODEV_BACKEND_GET_CLASS(backend);

    if (bc->create_session) {
        return bc->create_session(backend, sess_info, queue_index, errp);
    }

    return -1;
}

int cryptodev_backend_sym_close_session(
           CryptoDevBackend *backend,
           uint64_t session_id,
           uint32_t queue_index, Error **errp)
{
    CryptoDevBackendClass *bc =
                      CRYPTODEV_BACKEND_GET_CLASS(backend);

    if (bc->close_session) {
        return bc->close_session(backend, session_id, queue_index, errp);
    }

    return -1;
}

static int cryptodev_backend_sym_operation(
                 CryptoDevBackend *backend,
                 CryptoDevBackendSymOpInfo *op_info,
                 uint32_t queue_index, Error **errp)
{
    CryptoDevBackendClass *bc =
                      CRYPTODEV_BACKEND_GET_CLASS(backend);

    if (bc->do_sym_op) {
        return bc->do_sym_op(backend, op_info, queue_index, errp);
    }

    return -VIRTIO_CRYPTO_ERR;
}

int cryptodev_backend_crypto_operation(
                 CryptoDevBackend *backend,
                 void *opaque,
                 uint32_t queue_index, Error **errp)
{
    VirtIOCryptoReq *req = opaque;

    if (req->flags == CRYPTODEV_BACKEND_ALG_SYM) {
        CryptoDevBackendSymOpInfo *op_info;
        op_info = req->u.sym_op_info;

        return cryptodev_backend_sym_operation(backend,
                         op_info, queue_index, errp);
    } else {
        error_setg(errp, "Unsupported cryptodev alg type: %" PRIu32 "",
                   req->flags);
        return -VIRTIO_CRYPTO_NOTSUPP;
    }

    return -VIRTIO_CRYPTO_ERR;
}

static void
cryptodev_backend_get_queues(Object *obj, Visitor *v, const char *name,
                             void *opaque, Error **errp)
{
    CryptoDevBackend *backend = CRYPTODEV_BACKEND(obj);
    uint32_t value = backend->conf.peers.queues;

    visit_type_uint32(v, name, &value, errp);
}

static void
cryptodev_backend_set_queues(Object *obj, Visitor *v, const char *name,
                             void *opaque, Error **errp)
{
    CryptoDevBackend *backend = CRYPTODEV_BACKEND(obj);
    Error *local_err = NULL;
    uint32_t value;

    visit_type_uint32(v, name, &value, &local_err);
    if (local_err) {
        goto out;
    }
    if (!value) {
        error_setg(&local_err, "Property '%s.%s' doesn't take value '%"
                   PRIu32 "'", object_get_typename(obj), name, value);
        goto out;
    }
    backend->conf.peers.queues = value;
out:
    error_propagate(errp, local_err);
}

static void
cryptodev_backend_complete(UserCreatable *uc, Error **errp)
{
    CryptoDevBackend *backend = CRYPTODEV_BACKEND(uc);
    CryptoDevBackendClass *bc = CRYPTODEV_BACKEND_GET_CLASS(uc);
    Error *local_err = NULL;

    if (bc->init) {
        bc->init(backend, &local_err);
        if (local_err) {
            goto out;
        }
    }

    return;

out:
    error_propagate(errp, local_err);
}

void cryptodev_backend_set_used(CryptoDevBackend *backend, bool used)
{
    backend->is_used = used;
}

bool cryptodev_backend_is_used(CryptoDevBackend *backend)
{
    return backend->is_used;
}

void cryptodev_backend_set_ready(CryptoDevBackend *backend, bool ready)
{
    backend->ready = ready;
}

bool cryptodev_backend_is_ready(CryptoDevBackend *backend)
{
    return backend->ready;
}

static bool
cryptodev_backend_can_be_deleted(UserCreatable *uc, Error **errp)
{
    return !cryptodev_backend_is_used(CRYPTODEV_BACKEND(uc));
}

static void cryptodev_backend_instance_init(Object *obj)
{
    object_property_add(obj, "queues", "int",
                        cryptodev_backend_get_queues,
                        cryptodev_backend_set_queues,
                        NULL, NULL, NULL);
    /* Initialize devices' queues property to 1 */
    object_property_set_int(obj, 1, "queues", NULL);
}

static void cryptodev_backend_finalize(Object *obj)
{
    CryptoDevBackend *backend = CRYPTODEV_BACKEND(obj);

    cryptodev_backend_cleanup(backend, NULL);
}

static void
cryptodev_backend_class_init(ObjectClass *oc, void *data)
{
    UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);

    ucc->complete = cryptodev_backend_complete;
    ucc->can_be_deleted = cryptodev_backend_can_be_deleted;

    QTAILQ_INIT(&crypto_clients);
}

static const TypeInfo cryptodev_backend_info = {
    .name = TYPE_CRYPTODEV_BACKEND,
    .parent = TYPE_OBJECT,
    .instance_size = sizeof(CryptoDevBackend),
    .instance_init = cryptodev_backend_instance_init,
    .instance_finalize = cryptodev_backend_finalize,
    .class_size = sizeof(CryptoDevBackendClass),
    .class_init = cryptodev_backend_class_init,
    .interfaces = (InterfaceInfo[]) {
        { TYPE_USER_CREATABLE },
        { }
    }
};

static void
cryptodev_backend_register_types(void)
{
    type_register_static(&cryptodev_backend_info);
}

type_init(cryptodev_backend_register_types);


@@ -51,7 +51,7 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 #ifndef CONFIG_LINUX
     error_setg(errp, "-mem-path not supported on this host");
 #else
-    if (!host_memory_backend_mr_inited(backend)) {
+    if (!memory_region_size(&backend->mr)) {
         gchar *path;
         backend->force_prealloc = mem_prealloc;
         path = object_get_canonical_path(OBJECT(backend));
@@ -64,6 +64,14 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 #endif
 }
 
+static void
+file_backend_class_init(ObjectClass *oc, void *data)
+{
+    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
+
+    bc->alloc = file_backend_memory_alloc;
+}
+
 static char *get_mem_path(Object *o, Error **errp)
 {
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
@@ -76,7 +84,7 @@ static void set_mem_path(Object *o, const char *str, Error **errp)
     HostMemoryBackend *backend = MEMORY_BACKEND(o);
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
-    if (host_memory_backend_mr_inited(backend)) {
+    if (memory_region_size(&backend->mr)) {
         error_setg(errp, "cannot change property value");
         return;
     }
@@ -96,7 +104,7 @@ static void file_memory_backend_set_share(Object *o, bool value, Error **errp)
     HostMemoryBackend *backend = MEMORY_BACKEND(o);
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(o);
 
-    if (host_memory_backend_mr_inited(backend)) {
+    if (memory_region_size(&backend->mr)) {
         error_setg(errp, "cannot change property value");
         return;
     }
@@ -104,18 +112,13 @@ static void file_memory_backend_set_share(Object *o, bool value, Error **errp)
 }
 
 static void
-file_backend_class_init(ObjectClass *oc, void *data)
+file_backend_instance_init(Object *o)
 {
-    HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
-
-    bc->alloc = file_backend_memory_alloc;
-
-    object_class_property_add_bool(oc, "share",
-        file_memory_backend_get_share, file_memory_backend_set_share,
-        &error_abort);
-    object_class_property_add_str(oc, "mem-path",
-        get_mem_path, set_mem_path,
-        &error_abort);
+    object_property_add_bool(o, "share",
+                        file_memory_backend_get_share,
+                        file_memory_backend_set_share, NULL);
+    object_property_add_str(o, "mem-path", get_mem_path,
+                        set_mem_path, NULL);
 }
 
 static void file_backend_instance_finalize(Object *o)
@@ -129,6 +132,7 @@ static const TypeInfo file_backend_info = {
     .name = TYPE_MEMORY_BACKEND_FILE,
     .parent = TYPE_MEMORY_BACKEND,
     .class_init = file_backend_class_init,
+    .instance_init = file_backend_instance_init,
     .instance_finalize = file_backend_instance_finalize,
     .instance_size = sizeof(HostMemoryBackendFile),
 };


@@ -45,7 +45,7 @@ host_memory_backend_set_size(Object *obj, Visitor *v, const char *name,
     Error *local_err = NULL;
     uint64_t value;
 
-    if (host_memory_backend_mr_inited(backend)) {
+    if (memory_region_size(&backend->mr)) {
         error_setg(&local_err, "cannot change property value");
         goto out;
     }
@@ -64,6 +64,14 @@ out:
     error_propagate(errp, local_err);
 }
 
+static uint16List **host_memory_append_node(uint16List **node,
+                                            unsigned long value)
+{
+    *node = g_malloc0(sizeof(**node));
+    (*node)->value = value;
+    return &(*node)->next;
+}
+
 static void
 host_memory_backend_get_host_nodes(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
@@ -74,13 +82,12 @@ host_memory_backend_get_host_nodes(Object *obj, Visitor *v, const char *name,
     unsigned long value;
 
     value = find_first_bit(backend->host_nodes, MAX_NODES);
-    if (value == MAX_NODES) {
-        return;
-    }
 
-    *node = g_malloc0(sizeof(**node));
-    (*node)->value = value;
-    node = &(*node)->next;
+    node = host_memory_append_node(node, value);
+
+    if (value == MAX_NODES) {
+        goto out;
+    }
 
     do {
         value = find_next_bit(backend->host_nodes, MAX_NODES, value + 1);
@@ -88,11 +95,10 @@ host_memory_backend_get_host_nodes(Object *obj, Visitor *v, const char *name,
             break;
         }
 
-        *node = g_malloc0(sizeof(**node));
-        (*node)->value = value;
-        node = &(*node)->next;
+        node = host_memory_append_node(node, value);
     } while (true);
 
+out:
     visit_type_uint16List(v, name, &host_nodes, errp);
 }
 
@@ -146,7 +152,7 @@ static void host_memory_backend_set_merge(Object *obj, bool value, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
-    if (!host_memory_backend_mr_inited(backend)) {
+    if (!memory_region_size(&backend->mr)) {
         backend->merge = value;
         return;
     }
@@ -172,7 +178,7 @@ static void host_memory_backend_set_dump(Object *obj, bool value, Error **errp)
 {
     HostMemoryBackend *backend = MEMORY_BACKEND(obj);
 
-    if (!host_memory_backend_mr_inited(backend)) {
+    if (!memory_region_size(&backend->mr)) {
         backend->dump = value;
         return;
     }
@@ -208,7 +214,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         }
     }
 
-    if (!host_memory_backend_mr_inited(backend)) {
+    if (!memory_region_size(&backend->mr)) {
         backend->prealloc = value;
         return;
     }
@@ -218,7 +224,7 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
         void *ptr = memory_region_get_ram_ptr(&backend->mr);
         uint64_t sz = memory_region_size(&backend->mr);
 
-        os_mem_prealloc(fd, ptr, sz, smp_cpus, &local_err);
+        os_mem_prealloc(fd, ptr, sz, &local_err);
         if (local_err) {
             error_propagate(errp, local_err);
             return;
@@ -235,21 +241,32 @@ static void host_memory_backend_init(Object *obj)
     backend->merge = machine_mem_merge(machine);
     backend->dump = machine_dump_guest_core(machine);
     backend->prealloc = mem_prealloc;
-}
 
-bool host_memory_backend_mr_inited(HostMemoryBackend *backend)
-{
-    /*
-     * NOTE: We forbid zero-length memory backend, so here zero means
-     * "we haven't inited the backend memory region yet".
-     */
-    return memory_region_size(&backend->mr) != 0;
+    object_property_add_bool(obj, "merge",
+                        host_memory_backend_get_merge,
+                        host_memory_backend_set_merge, NULL);
+    object_property_add_bool(obj, "dump",
+                        host_memory_backend_get_dump,
+                        host_memory_backend_set_dump, NULL);
+    object_property_add_bool(obj, "prealloc",
+                        host_memory_backend_get_prealloc,
+                        host_memory_backend_set_prealloc, NULL);
+    object_property_add(obj, "size", "int",
+                        host_memory_backend_get_size,
+                        host_memory_backend_set_size, NULL, NULL, NULL);
+    object_property_add(obj, "host-nodes", "int",
+                        host_memory_backend_get_host_nodes,
+                        host_memory_backend_set_host_nodes, NULL, NULL, NULL);
+    object_property_add_enum(obj, "policy", "HostMemPolicy",
+                             HostMemPolicy_lookup,
+                             host_memory_backend_get_policy,
+                             host_memory_backend_set_policy, NULL);
 }
 
 MemoryRegion *
 host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
 {
-    return host_memory_backend_mr_inited(backend) ? &backend->mr : NULL;
+    return memory_region_size(&backend->mr) ? &backend->mr : NULL;
 }
 
 void host_memory_backend_set_mapped(HostMemoryBackend *backend, bool mapped)
@@ -331,7 +348,7 @@ host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
      */
     if (backend->prealloc) {
         os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz,
-                        smp_cpus, &local_err);
+                        &local_err);
         if (local_err) {
             goto out;
         }
@@ -351,24 +368,6 @@ host_memory_backend_can_be_deleted(UserCreatable *uc, Error **errp)
     }
 }
 
-static char *get_id(Object *o, Error **errp)
-{
-    HostMemoryBackend *backend = MEMORY_BACKEND(o);
-
-    return g_strdup(backend->id);
-}
-
-static void set_id(Object *o, const char *str, Error **errp)
-{
-    HostMemoryBackend *backend = MEMORY_BACKEND(o);
-
-    if (backend->id) {
-        error_setg(errp, "cannot change property value");
-        return;
-    }
-    backend->id = g_strdup(str);
-}
-
 static void
 host_memory_backend_class_init(ObjectClass *oc, void *data)
 {
@@ -376,35 +375,6 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
     ucc->complete = host_memory_backend_memory_complete;
     ucc->can_be_deleted = host_memory_backend_can_be_deleted;
-
-    object_class_property_add_bool(oc, "merge",
-        host_memory_backend_get_merge,
-        host_memory_backend_set_merge, &error_abort);
-    object_class_property_add_bool(oc, "dump",
-        host_memory_backend_get_dump,
-        host_memory_backend_set_dump, &error_abort);
-    object_class_property_add_bool(oc, "prealloc",
-        host_memory_backend_get_prealloc,
-        host_memory_backend_set_prealloc, &error_abort);
-    object_class_property_add(oc, "size", "int",
-        host_memory_backend_get_size,
-        host_memory_backend_set_size,
-        NULL, NULL, &error_abort);
-    object_class_property_add(oc, "host-nodes", "int",
-        host_memory_backend_get_host_nodes,
-        host_memory_backend_set_host_nodes,
-        NULL, NULL, &error_abort);
-    object_class_property_add_enum(oc, "policy", "HostMemPolicy",
-        HostMemPolicy_lookup,
-        host_memory_backend_get_policy,
-        host_memory_backend_set_policy, &error_abort);
-    object_class_property_add_str(oc, "id", get_id, set_id, &error_abort);
-}
-
-static void host_memory_backend_finalize(Object *o)
-{
-    HostMemoryBackend *backend = MEMORY_BACKEND(o);
-
-    g_free(backend->id);
 }
 
 static const TypeInfo host_memory_backend_info = {
@@ -415,7 +385,6 @@ static const TypeInfo host_memory_backend_info = {
     .class_init = host_memory_backend_class_init,
     .instance_size = sizeof(HostMemoryBackend),
     .instance_init = host_memory_backend_init,
-    .instance_finalize = host_memory_backend_finalize,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_USER_CREATABLE },
         { }


@@ -23,7 +23,7 @@
  */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "chardev/char.h"
+#include "sysemu/char.h"
 #include "ui/console.h"
 #include "ui/input.h"
@@ -31,23 +31,18 @@
 #define MSMOUSE_HI2(n) (((n) & 0xc0) >> 6)
 
 typedef struct {
-    Chardev parent;
+    CharDriverState *chr;
     QemuInputHandlerState *hs;
     int axis[INPUT_AXIS__MAX];
     bool btns[INPUT_BUTTON__MAX];
     bool btnc[INPUT_BUTTON__MAX];
     uint8_t outbuf[32];
     int outlen;
-} MouseChardev;
+} MouseState;
 
-#define TYPE_CHARDEV_MSMOUSE "chardev-msmouse"
-#define MOUSE_CHARDEV(obj) \
-    OBJECT_CHECK(MouseChardev, (obj), TYPE_CHARDEV_MSMOUSE)
-
-static void msmouse_chr_accept_input(Chardev *chr)
+static void msmouse_chr_accept_input(CharDriverState *chr)
 {
-    MouseChardev *mouse = MOUSE_CHARDEV(chr);
+    MouseState *mouse = chr->opaque;
     int len;
 
     len = qemu_chr_be_can_write(chr);
@@ -65,7 +60,7 @@ static void msmouse_chr_accept_input(Chardev *chr)
     }
 }
 
-static void msmouse_queue_event(MouseChardev *mouse)
+static void msmouse_queue_event(MouseState *mouse)
 {
     unsigned char bytes[4] = { 0x40, 0x00, 0x00, 0x00 };
     int dx, dy, count = 3;
@@ -102,7 +97,7 @@ static void msmouse_queue_event(MouseChardev *mouse)
 static void msmouse_input_event(DeviceState *dev, QemuConsole *src,
                                 InputEvent *evt)
 {
-    MouseChardev *mouse = MOUSE_CHARDEV(dev);
+    MouseState *mouse = (MouseState *)dev;
     InputMoveEvent *move;
     InputBtnEvent *btn;
@@ -126,24 +121,25 @@ static void msmouse_input_event(DeviceState *dev, QemuConsole *src,
 static void msmouse_input_sync(DeviceState *dev)
 {
-    MouseChardev *mouse = MOUSE_CHARDEV(dev);
-    Chardev *chr = CHARDEV(dev);
+    MouseState *mouse = (MouseState *)dev;
 
     msmouse_queue_event(mouse);
-    msmouse_chr_accept_input(chr);
+    msmouse_chr_accept_input(mouse->chr);
 }
 
-static int msmouse_chr_write(struct Chardev *s, const uint8_t *buf, int len)
+static int msmouse_chr_write (struct CharDriverState *s, const uint8_t *buf, int len)
 {
     /* Ignore writes to mouse port */
     return len;
 }
 
-static void char_msmouse_finalize(Object *obj)
+static void msmouse_chr_close (struct CharDriverState *chr)
 {
-    MouseChardev *mouse = MOUSE_CHARDEV(obj);
+    MouseState *mouse = chr->opaque;
 
     qemu_input_handler_unregister(mouse->hs);
+    g_free(mouse);
+
+    g_free(chr);
 }
 
 static QemuInputHandler msmouse_handler = {
@@ -153,38 +149,35 @@ static QemuInputHandler msmouse_handler = {
     .sync = msmouse_input_sync,
 };
 
-static void msmouse_chr_open(Chardev *chr,
-                             ChardevBackend *backend,
-                             bool *be_opened,
-                             Error **errp)
+static CharDriverState *qemu_chr_open_msmouse(const char *id,
+                                              ChardevBackend *backend,
+                                              ChardevReturn *ret,
+                                              Error **errp)
 {
-    MouseChardev *mouse = MOUSE_CHARDEV(chr);
+    ChardevCommon *common = backend->u.msmouse.data;
+    MouseState *mouse;
+    CharDriverState *chr;
 
-    *be_opened = false;
+    chr = qemu_chr_alloc(common, errp);
+    chr->chr_write = msmouse_chr_write;
+    chr->chr_close = msmouse_chr_close;
+    chr->chr_accept_input = msmouse_chr_accept_input;
+    chr->explicit_be_open = true;
+
+    mouse = g_new0(MouseState, 1);
     mouse->hs = qemu_input_handler_register((DeviceState *)mouse,
                                             &msmouse_handler);
+
+    mouse->chr = chr;
+    chr->opaque = mouse;
+
+    return chr;
 }
 
-static void char_msmouse_class_init(ObjectClass *oc, void *data)
-{
-    ChardevClass *cc = CHARDEV_CLASS(oc);
-
-    cc->open = msmouse_chr_open;
-    cc->chr_write = msmouse_chr_write;
-    cc->chr_accept_input = msmouse_chr_accept_input;
-}
-
-static const TypeInfo char_msmouse_type_info = {
-    .name = TYPE_CHARDEV_MSMOUSE,
-    .parent = TYPE_CHARDEV,
-    .instance_size = sizeof(MouseChardev),
-    .instance_finalize = char_msmouse_finalize,
-    .class_init = char_msmouse_class_init,
-};
-
 static void register_types(void)
 {
-    type_register_static(&char_msmouse_type_info);
+    register_char_driver("msmouse", CHARDEV_BACKEND_KIND_MSMOUSE, NULL,
+                         qemu_chr_open_msmouse);
 }
 
 type_init(register_types);


@@ -12,9 +12,10 @@
 #include "qemu/osdep.h"
 #include "sysemu/rng.h"
-#include "chardev/char-fe.h"
+#include "sysemu/char.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "hw/qdev.h" /* just for DEFINE_PROP_CHR */
 
 #define TYPE_RNG_EGD "rng-egd"
 #define RNG_EGD(obj) OBJECT_CHECK(RngEgd, (obj), TYPE_RNG_EGD)
@@ -23,7 +24,7 @@ typedef struct RngEgd
 {
     RngBackend parent;
 
-    CharBackend chr;
+    CharDriverState *chr;
     char *chr_name;
 } RngEgd;
 
@@ -40,9 +41,7 @@ static void rng_egd_request_entropy(RngBackend *b, RngRequest *req)
         header[0] = 0x02;
         header[1] = len;
 
-        /* XXX this blocks entire thread. Rewrite to use
-         * qemu_chr_fe_write and background I/O callbacks */
-        qemu_chr_fe_write_all(&s->chr, header, sizeof(header));
+        qemu_chr_fe_write(s->chr, header, sizeof(header));
 
         size -= len;
     }
@@ -86,7 +85,6 @@ static void rng_egd_chr_read(void *opaque, const uint8_t *buf, int size)
 static void rng_egd_opened(RngBackend *b, Error **errp)
 {
     RngEgd *s = RNG_EGD(b);
-    Chardev *chr;
 
     if (s->chr_name == NULL) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
@@ -94,19 +92,21 @@ static void rng_egd_opened(RngBackend *b, Error **errp)
         return;
     }
 
-    chr = qemu_chr_find(s->chr_name);
-    if (chr == NULL) {
+    s->chr = qemu_chr_find(s->chr_name);
+    if (s->chr == NULL) {
         error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
                   "Device '%s' not found", s->chr_name);
         return;
     }
 
-    if (!qemu_chr_fe_init(&s->chr, chr, errp)) {
+    if (qemu_chr_fe_claim(s->chr) != 0) {
+        error_setg(errp, QERR_DEVICE_IN_USE, s->chr_name);
         return;
     }
 
     /* FIXME we should resubmit pending requests when the CDS reconnects. */
-    qemu_chr_fe_set_handlers(&s->chr, rng_egd_chr_can_read,
-                             rng_egd_chr_read, NULL, s, NULL, true);
+    qemu_chr_add_handlers(s->chr, rng_egd_chr_can_read, rng_egd_chr_read,
+                          NULL, s);
 }
 
 static void rng_egd_set_chardev(Object *obj, const char *value, Error **errp)
@@ -125,10 +125,9 @@ static void rng_egd_set_chardev(Object *obj, const char *value, Error **errp)
 static char *rng_egd_get_chardev(Object *obj, Error **errp)
 {
     RngEgd *s = RNG_EGD(obj);
-    Chardev *chr = qemu_chr_fe_get_driver(&s->chr);
 
-    if (chr && chr->label) {
-        return g_strdup(chr->label);
+    if (s->chr && s->chr->label) {
+        return g_strdup(s->chr->label);
     }
 
     return NULL;
@@ -145,7 +144,11 @@ static void rng_egd_finalize(Object *obj)
 {
     RngEgd *s = RNG_EGD(obj);
 
-    qemu_chr_fe_deinit(&s->chr, false);
+    if (s->chr) {
+        qemu_chr_add_handlers(s->chr, NULL, NULL, NULL, NULL);
+        qemu_chr_fe_release(s->chr);
+    }
+
     g_free(s->chr_name);
 }


@@ -25,23 +25,18 @@
  */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "chardev/char.h"
+#include "sysemu/char.h"
 
 #define BUF_SIZE 32
 
 typedef struct {
-    Chardev parent;
+    CharDriverState *chr;
     uint8_t in_buf[32];
     int in_buf_used;
-} TestdevChardev;
-
-#define TYPE_CHARDEV_TESTDEV "chardev-testdev"
-#define TESTDEV_CHARDEV(obj) \
-    OBJECT_CHECK(TestdevChardev, (obj), TYPE_CHARDEV_TESTDEV)
+} TestdevCharState;
 
 /* Try to interpret a whole incoming packet */
-static int testdev_eat_packet(TestdevChardev *testdev)
+static int testdev_eat_packet(TestdevCharState *testdev)
 {
     const uint8_t *cur = testdev->in_buf;
     int len = testdev->in_buf_used;
@@ -82,9 +77,9 @@ static int testdev_eat_packet(TestdevChardev *testdev)
 }
 
 /* The other end is writing some data.  Store it and try to interpret */
-static int testdev_chr_write(Chardev *chr, const uint8_t *buf, int len)
+static int testdev_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
-    TestdevChardev *testdev = TESTDEV_CHARDEV(chr);
+    TestdevCharState *testdev = chr->opaque;
     int tocopy, eaten, orig_len = len;
 
     while (len) {
@@ -107,23 +102,35 @@ static int testdev_chr_write(Chardev *chr, const uint8_t *buf, int len)
     return orig_len;
 }
 
-static void char_testdev_class_init(ObjectClass *oc, void *data)
+static void testdev_close(struct CharDriverState *chr)
 {
-    ChardevClass *cc = CHARDEV_CLASS(oc);
+    TestdevCharState *testdev = chr->opaque;
 
-    cc->chr_write = testdev_chr_write;
+    g_free(testdev);
 }
 
-static const TypeInfo char_testdev_type_info = {
-    .name = TYPE_CHARDEV_TESTDEV,
-    .parent = TYPE_CHARDEV,
-    .instance_size = sizeof(TestdevChardev),
-    .class_init = char_testdev_class_init,
-};
+static CharDriverState *chr_testdev_init(const char *id,
+                                         ChardevBackend *backend,
+                                         ChardevReturn *ret,
+                                         Error **errp)
+{
+    TestdevCharState *testdev;
+    CharDriverState *chr;
+
+    testdev = g_new0(TestdevCharState, 1);
+    testdev->chr = chr = g_new0(CharDriverState, 1);
+
+    chr->opaque = testdev;
+    chr->chr_write = testdev_write;
+    chr->chr_close = testdev_close;
+
+    return chr;
+}
 
 static void register_types(void)
 {
-    type_register_static(&char_testdev_type_info);
+    register_char_driver("testdev", CHARDEV_BACKEND_KIND_TESTDEV, NULL,
+                         chr_testdev_init);
 }
 
 type_init(register_types);


@@ -29,7 +29,7 @@
 #include "exec/cpu-common.h"
 #include "sysemu/kvm.h"
 #include "sysemu/balloon.h"
-#include "trace-root.h"
+#include "trace.h"
 #include "qmp-commands.h"
 #include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qjson.h"

block.c (1676 changes): file diff suppressed because it is too large.


@@ -1,36 +1,35 @@
-block-obj-y += raw-format.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o dmg.o
+block-obj-y += raw_bsd.o qcow.o vdi.o vmdk.o cloop.o bochs.o vpc.o vvfat.o
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
+block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += quorum.o
 block-obj-y += parallels.o blkdebug.o blkverify.o blkreplay.o
 block-obj-y += block-backend.o snapshot.o qapi.o
-block-obj-$(CONFIG_WIN32) += file-win32.o win32-aio.o
-block-obj-$(CONFIG_POSIX) += file-posix.o
+block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
+block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-y += null.o mirror.o commit.o io.o
 block-obj-y += throttle-groups.o
 block-obj-y += nbd.o nbd-client.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
-block-obj-$(if $(CONFIG_LIBISCSI),y,n) += iscsi-opts.o
 block-obj-$(CONFIG_LIBNFS) += nfs.o
 block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
-block-obj-$(CONFIG_VXHS) += vxhs.o
+block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
+block-obj-y += dictzip.o
+block-obj-y += tar.o
 block-obj-y += write-threshold.o
-block-obj-y += backup.o
-block-obj-$(CONFIG_REPLICATION) += replication.o
 block-obj-y += crypto.o

 common-obj-y += stream.o
+common-obj-y += backup.o

-nfs.o-libs := $(LIBNFS_LIBS)
 iscsi.o-cflags := $(LIBISCSI_CFLAGS)
 iscsi.o-libs := $(LIBISCSI_LIBS)
 curl.o-cflags := $(CURL_CFLAGS)
@@ -39,10 +38,10 @@ rbd.o-cflags := $(RBD_CFLAGS)
 rbd.o-libs := $(RBD_LIBS)
 gluster.o-cflags := $(GLUSTERFS_CFLAGS)
 gluster.o-libs := $(GLUSTERFS_LIBS)
-vxhs.o-libs := $(VXHS_LIBS)
+archipelago.o-libs := $(ARCHIPELAGO_LIBS)
 ssh.o-cflags := $(LIBSSH2_CFLAGS)
 ssh.o-libs := $(LIBSSH2_LIBS)
-block-obj-$(if $(CONFIG_BZIP2),m,n) += dmg-bz2.o
-dmg-bz2.o-libs := $(BZIP2_LIBS)
+block-obj-m += dmg.o
+dmg.o-libs := $(BZIP2_LIBS)
 qcow.o-libs := -lz
 linux-aio.o-libs := -laio


@@ -32,19 +32,15 @@
 static QEMUClockType clock_type = QEMU_CLOCK_REALTIME;
 static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 1000;

-void block_acct_init(BlockAcctStats *stats)
-{
-    qemu_mutex_init(&stats->lock);
-    if (qtest_enabled()) {
-        clock_type = QEMU_CLOCK_VIRTUAL;
-    }
-}
-
-void block_acct_setup(BlockAcctStats *stats, bool account_invalid,
+void block_acct_init(BlockAcctStats *stats, bool account_invalid,
                      bool account_failed)
 {
     stats->account_invalid = account_invalid;
     stats->account_failed = account_failed;
+
+    if (qtest_enabled()) {
+        clock_type = QEMU_CLOCK_VIRTUAL;
+    }
 }

 void block_acct_cleanup(BlockAcctStats *stats)
@@ -53,7 +49,6 @@ void block_acct_cleanup(BlockAcctStats *stats)
     QSLIST_FOREACH_SAFE(s, &stats->intervals, entries, next) {
         g_free(s);
     }
-    qemu_mutex_destroy(&stats->lock);
 }

 void block_acct_add_interval(BlockAcctStats *stats, unsigned interval_length)
@@ -63,15 +58,12 @@ void block_acct_add_interval(BlockAcctStats *stats, unsigned interval_length)
     s = g_new0(BlockAcctTimedStats, 1);
     s->interval_length = interval_length;
-    s->stats = stats;
-    qemu_mutex_lock(&stats->lock);
     QSLIST_INSERT_HEAD(&stats->intervals, s, entries);
+
     for (i = 0; i < BLOCK_MAX_IOTYPE; i++) {
         timed_average_init(&s->latency[i], clock_type,
                            (uint64_t) interval_length * NANOSECONDS_PER_SECOND);
     }
-    qemu_mutex_unlock(&stats->lock);
 }

 BlockAcctTimedStats *block_acct_interval_next(BlockAcctStats *stats,
@@ -94,8 +86,7 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie *cookie,
     cookie->type = type;
 }

-static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie *cookie,
-                                 bool failed)
+void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
 {
     BlockAcctTimedStats *s;
     int64_t time_ns = qemu_clock_get_ns(clock_type);
@@ -107,16 +98,8 @@ static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie *cookie,
     assert(cookie->type < BLOCK_MAX_IOTYPE);

-    qemu_mutex_lock(&stats->lock);
-
-    if (failed) {
-        stats->failed_ops[cookie->type]++;
-    } else {
-        stats->nr_bytes[cookie->type] += cookie->bytes;
-        stats->nr_ops[cookie->type]++;
-    }
+    stats->nr_bytes[cookie->type] += cookie->bytes;
+    stats->nr_ops[cookie->type]++;

-    if (!failed || stats->account_failed) {
     stats->total_time_ns[cookie->type] += latency_ns;
     stats->last_access_time_ns = time_ns;
@@ -125,44 +108,51 @@ static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie *cookie,
         }
     }

-    qemu_mutex_unlock(&stats->lock);
-}
-
-void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
-{
-    block_account_one_io(stats, cookie, false);
-}
-
 void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
 {
-    block_account_one_io(stats, cookie, true);
+    assert(cookie->type < BLOCK_MAX_IOTYPE);
+
+    stats->failed_ops[cookie->type]++;
+
+    if (stats->account_failed) {
+        BlockAcctTimedStats *s;
+        int64_t time_ns = qemu_clock_get_ns(clock_type);
+        int64_t latency_ns = time_ns - cookie->start_time_ns;
+
+        if (qtest_enabled()) {
+            latency_ns = qtest_latency_ns;
+        }
+
+        stats->total_time_ns[cookie->type] += latency_ns;
+        stats->last_access_time_ns = time_ns;
+
+        QSLIST_FOREACH(s, &stats->intervals, entries) {
+            timed_average_account(&s->latency[cookie->type], latency_ns);
+        }
+    }
 }

 void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
 {
     assert(type < BLOCK_MAX_IOTYPE);

-    /* block_account_one_io() updates total_time_ns[], but this one does
-     * not. The reason is that invalid requests are accounted during their
-     * submission, therefore there's no actual I/O involved.
-     */
-    qemu_mutex_lock(&stats->lock);
+    /* block_acct_done() and block_acct_failed() update
+     * total_time_ns[], but this one does not. The reason is that
+     * invalid requests are accounted during their submission,
+     * therefore there's no actual I/O involved. */
     stats->invalid_ops[type]++;

     if (stats->account_invalid) {
         stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
     }
-    qemu_mutex_unlock(&stats->lock);
 }

 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
                            int num_requests)
 {
     assert(type < BLOCK_MAX_IOTYPE);
-    qemu_mutex_lock(&stats->lock);
     stats->merged[type] += num_requests;
-    qemu_mutex_unlock(&stats->lock);
 }

 int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
@@ -177,9 +167,7 @@ double block_acct_queue_depth(BlockAcctTimedStats *stats,
     assert(type < BLOCK_MAX_IOTYPE);

-    qemu_mutex_lock(&stats->stats->lock);
     sum = timed_average_sum(&stats->latency[type], &elapsed);
-    qemu_mutex_unlock(&stats->stats->lock);

     return (double) sum / elapsed;
 }

block/archipelago.c: new file, 1082 lines (diff suppressed because it is too large)

@@ -16,19 +16,24 @@
 #include "trace.h"
 #include "block/block.h"
 #include "block/block_int.h"
-#include "block/blockjob_int.h"
-#include "block/block_backup.h"
+#include "block/blockjob.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
 #include "qemu/cutils.h"
 #include "sysemu/block-backend.h"
 #include "qemu/bitmap.h"
-#include "qemu/error-report.h"

 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 #define SLICE_TIME 100000000ULL /* ns */

+typedef struct CowRequest {
+    int64_t start;
+    int64_t end;
+    QLIST_ENTRY(CowRequest) list;
+    CoQueue wait_queue; /* coroutines blocked on this request */
+} CowRequest;
+
 typedef struct BackupBlockJob {
     BlockJob common;
     BlockBackend *target;
@@ -42,7 +47,6 @@ typedef struct BackupBlockJob {
     uint64_t sectors_read;
     unsigned long *done_bitmap;
     int64_t cluster_size;
-    bool compress;
     NotifierWithReturn before_write;
     QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;
@@ -65,7 +69,7 @@ static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
         retry = false;
         QLIST_FOREACH(req, &job->inflight_reqs, list) {
             if (end > req->start && start < req->end) {
-                qemu_co_queue_wait(&req->wait_queue, NULL);
+                qemu_co_queue_wait(&req->wait_queue);
                 retry = true;
                 break;
             }
@@ -150,8 +154,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                            bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
         } else {
             ret = blk_co_pwritev(job->target, start * job->cluster_size,
-                                 bounce_qiov.size, &bounce_qiov,
-                                 job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+                                 bounce_qiov.size, &bounce_qiov, 0);
         }
         if (ret < 0) {
             trace_backup_do_cow_write_fail(job, start, ret);
@@ -243,14 +246,6 @@ static void backup_abort(BlockJob *job)
     }
 }

-static void backup_clean(BlockJob *job)
-{
-    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
-    assert(s->target);
-    blk_unref(s->target);
-    s->target = NULL;
-}
-
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common);
@@ -258,71 +253,14 @@ static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
     blk_set_aio_context(s->target, aio_context);
 }

-void backup_do_checkpoint(BlockJob *job, Error **errp)
-{
-    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t len;
-
-    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
-
-    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
-        error_setg(errp, "The backup job only supports block checkpoint in"
-                   " sync=none mode");
-        return;
-    }
-
-    len = DIV_ROUND_UP(backup_job->common.len, backup_job->cluster_size);
-    bitmap_zero(backup_job->done_bitmap, len);
-}
-
-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-                                          int nb_sectors)
-{
-    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
-    int64_t start, end;
-
-    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
-
-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
-    wait_for_overlapping_requests(backup_job, start, end);
-}
-
-void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-                              int64_t sector_num,
-                              int nb_sectors)
-{
-    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
-    int64_t start, end;
-
-    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
-
-    start = sector_num / sectors_per_cluster;
-    end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
-    cow_request_begin(req, backup_job, start, end);
-}
-
-void backup_cow_request_end(CowRequest *req)
-{
-    cow_request_end(req);
-}
-
-static void backup_drain(BlockJob *job)
-{
-    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
-
-    /* Need to keep a reference in case blk_drain triggers execution
-     * of backup_complete...
-     */
-    if (s->target) {
-        BlockBackend *target = s->target;
-        blk_ref(target);
-        blk_drain(target);
-        blk_unref(target);
-    }
-}
+static const BlockJobDriver backup_job_driver = {
+    .instance_size          = sizeof(BackupBlockJob),
+    .job_type               = BLOCK_JOB_TYPE_BACKUP,
+    .set_speed              = backup_set_speed,
+    .commit                 = backup_commit,
+    .abort                  = backup_abort,
+    .attached_aio_context   = backup_attached_aio_context,
+};

 static BlockErrorAction backup_error_action(BackupBlockJob *job,
                                             bool read, int error)
@@ -342,8 +280,11 @@ typedef struct {
 static void backup_complete(BlockJob *job, void *opaque)
 {
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
     BackupCompleteData *data = opaque;

+    blk_unref(s->target);
+
     block_job_completed(job, data->ret);
     g_free(data);
 }
@@ -384,14 +325,14 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     int64_t end;
     int64_t last_cluster = -1;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
-    BdrvDirtyBitmapIter *dbi;
+    HBitmapIter hbi;

     granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
     clusters_per_iter = MAX((granularity / job->cluster_size), 1);
-    dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);
+    bdrv_dirty_iter_init(job->sync_bitmap, &hbi);

     /* Find the next dirty sector(s) */
-    while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
+    while ((sector = hbitmap_iter_next(&hbi)) != -1) {
         cluster = sector / sectors_per_cluster;

         /* Fake progress updates for any clusters we skipped */
@@ -403,7 +344,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
         for (end = cluster + clusters_per_iter; cluster < end; cluster++) {
             do {
                 if (yield_and_check(job)) {
-                    goto out;
+                    return ret;
                 }
                 ret = backup_do_cow(job, cluster * sectors_per_cluster,
                                     sectors_per_cluster, &error_is_read,
@@ -411,7 +352,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
                 if ((ret < 0) &&
                     backup_error_action(job, error_is_read, -ret) ==
                     BLOCK_ERROR_ACTION_REPORT) {
-                    goto out;
+                    return ret;
                 }
             } while (ret < 0);
         }
@@ -419,7 +360,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
         /* If the bitmap granularity is smaller than the backup granularity,
          * we need to advance the iterator pointer to the next cluster. */
         if (granularity < job->cluster_size) {
-            bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
+            bdrv_set_dirty_iter(&hbi, cluster * sectors_per_cluster);
         }

         last_cluster = cluster - 1;
@@ -431,8 +372,6 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
         job->common.offset += ((end - last_cluster - 1) * job->cluster_size);
     }

-out:
-    bdrv_dirty_iter_free(dbi);
     return ret;
 }
@@ -441,6 +380,7 @@ static void coroutine_fn backup_run(void *opaque)
     BackupBlockJob *job = opaque;
     BackupCompleteData *data;
     BlockDriverState *bs = blk_bs(job->common.blk);
+    BlockBackend *target = job->target;
     int64_t start, end;
     int64_t sectors_per_cluster = cluster_size_sectors(job);
     int ret = 0;
@@ -468,14 +408,13 @@ static void coroutine_fn backup_run(void *opaque)
         /* Both FULL and TOP SYNC_MODE's require copying.. */
         for (; start < end; start++) {
             bool error_is_read;
-            int alloced = 0;
-
             if (yield_and_check(job)) {
                 break;
             }

             if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
                 int i, n;
+                int alloced = 0;

                 /* Check to see if these blocks are already in the
                  * backing file. */
@@ -493,7 +432,7 @@ static void coroutine_fn backup_run(void *opaque)
                                                  sectors_per_cluster - i, &n);
                     i += n;

-                    if (alloced || n == 0) {
+                    if (alloced == 1 || n == 0) {
                         break;
                     }
                 }
@@ -505,13 +444,8 @@ static void coroutine_fn backup_run(void *opaque)
             }

             /* FULL sync mode we copy the whole drive. */
-            if (alloced < 0) {
-                ret = alloced;
-            } else {
-                ret = backup_do_cow(job, start * sectors_per_cluster,
-                                    sectors_per_cluster, &error_is_read,
-                                    false);
-            }
+            ret = backup_do_cow(job, start * sectors_per_cluster,
+                                sectors_per_cluster, &error_is_read, false);
             if (ret < 0) {
                 /* Depending on error action, fail now or retry cluster */
                 BlockErrorAction action =
@@ -533,30 +467,18 @@ static void coroutine_fn backup_run(void *opaque)
     qemu_co_rwlock_unlock(&job->flush_rwlock);
     g_free(job->done_bitmap);

+    bdrv_op_unblock_all(blk_bs(target), job->common.blocker);
+
     data = g_malloc(sizeof(*data));
     data->ret = ret;
     block_job_defer_to_main_loop(&job->common, backup_complete, data);
 }

-static const BlockJobDriver backup_job_driver = {
-    .instance_size          = sizeof(BackupBlockJob),
-    .job_type               = BLOCK_JOB_TYPE_BACKUP,
-    .start                  = backup_run,
-    .set_speed              = backup_set_speed,
-    .commit                 = backup_commit,
-    .abort                  = backup_abort,
-    .clean                  = backup_clean,
-    .attached_aio_context   = backup_attached_aio_context,
-    .drain                  = backup_drain,
-};
-
-BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
+void backup_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
-                  bool compress,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
-                  int creation_flags,
                   BlockCompletionFunc *cb, void *opaque,
                   BlockJobTxn *txn, Error **errp)
 {
@@ -570,52 +492,46 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,

     if (bs == target) {
         error_setg(errp, "Source and target cannot be the same");
-        return NULL;
+        return;
     }

     if (!bdrv_is_inserted(bs)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(bs));
-        return NULL;
+        return;
     }

     if (!bdrv_is_inserted(target)) {
         error_setg(errp, "Device is not inserted: %s",
                    bdrv_get_device_name(target));
-        return NULL;
-    }
-
-    if (compress && target->drv->bdrv_co_pwritev_compressed == NULL) {
-        error_setg(errp, "Compression is not supported for this drive %s",
-                   bdrv_get_device_name(target));
-        return NULL;
+        return;
     }

     if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
-        return NULL;
+        return;
     }

     if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP_TARGET, errp)) {
-        return NULL;
+        return;
     }

     if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         if (!sync_bitmap) {
             error_setg(errp, "must provide a valid bitmap name for "
                              "\"incremental\" sync mode");
-            return NULL;
+            return;
         }

         /* Create a new bitmap, and freeze/disable this one. */
         if (bdrv_dirty_bitmap_create_successor(bs, sync_bitmap, errp) < 0) {
-            return NULL;
+            return;
         }
     } else if (sync_bitmap) {
         error_setg(errp,
                    "a sync_bitmap was provided to backup_run, "
                    "but received an incompatible sync_mode (%s)",
                    MirrorSyncMode_lookup[sync_mode]);
-        return NULL;
+        return;
     }

     len = bdrv_getlength(bs);
@@ -625,46 +541,26 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }

-    /* job->common.len is fixed, so we can't allow resize */
-    job = block_job_create(job_id, &backup_job_driver, bs,
-                           BLK_PERM_CONSISTENT_READ,
-                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
-                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
-                           speed, creation_flags, cb, opaque, errp);
+    job = block_job_create(job_id, &backup_job_driver, bs, speed,
+                           cb, opaque, errp);
     if (!job) {
         goto error;
     }

-    /* The target must match the source in size, so no resize here either */
-    job->target = blk_new(BLK_PERM_WRITE,
-                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
-                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
-    ret = blk_insert_bs(job->target, target, errp);
-    if (ret < 0) {
-        goto error;
-    }
+    job->target = blk_new();
+    blk_insert_bs(job->target, target);

     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
     job->sync_mode = sync_mode;
     job->sync_bitmap = sync_mode == MIRROR_SYNC_MODE_INCREMENTAL ?
                        sync_bitmap : NULL;
-    job->compress = compress;

     /* If there is no backing file on the target, we cannot rely on COW if our
      * backup cluster size is smaller than the target cluster size. Even for
      * targets with a backing file, try to avoid COW if possible. */
     ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target->backing) {
-        /* Cluster size is not defined */
-        error_report("WARNING: The target block device doesn't provide "
-                     "information about the block size and it doesn't have a "
-                     "backing file. The default block size of %u bytes is "
-                     "used. If the actual block size of the target exceeds "
-                     "this default, the backup may be unusable",
-                     BACKUP_CLUSTER_SIZE_DEFAULT);
-        job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target->backing) {
+    if (ret < 0 && !target->backing) {
         error_setg_errno(errp, -ret,
                          "Couldn't determine the cluster size of the target image, "
                          "which has no backing file");
@@ -678,22 +574,19 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
     }

-    /* Required permissions are already taken with target's blk_new() */
-    block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
-                       &error_abort);
+    bdrv_op_block_all(target, job->common.blocker);

     job->common.len = len;
+    job->common.co = qemu_coroutine_create(backup_run, job);
     block_job_txn_add_job(txn, &job->common);
+    qemu_coroutine_enter(job->common.co);

-    return &job->common;
+    return;

  error:
     if (sync_bitmap) {
         bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
     }
     if (job) {
-        backup_clean(&job->common);
-        block_job_early_fail(&job->common);
+        blk_unref(job->target);
+        block_job_unref(&job->common);
     }
-
-    return NULL;
 }


@@ -1,7 +1,6 @@
/* /*
* Block protocol for I/O error injection * Block protocol for I/O error injection
* *
* Copyright (C) 2016-2017 Red Hat, Inc.
* Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com> * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
* *
* Permission is hereby granted, free of charge, to any person obtaining a copy * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -38,12 +37,7 @@
typedef struct BDRVBlkdebugState { typedef struct BDRVBlkdebugState {
int state; int state;
int new_state; int new_state;
uint64_t align; int align;
uint64_t max_transfer;
uint64_t opt_write_zero;
uint64_t max_write_zero;
uint64_t opt_discard;
uint64_t max_discard;
/* For blkdebug_refresh_filename() */ /* For blkdebug_refresh_filename() */
char *config_file; char *config_file;
@@ -55,6 +49,7 @@ typedef struct BDRVBlkdebugState {
typedef struct BlkdebugAIOCB { typedef struct BlkdebugAIOCB {
BlockAIOCB common; BlockAIOCB common;
QEMUBH *bh;
int ret; int ret;
} BlkdebugAIOCB; } BlkdebugAIOCB;
@@ -64,6 +59,10 @@ typedef struct BlkdebugSuspendedReq {
QLIST_ENTRY(BlkdebugSuspendedReq) next; QLIST_ENTRY(BlkdebugSuspendedReq) next;
} BlkdebugSuspendedReq; } BlkdebugSuspendedReq;
static const AIOCBInfo blkdebug_aiocb_info = {
.aiocb_size = sizeof(BlkdebugAIOCB),
};
enum { enum {
ACTION_INJECT_ERROR, ACTION_INJECT_ERROR,
ACTION_SET_STATE, ACTION_SET_STATE,
@@ -79,7 +78,7 @@ typedef struct BlkdebugRule {
int error; int error;
int immediately; int immediately;
int once; int once;
int64_t offset; int64_t sector;
} inject; } inject;
struct { struct {
int new_state; int new_state;
@@ -176,7 +175,6 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
const char* event_name; const char* event_name;
BlkdebugEvent event; BlkdebugEvent event;
struct BlkdebugRule *rule; struct BlkdebugRule *rule;
int64_t sector;
/* Find the right event for the rule */ /* Find the right event for the rule */
event_name = qemu_opt_get(opts, "event"); event_name = qemu_opt_get(opts, "event");
@@ -203,9 +201,7 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
rule->options.inject.once = qemu_opt_get_bool(opts, "once", 0); rule->options.inject.once = qemu_opt_get_bool(opts, "once", 0);
rule->options.inject.immediately = rule->options.inject.immediately =
qemu_opt_get_bool(opts, "immediately", 0); qemu_opt_get_bool(opts, "immediately", 0);
sector = qemu_opt_get_number(opts, "sector", -1); rule->options.inject.sector = qemu_opt_get_number(opts, "sector", -1);
rule->options.inject.offset =
sector == -1 ? -1 : sector * BDRV_SECTOR_SIZE;
break; break;
case ACTION_SET_STATE: case ACTION_SET_STATE:
@@ -307,7 +303,7 @@ static void blkdebug_parse_filename(const char *filename, QDict *options,
if (!strstart(filename, "blkdebug:", &filename)) { if (!strstart(filename, "blkdebug:", &filename)) {
/* There was no prefix; therefore, all options have to be already /* There was no prefix; therefore, all options have to be already
present in the QDict (except for the filename) */ present in the QDict (except for the filename) */
qdict_put_str(options, "x-image", filename); qdict_put(options, "x-image", qstring_from_str(filename));
return; return;
} }
@@ -326,7 +322,7 @@ static void blkdebug_parse_filename(const char *filename, QDict *options,
/* TODO Allow multi-level nesting and set file.filename here */ /* TODO Allow multi-level nesting and set file.filename here */
filename = c + 1; filename = c + 1;
qdict_put_str(options, "x-image", filename); qdict_put(options, "x-image", qstring_from_str(filename));
} }
static QemuOptsList runtime_opts = { static QemuOptsList runtime_opts = {
@@ -348,31 +344,6 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_SIZE, .type = QEMU_OPT_SIZE,
.help = "Required alignment in bytes", .help = "Required alignment in bytes",
}, },
{
.name = "max-transfer",
.type = QEMU_OPT_SIZE,
.help = "Maximum transfer size in bytes",
},
{
.name = "opt-write-zero",
.type = QEMU_OPT_SIZE,
.help = "Optimum write zero alignment in bytes",
},
{
.name = "max-write-zero",
.type = QEMU_OPT_SIZE,
.help = "Maximum write zero size in bytes",
},
{
.name = "opt-discard",
.type = QEMU_OPT_SIZE,
.help = "Optimum discard alignment in bytes",
},
{
.name = "max-discard",
.type = QEMU_OPT_SIZE,
.help = "Maximum discard size in bytes",
},
{ /* end of list */ } { /* end of list */ }
}, },
}; };
@@ -383,8 +354,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
BDRVBlkdebugState *s = bs->opaque; BDRVBlkdebugState *s = bs->opaque;
QemuOpts *opts; QemuOpts *opts;
Error *local_err = NULL; Error *local_err = NULL;
int ret;
uint64_t align; uint64_t align;
int ret;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort); opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err); qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -413,69 +384,21 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }
 
-    bs->supported_write_flags = BDRV_REQ_FUA &
-        bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
-        bs->file->bs->supported_zero_flags;
-    ret = -EINVAL;
-
-    /* Set alignment overrides */
-    s->align = qemu_opt_get_size(opts, "align", 0);
-    if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
-        error_setg(errp, "Cannot meet constraints with align %" PRIu64,
-                   s->align);
-        goto out;
-    }
-    align = MAX(s->align, bs->file->bs->bl.request_alignment);
-
-    s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
-    if (s->max_transfer &&
-        (s->max_transfer >= INT_MAX ||
-         !QEMU_IS_ALIGNED(s->max_transfer, align))) {
-        error_setg(errp, "Cannot meet constraints with max-transfer %" PRIu64,
-                   s->max_transfer);
-        goto out;
-    }
-
-    s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
-    if (s->opt_write_zero &&
-        (s->opt_write_zero >= INT_MAX ||
-         !QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
-        error_setg(errp, "Cannot meet constraints with opt-write-zero %" PRIu64,
-                   s->opt_write_zero);
-        goto out;
-    }
-
-    s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
-    if (s->max_write_zero &&
-        (s->max_write_zero >= INT_MAX ||
-         !QEMU_IS_ALIGNED(s->max_write_zero,
-                          MAX(s->opt_write_zero, align)))) {
-        error_setg(errp, "Cannot meet constraints with max-write-zero %" PRIu64,
-                   s->max_write_zero);
-        goto out;
-    }
-
-    s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
-    if (s->opt_discard &&
-        (s->opt_discard >= INT_MAX ||
-         !QEMU_IS_ALIGNED(s->opt_discard, align))) {
-        error_setg(errp, "Cannot meet constraints with opt-discard %" PRIu64,
-                   s->opt_discard);
-        goto out;
-    }
-
-    s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
-    if (s->max_discard &&
-        (s->max_discard >= INT_MAX ||
-         !QEMU_IS_ALIGNED(s->max_discard,
-                          MAX(s->opt_discard, align)))) {
-        error_setg(errp, "Cannot meet constraints with max-discard %" PRIu64,
-                   s->max_discard);
-        goto out;
-    }
+    /* Set request alignment */
+    align = qemu_opt_get_size(opts, "align", 0);
+    if (align < INT_MAX && is_power_of_2(align)) {
+        s->align = align;
+    } else if (align) {
+        error_setg(errp, "Invalid alignment");
+        ret = -EINVAL;
+        goto fail_unref;
+    }
 
     ret = 0;
+    goto out;
+
+fail_unref:
+    bdrv_unref_child(bs, bs->file);
 out:
     if (ret < 0) {
         g_free(s->config_file);
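The hunk above replaces the newer open-time validation with the older one, where a nonzero "align" value is accepted only if it is a power of two below INT_MAX (zero means "no override"). A minimal standalone sketch of that accept/reject rule; the helper names are illustrative, not QEMU's:

```c
#include <stdbool.h>
#include <stdint.h>
#include <limits.h>

/* Stand-in for QEMU's is_power_of_2(): a nonzero value is a power of two
 * iff exactly one bit is set. */
static bool is_power_of_2_sketch(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/* Mirrors the "align" option check: zero means no override; any other
 * value must be a power of two strictly below INT_MAX. */
static bool align_option_valid(uint64_t align)
{
    return align == 0 || (align < INT_MAX && is_power_of_2_sketch(align));
}
```

The `x & (x - 1)` trick clears the lowest set bit, so the result is zero exactly when at most one bit was set.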
@@ -484,163 +407,107 @@ out:
     return ret;
 }
 
-static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
+static void error_callback_bh(void *opaque)
+{
+    struct BlkdebugAIOCB *acb = opaque;
+    qemu_bh_delete(acb->bh);
+    acb->common.cb(acb->common.opaque, acb->ret);
+    qemu_aio_unref(acb);
+}
+
+static BlockAIOCB *inject_error(BlockDriverState *bs,
+    BlockCompletionFunc *cb, void *opaque, BlkdebugRule *rule)
 {
     BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
-    int error;
-    bool immediately;
-
-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        uint64_t inject_offset = rule->options.inject.offset;
-
-        if (inject_offset == -1 ||
-            (bytes && inject_offset >= offset &&
-             inject_offset < offset + bytes))
-        {
-            break;
-        }
-    }
-
-    if (!rule || !rule->options.inject.error) {
-        return 0;
-    }
-
-    immediately = rule->options.inject.immediately;
-    error = rule->options.inject.error;
+    int error = rule->options.inject.error;
+    struct BlkdebugAIOCB *acb;
+    QEMUBH *bh;
+    bool immediately = rule->options.inject.immediately;
 
     if (rule->options.inject.once) {
         QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
         remove_rule(rule);
     }
 
-    if (!immediately) {
-        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
-        qemu_coroutine_yield();
+    if (immediately) {
+        return NULL;
     }
 
-    return -error;
+    acb = qemu_aio_get(&blkdebug_aiocb_info, bs, cb, opaque);
+    acb->ret = -error;
+
+    bh = aio_bh_new(bdrv_get_aio_context(bs), error_callback_bh, acb);
+    acb->bh = bh;
+    qemu_bh_schedule(bh);
+
+    return &acb->common;
 }
 
-static int coroutine_fn
-blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
-                   QEMUIOVector *qiov, int flags)
+static BlockAIOCB *blkdebug_aio_readv(BlockDriverState *bs,
+    int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+    BlockCompletionFunc *cb, void *opaque)
 {
-    int err;
-
-    /* Sanity check block layer guarantees */
-    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
-    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
-    if (bs->bl.max_transfer) {
-        assert(bytes <= bs->bl.max_transfer);
-    }
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        if (rule->options.inject.sector == -1 ||
+            (rule->options.inject.sector >= sector_num &&
+             rule->options.inject.sector < sector_num + nb_sectors)) {
+            break;
+        }
+    }
 
-    err = rule_check(bs, offset, bytes);
-    if (err) {
-        return err;
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, cb, opaque, rule);
     }
 
-    return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+    return bdrv_aio_readv(bs->file, sector_num, qiov, nb_sectors,
+                          cb, opaque);
 }
 
-static int coroutine_fn
-blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
-                    QEMUIOVector *qiov, int flags)
+static BlockAIOCB *blkdebug_aio_writev(BlockDriverState *bs,
+    int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+    BlockCompletionFunc *cb, void *opaque)
 {
-    int err;
-
-    /* Sanity check block layer guarantees */
-    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
-    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
-    if (bs->bl.max_transfer) {
-        assert(bytes <= bs->bl.max_transfer);
-    }
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        if (rule->options.inject.sector == -1 ||
+            (rule->options.inject.sector >= sector_num &&
+             rule->options.inject.sector < sector_num + nb_sectors)) {
+            break;
+        }
+    }
 
-    err = rule_check(bs, offset, bytes);
-    if (err) {
-        return err;
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, cb, opaque, rule);
     }
 
-    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+    return bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors,
+                           cb, opaque);
 }
 
-static int blkdebug_co_flush(BlockDriverState *bs)
+static BlockAIOCB *blkdebug_aio_flush(BlockDriverState *bs,
+    BlockCompletionFunc *cb, void *opaque)
 {
-    int err = rule_check(bs, 0, 0);
+    BDRVBlkdebugState *s = bs->opaque;
+    BlkdebugRule *rule = NULL;
 
-    if (err) {
-        return err;
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        if (rule->options.inject.sector == -1) {
+            break;
+        }
     }
 
-    return bdrv_co_flush(bs->file->bs);
-}
-
-static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
-                                                  int64_t offset, int count,
-                                                  BdrvRequestFlags flags)
-{
-    uint32_t align = MAX(bs->bl.request_alignment,
-                         bs->bl.pwrite_zeroes_alignment);
-    int err;
-
-    /* Only pass through requests that are larger than requested
-     * preferred alignment (so that we test the fallback to writes on
-     * unaligned portions), and check that the block layer never hands
-     * us anything unaligned that crosses an alignment boundary.  */
-    if (count < align) {
-        assert(QEMU_IS_ALIGNED(offset, align) ||
-               QEMU_IS_ALIGNED(offset + count, align) ||
-               DIV_ROUND_UP(offset, align) ==
-               DIV_ROUND_UP(offset + count, align));
-        return -ENOTSUP;
-    }
-    assert(QEMU_IS_ALIGNED(offset, align));
-    assert(QEMU_IS_ALIGNED(count, align));
-    if (bs->bl.max_pwrite_zeroes) {
-        assert(count <= bs->bl.max_pwrite_zeroes);
-    }
-
-    err = rule_check(bs, offset, count);
-    if (err) {
-        return err;
-    }
-
-    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
-}
-
-static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
-                                             int64_t offset, int count)
-{
-    uint32_t align = bs->bl.pdiscard_alignment;
-    int err;
-
-    /* Only pass through requests that are larger than requested
-     * minimum alignment, and ensure that unaligned requests do not
-     * cross optimum discard boundaries. */
-    if (count < bs->bl.request_alignment) {
-        assert(QEMU_IS_ALIGNED(offset, align) ||
-               QEMU_IS_ALIGNED(offset + count, align) ||
-               DIV_ROUND_UP(offset, align) ==
-               DIV_ROUND_UP(offset + count, align));
-        return -ENOTSUP;
-    }
-    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
-    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
-    if (align && count >= align) {
-        assert(QEMU_IS_ALIGNED(offset, align));
-        assert(QEMU_IS_ALIGNED(count, align));
-    }
-    if (bs->bl.max_pdiscard) {
-        assert(count <= bs->bl.max_pdiscard);
-    }
-
-    err = rule_check(bs, offset, count);
-    if (err) {
-        return err;
-    }
-
-    return bdrv_co_pdiscard(bs->file->bs, offset, count);
+    if (rule && rule->options.inject.error) {
+        return inject_error(bs, cb, opaque, rule);
+    }
+
+    return bdrv_aio_flush(bs->file->bs, cb, opaque);
 }
 
 static void blkdebug_close(BlockDriverState *bs)
 {
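Both sides of the hunk above select an injection rule by scanning the active rules for one whose configured location is either the wildcard -1 or falls inside the request. A standalone sketch of the byte-range form used by `rule_check()`; the struct and field names are illustrative, not QEMU's:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a blkdebug error-injection rule. */
struct inject_rule_sketch {
    int64_t inject_offset;  /* -1 matches every request */
};

/* A rule matches when its offset is the wildcard, or lies inside the
 * half-open byte range [offset, offset + bytes) of the request. */
static bool rule_matches(const struct inject_rule_sketch *rule,
                         uint64_t offset, uint64_t bytes)
{
    return rule->inject_offset == -1 ||
           (bytes != 0 &&
            (uint64_t)rule->inject_offset >= offset &&
            (uint64_t)rule->inject_offset < offset + bytes);
}
```

Note that a zero-length request can only be matched by the wildcard, which is how the flush path (called with offset 0, bytes 0) behaves.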
@@ -812,9 +679,9 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
     return bdrv_getlength(bs->file->bs);
 }
 
-static int blkdebug_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
 {
-    return bdrv_truncate(bs->file, offset, errp);
+    return bdrv_truncate(bs->file->bs, offset);
 }
 
 static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
@@ -846,10 +713,10 @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
     }
 
     opts = qdict_new();
-    qdict_put_str(opts, "driver", "blkdebug");
+    qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkdebug")));
 
     QINCREF(bs->file->bs->full_open_options);
-    qdict_put(opts, "image", bs->file->bs->full_open_options);
+    qdict_put_obj(opts, "image", QOBJECT(bs->file->bs->full_open_options));
 
     for (e = qdict_first(options); e; e = qdict_next(options, e)) {
         if (strcmp(qdict_entry_key(e), "x-image")) {
@@ -868,21 +735,6 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
     if (s->align) {
         bs->bl.request_alignment = s->align;
     }
-    if (s->max_transfer) {
-        bs->bl.max_transfer = s->max_transfer;
-    }
-    if (s->opt_write_zero) {
-        bs->bl.pwrite_zeroes_alignment = s->opt_write_zero;
-    }
-    if (s->max_write_zero) {
-        bs->bl.max_pwrite_zeroes = s->max_write_zero;
-    }
-    if (s->opt_discard) {
-        bs->bl.pdiscard_alignment = s->opt_discard;
-    }
-    if (s->max_discard) {
-        bs->bl.max_pdiscard = s->max_discard;
-    }
 }
 
 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
@@ -900,18 +752,14 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_file_open         = blkdebug_open,
     .bdrv_close             = blkdebug_close,
     .bdrv_reopen_prepare    = blkdebug_reopen_prepare,
-    .bdrv_child_perm        = bdrv_filter_default_perms,
     .bdrv_getlength         = blkdebug_getlength,
     .bdrv_truncate          = blkdebug_truncate,
     .bdrv_refresh_filename  = blkdebug_refresh_filename,
     .bdrv_refresh_limits    = blkdebug_refresh_limits,
 
-    .bdrv_co_preadv         = blkdebug_co_preadv,
-    .bdrv_co_pwritev        = blkdebug_co_pwritev,
-    .bdrv_co_flush_to_disk  = blkdebug_co_flush,
-    .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
-    .bdrv_co_pdiscard       = blkdebug_co_pdiscard,
+    .bdrv_aio_readv         = blkdebug_aio_readv,
+    .bdrv_aio_writev        = blkdebug_aio_writev,
+    .bdrv_aio_flush         = blkdebug_aio_flush,
 
     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,


@@ -20,6 +20,11 @@ typedef struct Request {
     QEMUBH *bh;
 } Request;
 
+/* Next request id.
+   This counter is global, because requests from different
+   block devices should not get overlapping ids. */
+static uint64_t request_id;
+
 static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
                           Error **errp)
 {
@@ -37,6 +42,9 @@ static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
     ret = 0;
 fail:
+    if (ret < 0) {
+        bdrv_unref_child(bs, bs->file);
+    }
     return ret;
 }
@@ -57,7 +65,7 @@ static int64_t blkreplay_getlength(BlockDriverState *bs)
 static void blkreplay_bh_cb(void *opaque)
 {
     Request *req = opaque;
-    aio_co_wake(req->co);
+    qemu_coroutine_enter(req->co);
     qemu_bh_delete(req->bh);
     g_free(req);
 }
@@ -76,7 +84,7 @@ static void block_request_create(uint64_t reqid, BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_preadv(BlockDriverState *bs,
     uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    uint64_t reqid = blkreplay_next_id();
+    uint64_t reqid = request_id++;
     int ret = bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();
@@ -87,7 +95,7 @@ static int coroutine_fn blkreplay_co_preadv(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_pwritev(BlockDriverState *bs,
     uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    uint64_t reqid = blkreplay_next_id();
+    uint64_t reqid = request_id++;
     int ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();
@@ -98,7 +106,7 @@ static int coroutine_fn blkreplay_co_pwritev(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int count, BdrvRequestFlags flags)
 {
-    uint64_t reqid = blkreplay_next_id();
+    uint64_t reqid = request_id++;
     int ret = bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();
@@ -109,7 +117,7 @@ static int coroutine_fn blkreplay_co_pwrite_zeroes(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_pdiscard(BlockDriverState *bs,
     int64_t offset, int count)
 {
-    uint64_t reqid = blkreplay_next_id();
+    uint64_t reqid = request_id++;
     int ret = bdrv_co_pdiscard(bs->file->bs, offset, count);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();
@@ -119,7 +127,7 @@ static int coroutine_fn blkreplay_co_pdiscard(BlockDriverState *bs,
 static int coroutine_fn blkreplay_co_flush(BlockDriverState *bs)
 {
-    uint64_t reqid = blkreplay_next_id();
+    uint64_t reqid = request_id++;
     int ret = bdrv_co_flush(bs->file->bs);
     block_request_create(reqid, bs, qemu_coroutine_self());
     qemu_coroutine_yield();
@@ -134,7 +142,6 @@ static BlockDriver bdrv_blkreplay = {
     .bdrv_file_open         = blkreplay_open,
     .bdrv_close             = blkreplay_close,
-    .bdrv_child_perm        = bdrv_filter_default_perms,
     .bdrv_getlength         = blkreplay_getlength,
 
     .bdrv_co_preadv         = blkreplay_co_preadv,
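The blkreplay hunks above swap `blkreplay_next_id()` for a bare global counter, but either way every request draws its id from one shared, monotonically increasing source, so ids never collide between block devices and replay ordering stays unambiguous. A toy model of that counter; the names are illustrative:

```c
#include <stdint.h>

/* One counter shared by all devices: each call hands out the next id,
 * so concurrently issued requests on different devices still receive
 * globally unique, strictly increasing ids. */
static uint64_t next_request_id_sketch;

static uint64_t blkreplay_next_id_sketch(void)
{
    return next_request_id_sketch++;
}
```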


@@ -19,36 +19,39 @@ typedef struct {
     BdrvChild *test_file;
 } BDRVBlkverifyState;
 
-typedef struct BlkverifyRequest {
-    Coroutine *co;
-    BlockDriverState *bs;
+typedef struct BlkverifyAIOCB BlkverifyAIOCB;
+struct BlkverifyAIOCB {
+    BlockAIOCB common;
+    QEMUBH *bh;
 
     /* Request metadata */
     bool is_write;
-    uint64_t offset;
-    uint64_t bytes;
-    int flags;
-
-    int (*request_fn)(BdrvChild *, int64_t, unsigned int, QEMUIOVector *,
-                      BdrvRequestFlags);
-
-    int ret;                    /* test image result */
-    int raw_ret;                /* raw image result */
+    int64_t sector_num;
+    int nb_sectors;
 
+    int ret;                    /* first completed request's result */
     unsigned int done;          /* completion counter */
 
     QEMUIOVector *qiov;         /* user I/O vector */
-    QEMUIOVector *raw_qiov;     /* cloned I/O vector for raw file */
-} BlkverifyRequest;
+    QEMUIOVector raw_qiov;      /* cloned I/O vector for raw file */
+    void *buf;                  /* buffer for raw file I/O */
 
-static void GCC_FMT_ATTR(2, 3) blkverify_err(BlkverifyRequest *r,
+    void (*verify)(BlkverifyAIOCB *acb);
+};
+
+static const AIOCBInfo blkverify_aiocb_info = {
+    .aiocb_size             = sizeof(BlkverifyAIOCB),
+};
+
+static void GCC_FMT_ATTR(2, 3) blkverify_err(BlkverifyAIOCB *acb,
                                              const char *fmt, ...)
 {
     va_list ap;
 
     va_start(ap, fmt);
-    fprintf(stderr, "blkverify: %s offset=%" PRId64 " bytes=%" PRId64 " ",
-            r->is_write ? "write" : "read", r->offset, r->bytes);
+    fprintf(stderr, "blkverify: %s sector_num=%" PRId64 " nb_sectors=%d ",
+            acb->is_write ? "write" : "read", acb->sector_num,
+            acb->nb_sectors);
     vfprintf(stderr, fmt, ap);
     fprintf(stderr, "\n");
     va_end(ap);
@@ -67,7 +70,7 @@ static void blkverify_parse_filename(const char *filename, QDict *options,
     if (!strstart(filename, "blkverify:", &filename)) {
         /* There was no prefix; therefore, all options have to be already
            present in the QDict (except for the filename) */
-        qdict_put_str(options, "x-image", filename);
+        qdict_put(options, "x-image", qstring_from_str(filename));
         return;
     }
@@ -84,7 +87,7 @@
     /* TODO Allow multi-level nesting and set file.filename here */
     filename = c + 1;
-    qdict_put_str(options, "x-image", filename);
+    qdict_put(options, "x-image", qstring_from_str(filename));
 }
 
 static QemuOptsList runtime_opts = {
@@ -142,6 +145,9 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
     ret = 0;
 fail:
+    if (ret < 0) {
+        bdrv_unref_child(bs, bs->file);
+    }
     qemu_opts_del(opts);
     return ret;
 }
@@ -161,106 +167,116 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
     return bdrv_getlength(s->test_file->bs);
 }
 
-static void coroutine_fn blkverify_do_test_req(void *opaque)
+static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
+                                         int64_t sector_num, QEMUIOVector *qiov,
+                                         int nb_sectors,
+                                         BlockCompletionFunc *cb,
+                                         void *opaque)
 {
-    BlkverifyRequest *r = opaque;
-    BDRVBlkverifyState *s = r->bs->opaque;
+    BlkverifyAIOCB *acb = qemu_aio_get(&blkverify_aiocb_info, bs, cb, opaque);
 
-    r->ret = r->request_fn(s->test_file, r->offset, r->bytes, r->qiov,
-                           r->flags);
-    r->done++;
-    qemu_coroutine_enter_if_inactive(r->co);
+    acb->bh = NULL;
+    acb->is_write = is_write;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->ret = -EINPROGRESS;
+    acb->done = 0;
+    acb->qiov = qiov;
+    acb->buf = NULL;
+    acb->verify = NULL;
+    return acb;
 }
 
-static void coroutine_fn blkverify_do_raw_req(void *opaque)
+static void blkverify_aio_bh(void *opaque)
 {
-    BlkverifyRequest *r = opaque;
+    BlkverifyAIOCB *acb = opaque;
 
-    r->raw_ret = r->request_fn(r->bs->file, r->offset, r->bytes, r->raw_qiov,
-                               r->flags);
-    r->done++;
-    qemu_coroutine_enter_if_inactive(r->co);
+    qemu_bh_delete(acb->bh);
+    if (acb->buf) {
+        qemu_iovec_destroy(&acb->raw_qiov);
+        qemu_vfree(acb->buf);
+    }
+    acb->common.cb(acb->common.opaque, acb->ret);
+    qemu_aio_unref(acb);
 }
 
-static int coroutine_fn
-blkverify_co_prwv(BlockDriverState *bs, BlkverifyRequest *r, uint64_t offset,
-                  uint64_t bytes, QEMUIOVector *qiov, QEMUIOVector *raw_qiov,
-                  int flags, bool is_write)
+static void blkverify_aio_cb(void *opaque, int ret)
 {
-    Coroutine *co_a, *co_b;
-
-    *r = (BlkverifyRequest) {
-        .co         = qemu_coroutine_self(),
-        .bs         = bs,
-        .offset     = offset,
-        .bytes      = bytes,
-        .qiov       = qiov,
-        .raw_qiov   = raw_qiov,
-        .flags      = flags,
-        .is_write   = is_write,
-        .request_fn = is_write ? bdrv_co_pwritev : bdrv_co_preadv,
-    };
+    BlkverifyAIOCB *acb = opaque;
 
-    co_a = qemu_coroutine_create(blkverify_do_test_req, r);
-    co_b = qemu_coroutine_create(blkverify_do_raw_req, r);
+    switch (++acb->done) {
+    case 1:
+        acb->ret = ret;
+        break;
 
-    qemu_coroutine_enter(co_a);
-    qemu_coroutine_enter(co_b);
+    case 2:
+        if (acb->ret != ret) {
+            blkverify_err(acb, "return value mismatch %d != %d", acb->ret, ret);
+        }
 
-    while (r->done < 2) {
-        qemu_coroutine_yield();
-    }
+        if (acb->verify) {
+            acb->verify(acb);
+        }
 
-    if (r->ret != r->raw_ret) {
-        blkverify_err(r, "return value mismatch %d != %d", r->ret, r->raw_ret);
+        acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
+                             blkverify_aio_bh, acb);
+        qemu_bh_schedule(acb->bh);
+        break;
     }
-
-    return r->ret;
 }
 
-static int coroutine_fn
-blkverify_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
-                    QEMUIOVector *qiov, int flags)
+static void blkverify_verify_readv(BlkverifyAIOCB *acb)
 {
-    BlkverifyRequest r;
-    QEMUIOVector raw_qiov;
-    void *buf;
-    ssize_t cmp_offset;
-    int ret;
-
-    buf = qemu_blockalign(bs->file->bs, qiov->size);
-    qemu_iovec_init(&raw_qiov, qiov->niov);
-    qemu_iovec_clone(&raw_qiov, qiov, buf);
-
-    ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov, flags,
-                            false);
-
-    cmp_offset = qemu_iovec_compare(qiov, &raw_qiov);
-    if (cmp_offset != -1) {
-        blkverify_err(&r, "contents mismatch at offset %" PRId64,
-                      offset + cmp_offset);
+    ssize_t offset = qemu_iovec_compare(acb->qiov, &acb->raw_qiov);
+    if (offset != -1) {
+        blkverify_err(acb, "contents mismatch in sector %" PRId64,
+                      acb->sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
     }
-
-    qemu_iovec_destroy(&raw_qiov);
-    qemu_vfree(buf);
-
-    return ret;
 }
 
-static int coroutine_fn
-blkverify_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
-                     QEMUIOVector *qiov, int flags)
+static BlockAIOCB *blkverify_aio_readv(BlockDriverState *bs,
+    int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+    BlockCompletionFunc *cb, void *opaque)
 {
-    BlkverifyRequest r;
-
-    return blkverify_co_prwv(bs, &r, offset, bytes, qiov, qiov, flags, true);
+    BDRVBlkverifyState *s = bs->opaque;
+    BlkverifyAIOCB *acb = blkverify_aio_get(bs, false, sector_num, qiov,
+                                            nb_sectors, cb, opaque);
+
+    acb->verify = blkverify_verify_readv;
+    acb->buf = qemu_blockalign(bs->file->bs, qiov->size);
+    qemu_iovec_init(&acb->raw_qiov, acb->qiov->niov);
+    qemu_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
+
+    bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
+                   blkverify_aio_cb, acb);
+    bdrv_aio_readv(bs->file, sector_num, &acb->raw_qiov, nb_sectors,
+                   blkverify_aio_cb, acb);
+    return &acb->common;
 }
 
-static int blkverify_co_flush(BlockDriverState *bs)
+static BlockAIOCB *blkverify_aio_writev(BlockDriverState *bs,
+    int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+    BlockCompletionFunc *cb, void *opaque)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+    BlkverifyAIOCB *acb = blkverify_aio_get(bs, true, sector_num, qiov,
+                                            nb_sectors, cb, opaque);
+
+    bdrv_aio_writev(s->test_file, sector_num, qiov, nb_sectors,
+                    blkverify_aio_cb, acb);
+    bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors,
+                    blkverify_aio_cb, acb);
+    return &acb->common;
+}
+
+static BlockAIOCB *blkverify_aio_flush(BlockDriverState *bs,
+                                       BlockCompletionFunc *cb,
+                                       void *opaque)
 {
     BDRVBlkverifyState *s = bs->opaque;
 
     /* Only flush test file, the raw file is not important */
-    return bdrv_co_flush(s->test_file->bs);
+    return bdrv_aio_flush(s->test_file->bs, cb, opaque);
 }
 
 static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
@@ -288,12 +304,13 @@ static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options)
&& s->test_file->bs->full_open_options) && s->test_file->bs->full_open_options)
{ {
QDict *opts = qdict_new(); QDict *opts = qdict_new();
qdict_put_str(opts, "driver", "blkverify"); qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkverify")));
QINCREF(bs->file->bs->full_open_options); QINCREF(bs->file->bs->full_open_options);
qdict_put(opts, "raw", bs->file->bs->full_open_options); qdict_put_obj(opts, "raw", QOBJECT(bs->file->bs->full_open_options));
QINCREF(s->test_file->bs->full_open_options); QINCREF(s->test_file->bs->full_open_options);
qdict_put(opts, "test", s->test_file->bs->full_open_options); qdict_put_obj(opts, "test",
QOBJECT(s->test_file->bs->full_open_options));
bs->full_open_options = opts; bs->full_open_options = opts;
} }
@@ -316,13 +333,12 @@ static BlockDriver bdrv_blkverify = {
     .bdrv_parse_filename      = blkverify_parse_filename,
     .bdrv_file_open           = blkverify_open,
     .bdrv_close               = blkverify_close,
-    .bdrv_child_perm          = bdrv_filter_default_perms,
     .bdrv_getlength           = blkverify_getlength,
     .bdrv_refresh_filename    = blkverify_refresh_filename,
 
-    .bdrv_co_preadv           = blkverify_co_preadv,
-    .bdrv_co_pwritev          = blkverify_co_pwritev,
-    .bdrv_co_flush            = blkverify_co_flush,
+    .bdrv_aio_readv           = blkverify_aio_readv,
+    .bdrv_aio_writev          = blkverify_aio_writev,
+    .bdrv_aio_flush           = blkverify_aio_flush,
 
     .is_filter                = true,
     .bdrv_recurse_is_first_non_filter = blkverify_recurse_is_first_non_filter,
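blkverify's read verification in the diff above hinges on finding the first byte where the two images disagree (`qemu_iovec_compare()`, which returns the offset of the first mismatch or -1 if the vectors match). A simplified analogue over flat buffers; this is a hypothetical helper, not the QEMU API:

```c
#include <stddef.h>
#include <sys/types.h>

/* Return the byte offset of the first difference between two equal-length
 * buffers, or -1 if they are identical. Plain buffers stand in for the
 * scattered iovecs the real code compares. */
static ssize_t first_mismatch_sketch(const unsigned char *a,
                                     const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            return (ssize_t)i;
        }
    }
    return -1;
}
```

The mismatch offset is what lets blkverify report the failing sector: divide it by the sector size and add it to the request's starting sector.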


@@ -38,7 +38,6 @@ struct BlockBackend {
     BlockBackendPublic public;
 
     void *dev;                  /* attached device model, if any */
-    bool legacy_dev;            /* true if dev is not a DeviceState */
     /* TODO change to DeviceState when all users are qdevified */
     const BlockDevOps *dev_ops;
     void *dev_opaque;
@@ -59,19 +58,14 @@ struct BlockBackend {
bool iostatus_enabled; bool iostatus_enabled;
BlockDeviceIoStatus iostatus; BlockDeviceIoStatus iostatus;
uint64_t perm;
uint64_t shared_perm;
bool disable_perm;
bool allow_write_beyond_eof; bool allow_write_beyond_eof;
NotifierList remove_bs_notifiers, insert_bs_notifiers; NotifierList remove_bs_notifiers, insert_bs_notifiers;
int quiesce_counter;
}; };
typedef struct BlockBackendAIOCB { typedef struct BlockBackendAIOCB {
BlockAIOCB common; BlockAIOCB common;
QEMUBH *bh;
BlockBackend *blk; BlockBackend *blk;
int ret; int ret;
} BlockBackendAIOCB; } BlockBackendAIOCB;
@@ -83,7 +77,6 @@ static const AIOCBInfo block_backend_aiocb_info = {
 static void drive_info_del(DriveInfo *dinfo);
 static BlockBackend *bdrv_first_blk(BlockDriverState *bs);
-static char *blk_get_attached_dev_id(BlockBackend *blk);
 
 /* All BlockBackends */
 static QTAILQ_HEAD(, BlockBackend) block_backends =
@@ -106,120 +99,37 @@ static void blk_root_drained_end(BdrvChild *child);
 static void blk_root_change_media(BdrvChild *child, bool load);
 static void blk_root_resize(BdrvChild *child);
 
-static char *blk_root_get_parent_desc(BdrvChild *child)
-{
-    BlockBackend *blk = child->opaque;
-    char *dev_id;
-
-    if (blk->name) {
-        return g_strdup(blk->name);
-    }
-
-    dev_id = blk_get_attached_dev_id(blk);
-    if (*dev_id) {
-        return dev_id;
-    } else {
-        /* TODO Callback into the BB owner for something more detailed */
-        g_free(dev_id);
-        return g_strdup("a block device");
-    }
-}
-
 static const char *blk_root_get_name(BdrvChild *child)
 {
     return blk_name(child->opaque);
 }
 
-/*
- * Notifies the user of the BlockBackend that migration has completed. qdev
- * devices can tighten their permissions in response (specifically revoke
- * shared write permissions that we needed for storage migration).
- *
- * If an error is returned, the VM cannot be allowed to be resumed.
- */
-static void blk_root_activate(BdrvChild *child, Error **errp)
-{
-    BlockBackend *blk = child->opaque;
-    Error *local_err = NULL;
-
-    if (!blk->disable_perm) {
-        return;
-    }
-
-    blk->disable_perm = false;
-
-    blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk->disable_perm = true;
-        return;
-    }
-}
-
-static int blk_root_inactivate(BdrvChild *child)
-{
-    BlockBackend *blk = child->opaque;
-
-    if (blk->disable_perm) {
-        return 0;
-    }
-
-    /* Only inactivate BlockBackends for guest devices (which are inactive at
-     * this point because the VM is stopped) and unattached monitor-owned
-     * BlockBackends. If there is still any other user like a block job, then
-     * we simply can't inactivate the image. */
-    if (!blk->dev && !blk_name(blk)[0]) {
-        return -EPERM;
-    }
-
-    blk->disable_perm = true;
-    if (blk->root) {
-        bdrv_child_try_set_perm(blk->root, 0, BLK_PERM_ALL, &error_abort);
-    }
-
-    return 0;
-}
-
 static const BdrvChildRole child_root = {
     .inherit_options    = blk_root_inherit_options,
 
     .change_media       = blk_root_change_media,
     .resize             = blk_root_resize,
     .get_name           = blk_root_get_name,
-    .get_parent_desc    = blk_root_get_parent_desc,
 
     .drained_begin      = blk_root_drained_begin,
     .drained_end        = blk_root_drained_end,
-
-    .activate           = blk_root_activate,
-    .inactivate         = blk_root_inactivate,
 };
 
 /*
  * Create a new BlockBackend with a reference count of one.
- *
- * @perm is a bitmasks of BLK_PERM_* constants which describes the permissions
- * to request for a block driver node that is attached to this BlockBackend.
- * @shared_perm is a bitmask which describes which permissions may be granted
- * to other users of the attached node.
- * Both sets of permissions can be changed later using blk_set_perm().
- *
+ * Store an error through @errp on failure, unless it's null.
  * Return the new BlockBackend on success, null on failure.
  */
-BlockBackend *blk_new(uint64_t perm, uint64_t shared_perm)
+BlockBackend *blk_new(void)
 {
     BlockBackend *blk;
 
     blk = g_new0(BlockBackend, 1);
     blk->refcnt = 1;
-    blk->perm = perm;
-    blk->shared_perm = shared_perm;
     blk_set_enable_write_cache(blk, true);
 
-    qemu_co_mutex_init(&blk->public.throttled_reqs_lock);
     qemu_co_queue_init(&blk->public.throttled_reqs[0]);
     qemu_co_queue_init(&blk->public.throttled_reqs[1]);
-    block_acct_init(&blk->stats);
 
     notifier_list_init(&blk->remove_bs_notifiers);
     notifier_list_init(&blk->insert_bs_notifiers);
@@ -245,38 +155,15 @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
{ {
BlockBackend *blk; BlockBackend *blk;
BlockDriverState *bs; BlockDriverState *bs;
uint64_t perm;
/* blk_new_open() is mainly used in .bdrv_create implementations and the blk = blk_new();
* tools where sharing isn't a concern because the BDS stays private, so we
* just request permission according to the flags.
*
* The exceptions are xen_disk and blockdev_init(); in these cases, the
* caller of blk_new_open() doesn't make use of the permissions, but they
* shouldn't hurt either. We can still share everything here because the
* guest devices will add their own blockers if they can't share. */
perm = BLK_PERM_CONSISTENT_READ;
if (flags & BDRV_O_RDWR) {
perm |= BLK_PERM_WRITE;
}
if (flags & BDRV_O_RESIZE) {
perm |= BLK_PERM_RESIZE;
}
blk = blk_new(perm, BLK_PERM_ALL);
bs = bdrv_open(filename, reference, options, flags, errp); bs = bdrv_open(filename, reference, options, flags, errp);
if (!bs) { if (!bs) {
blk_unref(blk); blk_unref(blk);
return NULL; return NULL;
} }
blk->root = bdrv_root_attach_child(bs, "root", &child_root, blk->root = bdrv_root_attach_child(bs, "root", &child_root, blk);
perm, BLK_PERM_ALL, blk, errp);
if (!blk->root) {
bdrv_unref(bs);
blk_unref(blk);
return NULL;
}
return blk; return blk;
} }
@@ -286,9 +173,6 @@ static void blk_delete(BlockBackend *blk)
     assert(!blk->refcnt);
     assert(!blk->name);
     assert(!blk->dev);
-    if (blk->public.throttle_state) {
-        blk_io_limits_disable(blk);
-    }
     if (blk->root) {
         blk_remove_bs(blk);
     }
@@ -475,7 +359,7 @@ void monitor_remove_blk(BlockBackend *blk)
  * Return @blk's name, a non-null string.
  * Returns an empty string iff @blk is not referenced by the monitor.
  */
-const char *blk_name(const BlockBackend *blk)
+const char *blk_name(BlockBackend *blk)
 {
     return blk->name ?: "";
 }
@@ -525,22 +409,6 @@ bool bdrv_has_blk(BlockDriverState *bs)
     return bdrv_first_blk(bs) != NULL;
 }

-/*
- * Returns true if @bs has only BlockBackends as parents.
- */
-bool bdrv_is_root_node(BlockDriverState *bs)
-{
-    BdrvChild *c;
-
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
-        if (c->role != &child_root) {
-            return false;
-        }
-    }
-
-    return true;
-}
-
 /*
  * Return @blk's DriveInfo if any, else null.
  */
@@ -611,79 +479,32 @@ void blk_remove_bs(BlockBackend *blk)
 /*
  * Associates a new BlockDriverState with @blk.
  */
-int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
+void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs)
 {
-    blk->root = bdrv_root_attach_child(bs, "root", &child_root,
-                                       blk->perm, blk->shared_perm, blk, errp);
-    if (blk->root == NULL) {
-        return -EPERM;
-    }
     bdrv_ref(bs);
+    blk->root = bdrv_root_attach_child(bs, "root", &child_root, blk);

     notifier_list_notify(&blk->insert_bs_notifiers, blk);
     if (blk->public.throttle_state) {
         throttle_timers_attach_aio_context(
             &blk->public.throttle_timers, bdrv_get_aio_context(bs));
     }
-
-    return 0;
-}
-
-/*
- * Sets the permission bitmasks that the user of the BlockBackend needs.
- */
-int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
-                 Error **errp)
-{
-    int ret;
-
-    if (blk->root && !blk->disable_perm) {
-        ret = bdrv_child_try_set_perm(blk->root, perm, shared_perm, errp);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    blk->perm = perm;
-    blk->shared_perm = shared_perm;
-
-    return 0;
-}
-
-void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
-{
-    *perm = blk->perm;
-    *shared_perm = blk->shared_perm;
-}
-
-static int blk_do_attach_dev(BlockBackend *blk, void *dev)
-{
-    if (blk->dev) {
-        return -EBUSY;
-    }
-
-    /* While migration is still incoming, we don't need to apply the
-     * permissions of guest device BlockBackends. We might still have a block
-     * job or NBD server writing to the image for storage migration. */
-    if (runstate_check(RUN_STATE_INMIGRATE)) {
-        blk->disable_perm = true;
-    }
-
-    blk_ref(blk);
-    blk->dev = dev;
-    blk->legacy_dev = false;
-    blk_iostatus_reset(blk);
-
-    return 0;
 }

 /*
  * Attach device model @dev to @blk.
  * Return 0 on success, -EBUSY when a device model is attached already.
  */
-int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
+int blk_attach_dev(BlockBackend *blk, void *dev)
+/* TODO change to DeviceState *dev when all users are qdevified */
 {
-    return blk_do_attach_dev(blk, dev);
+    if (blk->dev) {
+        return -EBUSY;
+    }
+    blk_ref(blk);
+    blk->dev = dev;
+    blk_iostatus_reset(blk);
+    return 0;
 }

 /*
@@ -691,12 +512,11 @@ int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
  * @blk must not have a device model attached already.
  * TODO qdevified devices don't use this, remove when devices are qdevified
  */
-void blk_attach_dev_legacy(BlockBackend *blk, void *dev)
+void blk_attach_dev_nofail(BlockBackend *blk, void *dev)
 {
-    if (blk_do_attach_dev(blk, dev) < 0) {
+    if (blk_attach_dev(blk, dev) < 0) {
         abort();
     }
-    blk->legacy_dev = true;
 }

 /*
@@ -711,7 +531,6 @@ void blk_detach_dev(BlockBackend *blk, void *dev)
     blk->dev_ops = NULL;
     blk->dev_opaque = NULL;
     blk->guest_block_size = 512;
-    blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
     blk_unref(blk);
 }
@@ -724,42 +543,6 @@ void *blk_get_attached_dev(BlockBackend *blk)
     return blk->dev;
 }

-/* Return the qdev ID, or if no ID is assigned the QOM path, of the block
- * device attached to the BlockBackend. */
-static char *blk_get_attached_dev_id(BlockBackend *blk)
-{
-    DeviceState *dev;
-
-    assert(!blk->legacy_dev);
-    dev = blk->dev;
-
-    if (!dev) {
-        return g_strdup("");
-    } else if (dev->id) {
-        return g_strdup(dev->id);
-    }
-    return object_get_canonical_path(OBJECT(dev));
-}
-
-/*
- * Return the BlockBackend which has the device model @dev attached if it
- * exists, else null.
- *
- * @dev must not be null.
- */
-BlockBackend *blk_by_dev(void *dev)
-{
-    BlockBackend *blk = NULL;
-
-    assert(dev != NULL);
-    while ((blk = blk_all_next(blk)) != NULL) {
-        if (blk->dev == dev) {
-            return blk;
-        }
-    }
-    return NULL;
-}
-
 /*
  * Set @blk's device model callbacks to @ops.
  * @opaque is the opaque argument to pass to the callbacks.
@@ -768,59 +551,35 @@ BlockBackend *blk_by_dev(void *dev)
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
                      void *opaque)
 {
-    /* All drivers that use blk_set_dev_ops() are qdevified and we want to keep
-     * it that way, so we can assume blk->dev, if present, is a DeviceState if
-     * blk->dev_ops is set. Non-device users may use dev_ops without device. */
-    assert(!blk->legacy_dev);
-
     blk->dev_ops = ops;
     blk->dev_opaque = opaque;
-
-    /* Are we currently quiesced? Should we enforce this right now? */
-    if (blk->quiesce_counter && ops->drained_begin) {
-        ops->drained_begin(opaque);
-    }
 }

 /*
  * Notify @blk's attached device model of media change.
- *
- * If @load is true, notify of media load. This action can fail, meaning that
- * the medium cannot be loaded. @errp is set then.
- *
- * If @load is false, notify of media eject. This can never fail.
- *
+ * If @load is true, notify of media load.
+ * Else, notify of media eject.
  * Also send DEVICE_TRAY_MOVED events as appropriate.
  */
-void blk_dev_change_media_cb(BlockBackend *blk, bool load, Error **errp)
+void blk_dev_change_media_cb(BlockBackend *blk, bool load)
 {
     if (blk->dev_ops && blk->dev_ops->change_media_cb) {
         bool tray_was_open, tray_is_open;
-        Error *local_err = NULL;
-
-        assert(!blk->legacy_dev);

         tray_was_open = blk_dev_is_tray_open(blk);
-        blk->dev_ops->change_media_cb(blk->dev_opaque, load, &local_err);
-        if (local_err) {
-            assert(load == true);
-            error_propagate(errp, local_err);
-            return;
-        }
+        blk->dev_ops->change_media_cb(blk->dev_opaque, load);
         tray_is_open = blk_dev_is_tray_open(blk);

         if (tray_was_open != tray_is_open) {
-            char *id = blk_get_attached_dev_id(blk);
-            qapi_event_send_device_tray_moved(blk_name(blk), id, tray_is_open,
+            qapi_event_send_device_tray_moved(blk_name(blk), tray_is_open,
                                               &error_abort);
-            g_free(id);
         }
     }
 }

 static void blk_root_change_media(BdrvChild *child, bool load)
 {
-    blk_dev_change_media_cb(child->opaque, load, NULL);
+    blk_dev_change_media_cb(child->opaque, load);
 }

 /*
@@ -968,30 +727,40 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
     return 0;
 }

+static int blk_check_request(BlockBackend *blk, int64_t sector_num,
+                             int nb_sectors)
+{
+    if (sector_num < 0 || sector_num > INT64_MAX / BDRV_SECTOR_SIZE) {
+        return -EIO;
+    }
+
+    if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+        return -EIO;
+    }
+
+    return blk_check_byte_request(blk, sector_num * BDRV_SECTOR_SIZE,
+                                  nb_sectors * BDRV_SECTOR_SIZE);
+}
+
 int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
                                unsigned int bytes, QEMUIOVector *qiov,
                                BdrvRequestFlags flags)
 {
     int ret;
-    BlockDriverState *bs = blk_bs(blk);

-    trace_blk_co_preadv(blk, bs, offset, bytes, flags);
+    trace_blk_co_preadv(blk, blk_bs(blk), offset, bytes, flags);

     ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
     }

-    bdrv_inc_in_flight(bs);
-
     /* throttling disk I/O */
     if (blk->public.throttle_state) {
         throttle_group_co_io_limits_intercept(blk, bytes, false);
     }

-    ret = bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
-    bdrv_dec_in_flight(bs);
-    return ret;
+    return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
 }

 int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
@@ -999,17 +768,14 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
                                 BdrvRequestFlags flags)
 {
     int ret;
-    BlockDriverState *bs = blk_bs(blk);

-    trace_blk_co_pwritev(blk, bs, offset, bytes, flags);
+    trace_blk_co_pwritev(blk, blk_bs(blk), offset, bytes, flags);

     ret = blk_check_byte_request(blk, offset, bytes);
     if (ret < 0) {
         return ret;
     }

-    bdrv_inc_in_flight(bs);
-
     /* throttling disk I/O */
     if (blk->public.throttle_state) {
         throttle_group_co_io_limits_intercept(blk, bytes, true);
@@ -1019,9 +785,7 @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
         flags |= BDRV_REQ_FUA;
     }

-    ret = bdrv_co_pwritev(blk->root, offset, bytes, qiov, flags);
-    bdrv_dec_in_flight(bs);
-    return ret;
+    return bdrv_co_pwritev(blk->root, offset, bytes, qiov, flags);
 }

 typedef struct BlkRwCo {
@@ -1052,8 +816,10 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
                    int64_t bytes, CoroutineEntry co_entry,
                    BdrvRequestFlags flags)
 {
+    AioContext *aio_context;
     QEMUIOVector qiov;
     struct iovec iov;
+    Coroutine *co;
     BlkRwCo rwco;

     iov = (struct iovec) {
@@ -1070,13 +836,12 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
         .ret = NOT_DONE,
     };

-    if (qemu_in_coroutine()) {
-        /* Fast-path if already in coroutine context */
-        co_entry(&rwco);
-    } else {
-        Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
-        bdrv_coroutine_enter(blk_bs(blk), co);
-        BDRV_POLL_WHILE(blk_bs(blk), rwco.ret == NOT_DONE);
+    co = qemu_coroutine_create(co_entry, &rwco);
+    qemu_coroutine_enter(co);
+
+    aio_context = blk_get_aio_context(blk);
+    while (rwco.ret == NOT_DONE) {
+        aio_poll(aio_context, true);
     }

     return rwco.ret;
@@ -1113,8 +878,7 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
-    bdrv_dec_in_flight(acb->common.bs);
+    qemu_bh_delete(acb->bh);
     acb->common.cb(acb->common.opaque, acb->ret);
     qemu_aio_unref(acb);
 }
@@ -1124,13 +888,16 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
                                   void *opaque, int ret)
 {
     struct BlockBackendAIOCB *acb;
+    QEMUBH *bh;

-    bdrv_inc_in_flight(blk_bs(blk));
     acb = blk_aio_get(&block_backend_aiocb_info, blk, cb, opaque);
     acb->blk = blk;
     acb->ret = ret;

-    aio_bh_schedule_oneshot(blk_get_aio_context(blk), error_callback_bh, acb);
+    bh = aio_bh_new(blk_get_aio_context(blk), error_callback_bh, acb);
+    acb->bh = bh;
+    qemu_bh_schedule(bh);

     return &acb->common;
 }
@@ -1139,6 +906,7 @@ typedef struct BlkAioEmAIOCB {
     BlkRwCo rwco;
     int bytes;
     bool has_returned;
+    QEMUBH* bh;
 } BlkAioEmAIOCB;

 static const AIOCBInfo blk_aio_em_aiocb_info = {
@@ -1147,8 +915,11 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
+    if (acb->bh) {
+        assert(acb->has_returned);
+        qemu_bh_delete(acb->bh);
+    }
     if (acb->has_returned) {
-        bdrv_dec_in_flight(acb->common.bs);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -1156,9 +927,7 @@ static void blk_aio_complete(BlkAioEmAIOCB *acb)
 static void blk_aio_complete_bh(void *opaque)
 {
-    BlkAioEmAIOCB *acb = opaque;
-
-    assert(acb->has_returned);
-    blk_aio_complete(acb);
+    blk_aio_complete(opaque);
 }

 static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
@@ -1169,7 +938,6 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;

-    bdrv_inc_in_flight(blk_bs(blk));
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk = blk,
@@ -1179,15 +947,16 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
         .ret = NOT_DONE,
     };
     acb->bytes = bytes;
+    acb->bh = NULL;
     acb->has_returned = false;

     co = qemu_coroutine_create(co_entry, acb);
-    bdrv_coroutine_enter(blk_bs(blk), co);
+    qemu_coroutine_enter(co);

     acb->has_returned = true;
     if (acb->rwco.ret != NOT_DONE) {
-        aio_bh_schedule_oneshot(blk_get_aio_context(blk),
-                                blk_aio_complete_bh, acb);
+        acb->bh = aio_bh_new(blk_get_aio_context(blk), blk_aio_complete_bh, acb);
+        qemu_bh_schedule(acb->bh);
     }

     return &acb->common;
@@ -1286,36 +1055,26 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
                         blk_aio_write_entry, flags, cb, opaque);
 }

-static void blk_aio_flush_entry(void *opaque)
-{
-    BlkAioEmAIOCB *acb = opaque;
-    BlkRwCo *rwco = &acb->rwco;
-
-    rwco->ret = blk_co_flush(rwco->blk);
-    blk_aio_complete(acb);
-}
-
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
                           BlockCompletionFunc *cb, void *opaque)
 {
-    return blk_aio_prwv(blk, 0, 0, NULL, blk_aio_flush_entry, 0, cb, opaque);
-}
-
-static void blk_aio_pdiscard_entry(void *opaque)
-{
-    BlkAioEmAIOCB *acb = opaque;
-    BlkRwCo *rwco = &acb->rwco;
-
-    rwco->ret = blk_co_pdiscard(rwco->blk, rwco->offset, acb->bytes);
-    blk_aio_complete(acb);
+    if (!blk_is_available(blk)) {
+        return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
+    }
+
+    return bdrv_aio_flush(blk_bs(blk), cb, opaque);
 }

 BlockAIOCB *blk_aio_pdiscard(BlockBackend *blk,
                              int64_t offset, int count,
                              BlockCompletionFunc *cb, void *opaque)
 {
-    return blk_aio_prwv(blk, offset, count, NULL, blk_aio_pdiscard_entry, 0,
-                        cb, opaque);
+    int ret = blk_check_byte_request(blk, offset, count);
+    if (ret < 0) {
+        return blk_abort_aio_request(blk, cb, opaque, ret);
+    }
+
+    return bdrv_aio_pdiscard(blk_bs(blk), offset, count, cb, opaque);
 }

 void blk_aio_cancel(BlockAIOCB *acb)
@@ -1328,50 +1087,23 @@ void blk_aio_cancel_async(BlockAIOCB *acb)
     bdrv_aio_cancel_async(acb);
 }

-int blk_co_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
+int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
 {
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
     }

-    return bdrv_co_ioctl(blk_bs(blk), req, buf);
-}
-
-static void blk_ioctl_entry(void *opaque)
-{
-    BlkRwCo *rwco = opaque;
-    rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
-                             rwco->qiov->iov[0].iov_base);
-}
-
-int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
-{
-    return blk_prw(blk, req, buf, 0, blk_ioctl_entry, 0);
-}
-
-static void blk_aio_ioctl_entry(void *opaque)
-{
-    BlkAioEmAIOCB *acb = opaque;
-    BlkRwCo *rwco = &acb->rwco;
-
-    rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
-                             rwco->qiov->iov[0].iov_base);
-    blk_aio_complete(acb);
+    return bdrv_ioctl(blk_bs(blk), req, buf);
 }

 BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
                           BlockCompletionFunc *cb, void *opaque)
 {
-    QEMUIOVector qiov;
-    struct iovec iov;
-
-    iov = (struct iovec) {
-        .iov_base = buf,
-        .iov_len = 0,
-    };
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
-    return blk_aio_prwv(blk, req, 0, &qiov, blk_aio_ioctl_entry, 0, cb, opaque);
+    if (!blk_is_available(blk)) {
+        return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
+    }
+
+    return bdrv_aio_ioctl(blk_bs(blk), req, buf, cb, opaque);
 }

 int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int count)
@@ -1393,15 +1125,13 @@ int blk_co_flush(BlockBackend *blk)
     return bdrv_co_flush(blk_bs(blk));
 }

-static void blk_flush_entry(void *opaque)
-{
-    BlkRwCo *rwco = opaque;
-    rwco->ret = blk_co_flush(rwco->blk);
-}
-
 int blk_flush(BlockBackend *blk)
 {
-    return blk_prw(blk, 0, NULL, 0, blk_flush_entry, 0);
+    if (!blk_is_available(blk)) {
+        return -ENOMEDIUM;
+    }
+
+    return bdrv_flush(blk_bs(blk));
 }

 void blk_drain(BlockBackend *blk)
@@ -1456,9 +1186,8 @@ static void send_qmp_error_event(BlockBackend *blk,
     IoOperationType optype;

     optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
-    qapi_event_send_block_io_error(blk_name(blk),
-                                   bdrv_get_node_name(blk_bs(blk)), optype,
-                                   action, blk_iostatus_is_enabled(blk),
+    qapi_event_send_block_io_error(blk_name(blk), optype, action,
+                                   blk_iostatus_is_enabled(blk),
                                    error == ENOSPC, strerror(error),
                                    &error_abort);
 }
@@ -1563,21 +1292,10 @@ void blk_lock_medium(BlockBackend *blk, bool locked)
 void blk_eject(BlockBackend *blk, bool eject_flag)
 {
     BlockDriverState *bs = blk_bs(blk);
-    char *id;
-
-    /* blk_eject is only called by qdevified devices */
-    assert(!blk->legacy_dev);

     if (bs) {
         bdrv_eject(bs, eject_flag);
     }
-
-    /* Whether or not we ejected on the backend,
-     * the frontend experienced a tray event. */
-    id = blk_get_attached_dev_id(blk);
-    qapi_event_send_device_tray_moved(blk_name(blk), id,
-                                      eject_flag, &error_abort);
-    g_free(id);
 }

 int blk_get_flags(BlockBackend *blk)
@@ -1766,32 +1484,34 @@ int coroutine_fn blk_co_pwrite_zeroes(BlockBackend *blk, int64_t offset,
                                flags | BDRV_REQ_ZERO_WRITE);
 }

-int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
-                          int count)
+int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
+                         const uint8_t *buf, int nb_sectors)
 {
-    return blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
-                   BDRV_REQ_WRITE_COMPRESSED);
+    int ret = blk_check_request(blk, sector_num, nb_sectors);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return bdrv_write_compressed(blk_bs(blk), sector_num, buf, nb_sectors);
 }

-int blk_truncate(BlockBackend *blk, int64_t offset, Error **errp)
+int blk_truncate(BlockBackend *blk, int64_t offset)
 {
     if (!blk_is_available(blk)) {
-        error_setg(errp, "No medium inserted");
         return -ENOMEDIUM;
     }

-    return bdrv_truncate(blk->root, offset, errp);
-}
-
-static void blk_pdiscard_entry(void *opaque)
-{
-    BlkRwCo *rwco = opaque;
-    rwco->ret = blk_co_pdiscard(rwco->blk, rwco->offset, rwco->qiov->size);
+    return bdrv_truncate(blk_bs(blk), offset);
 }

 int blk_pdiscard(BlockBackend *blk, int64_t offset, int count)
 {
-    return blk_prw(blk, offset, NULL, count, blk_pdiscard_entry, 0);
+    int ret = blk_check_byte_request(blk, offset, count);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return bdrv_pdiscard(blk_bs(blk), offset, count);
 }

 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
@@ -1856,12 +1576,13 @@ void blk_update_root_state(BlockBackend *blk)
 }

 /*
- * Returns the detect-zeroes setting to be used for bdrv_open() of a
- * BlockDriverState which is supposed to inherit the root state.
+ * Applies the information in the root state to the given BlockDriverState. This
+ * does not include the flags which have to be specified for bdrv_open(), use
+ * blk_get_open_flags_from_root_state() to inquire them.
  */
-bool blk_get_detect_zeroes_from_root_state(BlockBackend *blk)
+void blk_apply_root_state(BlockBackend *blk, BlockDriverState *bs)
 {
-    return blk->root_state.detect_zeroes;
+    bs->detect_zeroes = blk->root_state.detect_zeroes;
 }

 /*
@@ -1903,6 +1624,28 @@ int blk_commit_all(void)
     return 0;
 }

+int blk_flush_all(void)
+{
+    BlockBackend *blk = NULL;
+    int result = 0;
+
+    while ((blk = blk_all_next(blk)) != NULL) {
+        AioContext *aio_context = blk_get_aio_context(blk);
+        int ret;
+
+        aio_context_acquire(aio_context);
+        if (blk_is_inserted(blk)) {
+            ret = blk_flush(blk);
+            if (ret < 0 && !result) {
+                result = ret;
+            }
+        }
+        aio_context_release(aio_context);
+    }
+
+    return result;
+}
+
 /* throttling disk I/O limits */
 void blk_set_io_limits(BlockBackend *blk, ThrottleConfig *cfg)
@@ -1946,16 +1689,10 @@ static void blk_root_drained_begin(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;

-    if (++blk->quiesce_counter == 1) {
-        if (blk->dev_ops && blk->dev_ops->drained_begin) {
-            blk->dev_ops->drained_begin(blk->dev_opaque);
-        }
-    }
-
     /* Note that blk->root may not be accessible here yet if we are just
      * attaching to a BlockDriverState that is drained. Use child instead. */
-    if (atomic_fetch_inc(&blk->public.io_limits_disabled) == 0) {
+    if (blk->public.io_limits_disabled++ == 0) {
         throttle_group_restart_blk(blk);
     }
 }
@@ -1963,14 +1700,7 @@ static void blk_root_drained_begin(BdrvChild *child)
 static void blk_root_drained_end(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
-    assert(blk->quiesce_counter);

     assert(blk->public.io_limits_disabled);
-    atomic_dec(&blk->public.io_limits_disabled);
-
-    if (--blk->quiesce_counter == 0) {
-        if (blk->dev_ops && blk->dev_ops->drained_end) {
-            blk->dev_ops->drained_end(blk->dev_opaque);
-        }
-    }
+    --blk->public.io_limits_disabled;
 }


@@ -104,16 +104,7 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
     struct bochs_header bochs;
     int ret;

-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
-
-    ret = bdrv_set_read_only(bs, true, errp); /* no write support yet */
-    if (ret < 0) {
-        return ret;
-    }
+    bs->read_only = true; /* no write support yet */

     ret = bdrv_pread(bs->file, 0, &bochs, sizeof(bochs));
     if (ret < 0) {
@@ -296,7 +287,6 @@ static BlockDriver bdrv_bochs = {
     .instance_size = sizeof(BDRVBochsState),
     .bdrv_probe = bochs_probe,
     .bdrv_open = bochs_open,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_refresh_limits = bochs_refresh_limits,
     .bdrv_co_preadv = bochs_co_preadv,
     .bdrv_close = bochs_close,


@@ -66,16 +66,7 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
     uint32_t offsets_size, max_compressed_block_size = 1, i;
     int ret;

-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
-
-    ret = bdrv_set_read_only(bs, true, errp);
-    if (ret < 0) {
-        return ret;
-    }
+    bs->read_only = true;

     /* read header */
     ret = bdrv_pread(bs->file, 128, &s->block_size, 4);
@@ -293,7 +284,6 @@ static BlockDriver bdrv_cloop = {
     .instance_size = sizeof(BDRVCloopState),
     .bdrv_probe = cloop_probe,
     .bdrv_open = cloop_open,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_refresh_limits = cloop_refresh_limits,
     .bdrv_co_preadv = cloop_co_preadv,
     .bdrv_close = cloop_close,


@@ -13,10 +13,9 @@
  */

 #include "qemu/osdep.h"
-#include "qemu/cutils.h"
 #include "trace.h"
 #include "block/block_int.h"
-#include "block/blockjob_int.h"
+#include "block/blockjob.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
@@ -37,7 +36,6 @@ typedef struct CommitBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockDriverState *active;
-    BlockDriverState *commit_top_bs;
     BlockBackend *top;
     BlockBackend *base;
     BlockdevOnError on_error;
@@ -85,27 +83,12 @@ static void commit_complete(BlockJob *job, void *opaque)
     BlockDriverState *active = s->active;
     BlockDriverState *top = blk_bs(s->top);
     BlockDriverState *base = blk_bs(s->base);
-    BlockDriverState *overlay_bs = bdrv_find_overlay(active, s->commit_top_bs);
+    BlockDriverState *overlay_bs;
     int ret = data->ret;
-    bool remove_commit_top_bs = false;
-
-    /* Make sure overlay_bs and top stay around until bdrv_set_backing_hd() */
-    bdrv_ref(top);
-    bdrv_ref(overlay_bs);
-
-    /* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
-     * the normal backing chain can be restored. */
-    blk_unref(s->base);

     if (!block_job_is_cancelled(&s->common) && ret == 0) {
         /* success */
-        ret = bdrv_drop_intermediate(active, s->commit_top_bs, base,
-                                     s->backing_file_str);
-    } else if (overlay_bs) {
-        /* XXX Can (or should) we somehow keep 'consistent read' blocked even
-         * after the failed/cancelled commit job is gone? If we already wrote
-         * something to base, the intermediate images aren't valid any more. */
-        remove_commit_top_bs = true;
+        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
     }

     /* restore base open flags here if appropriate (e.g., change the base back
@@ -114,23 +97,15 @@ static void commit_complete(BlockJob *job, void *opaque)
     if (s->base_flags != bdrv_get_flags(base)) {
         bdrv_reopen(base, s->base_flags, NULL);
     }
+    overlay_bs = bdrv_find_overlay(active, top);
     if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
         bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
     }
     g_free(s->backing_file_str);
     blk_unref(s->top);
+    blk_unref(s->base);
     block_job_completed(&s->common, ret);
     g_free(data);
-
-    /* If bdrv_drop_intermediate() didn't already do that, remove the commit
-     * filter driver from the backing chain. Do this as the final step so that
-     * the 'consistent read' permission can be granted. */
-    if (remove_commit_top_bs) {
-        bdrv_set_backing_hd(overlay_bs, top, &error_abort);
-    }
-
-    bdrv_unref(overlay_bs);
-    bdrv_unref(top);
 }

 static void coroutine_fn commit_run(void *opaque)
@@ -158,7 +133,7 @@ static void coroutine_fn commit_run(void *opaque)
     }

     if (base_len < s->common.len) {
-        ret = blk_truncate(s->base, s->common.len, NULL);
+        ret = blk_truncate(s->base, s->common.len);
         if (ret) {
             goto out;
         }
@@ -231,70 +206,19 @@ static const BlockJobDriver commit_job_driver = {
     .instance_size = sizeof(CommitBlockJob),
     .job_type = BLOCK_JOB_TYPE_COMMIT,
     .set_speed = commit_set_speed,
-    .start = commit_run,
-};
-
-static int coroutine_fn bdrv_commit_top_preadv(BlockDriverState *bs,
-    uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
-{
-    return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
-}
-
-static int64_t coroutine_fn bdrv_commit_top_get_block_status(
-    BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
-    BlockDriverState **file)
-{
-    *pnum = nb_sectors;
-    *file = bs->backing->bs;
-    return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
-           (sector_num << BDRV_SECTOR_BITS);
-}
-
-static void bdrv_commit_top_refresh_filename(BlockDriverState *bs, QDict *opts)
-{
-    bdrv_refresh_filename(bs->backing->bs);
-    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
-            bs->backing->bs->filename);
-}
-
-static void bdrv_commit_top_close(BlockDriverState *bs)
-{
-}
-
-static void bdrv_commit_top_child_perm(BlockDriverState *bs, BdrvChild *c,
-                                       const BdrvChildRole *role,
-                                       uint64_t perm, uint64_t shared,
-                                       uint64_t *nperm, uint64_t *nshared)
-{
-    *nperm = 0;
-    *nshared = BLK_PERM_ALL;
-}
-
-/* Dummy node that provides consistent read to its users without requiring it
- * from its backing file and that allows writes on the backing file chain. */
-static BlockDriver bdrv_commit_top = {
-    .format_name = "commit_top",
-    .bdrv_co_preadv = bdrv_commit_top_preadv,
-    .bdrv_co_get_block_status = bdrv_commit_top_get_block_status,
-    .bdrv_refresh_filename = bdrv_commit_top_refresh_filename,
-    .bdrv_close = bdrv_commit_top_close,
-    .bdrv_child_perm = bdrv_commit_top_child_perm,
 };

 void commit_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *base, BlockDriverState *top, int64_t speed,
-                  BlockdevOnError on_error, const char *backing_file_str,
-                  const char *filter_node_name, Error **errp)
+                  BlockdevOnError on_error, BlockCompletionFunc *cb,
+                  void *opaque, const char *backing_file_str, Error **errp)
 {
     CommitBlockJob *s;
     BlockReopenQueue *reopen_queue = NULL;
     int orig_overlay_flags;
     int orig_base_flags;
-    BlockDriverState *iter;
     BlockDriverState *overlay_bs;
-    BlockDriverState *commit_top_bs = NULL;
     Error *local_err = NULL;
-    int ret;

     assert(top != bs);
     if (top == base) {
@@ -309,8 +233,8 @@ void commit_start(const char *job_id, BlockDriverState *bs,
         return;
     }
-    s = block_job_create(job_id, &commit_job_driver, bs, 0, BLK_PERM_ALL,
-                         speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
+    s = block_job_create(job_id, &commit_job_driver, bs, speed,
+                         cb, opaque, errp);
     if (!s) {
         return;
     }
@@ -319,96 +243,29 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     orig_overlay_flags = bdrv_get_flags(overlay_bs);
     /* convert base & overlay_bs to r/w, if necessary */
-    if (!(orig_base_flags & BDRV_O_RDWR)) {
-        reopen_queue = bdrv_reopen_queue(reopen_queue, base, NULL,
-                                         orig_base_flags | BDRV_O_RDWR);
-    }
     if (!(orig_overlay_flags & BDRV_O_RDWR)) {
         reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs, NULL,
                                          orig_overlay_flags | BDRV_O_RDWR);
     }
+    if (!(orig_base_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, base, NULL,
+                                         orig_base_flags | BDRV_O_RDWR);
+    }
     if (reopen_queue) {
-        bdrv_reopen_multiple(bdrv_get_aio_context(bs), reopen_queue, &local_err);
+        bdrv_reopen_multiple(reopen_queue, &local_err);
         if (local_err != NULL) {
             error_propagate(errp, local_err);
-            goto fail;
+            block_job_unref(&s->common);
+            return;
         }
     }
-    /* Insert commit_top block node above top, so we can block consistent read
-     * on the backing chain below it */
-    commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, filter_node_name, 0,
-                                         errp);
-    if (commit_top_bs == NULL) {
-        goto fail;
-    }
-    commit_top_bs->total_sectors = top->total_sectors;
-    bdrv_set_aio_context(commit_top_bs, bdrv_get_aio_context(top));
-    bdrv_set_backing_hd(commit_top_bs, top, &local_err);
-    if (local_err) {
-        bdrv_unref(commit_top_bs);
-        commit_top_bs = NULL;
-        error_propagate(errp, local_err);
-        goto fail;
-    }
-    bdrv_set_backing_hd(overlay_bs, commit_top_bs, &local_err);
-    if (local_err) {
-        bdrv_unref(commit_top_bs);
-        commit_top_bs = NULL;
-        error_propagate(errp, local_err);
-        goto fail;
-    }
-    s->commit_top_bs = commit_top_bs;
-    bdrv_unref(commit_top_bs);
-    /* Block all nodes between top and base, because they will
-     * disappear from the chain after this operation. */
-    assert(bdrv_chain_contains(top, base));
-    for (iter = top; iter != base; iter = backing_bs(iter)) {
-        /* XXX BLK_PERM_WRITE needs to be allowed so we don't block ourselves
-         * at s->base (if writes are blocked for a node, they are also blocked
-         * for its backing file). The other options would be a second filter
-         * driver above s->base. */
-        ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
-                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
-                                 errp);
-        if (ret < 0) {
-            goto fail;
-        }
-    }
-    ret = block_job_add_bdrv(&s->common, "base", base, 0, BLK_PERM_ALL, errp);
-    if (ret < 0) {
-        goto fail;
-    }
-    /* overlay_bs must be blocked because it needs to be modified to
-     * update the backing image string. */
-    ret = block_job_add_bdrv(&s->common, "overlay of top", overlay_bs,
-                             BLK_PERM_GRAPH_MOD, BLK_PERM_ALL, errp);
-    if (ret < 0) {
-        goto fail;
-    }
-    s->base = blk_new(BLK_PERM_CONSISTENT_READ
-                      | BLK_PERM_WRITE
-                      | BLK_PERM_RESIZE,
-                      BLK_PERM_CONSISTENT_READ
-                      | BLK_PERM_GRAPH_MOD
-                      | BLK_PERM_WRITE_UNCHANGED);
-    ret = blk_insert_bs(s->base, base, errp);
-    if (ret < 0) {
-        goto fail;
-    }
-    /* Required permissions are already taken with block_job_add_bdrv() */
-    s->top = blk_new(0, BLK_PERM_ALL);
-    ret = blk_insert_bs(s->top, top, errp);
-    if (ret < 0) {
-        goto fail;
-    }
+    s->base = blk_new();
+    blk_insert_bs(s->base, base);
+    s->top = blk_new();
+    blk_insert_bs(s->top, top);
     s->active = bs;
@@ -418,22 +275,10 @@ void commit_start(const char *job_id, BlockDriverState *bs,
     s->backing_file_str = g_strdup(backing_file_str);
     s->on_error = on_error;
-    trace_commit_start(bs, base, top, s);
-    block_job_start(&s->common);
-    return;
-fail:
-    if (s->base) {
-        blk_unref(s->base);
-    }
-    if (s->top) {
-        blk_unref(s->top);
-    }
-    if (commit_top_bs) {
-        bdrv_set_backing_hd(overlay_bs, top, &error_abort);
-    }
-    block_job_early_fail(&s->common);
+    s->common.co = qemu_coroutine_create(commit_run, s);
+    trace_commit_start(bs, base, top, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co);
 }
@@ -443,14 +288,11 @@ fail:
 int bdrv_commit(BlockDriverState *bs)
 {
     BlockBackend *src, *backing;
-    BlockDriverState *backing_file_bs = NULL;
-    BlockDriverState *commit_top_bs = NULL;
     BlockDriver *drv = bs->drv;
     int64_t sector, total_sectors, length, backing_length;
     int n, ro, open_flags;
     int ret = 0;
     uint8_t *buf = NULL;
-    Error *local_err = NULL;
     if (!drv)
         return -ENOMEDIUM;
@@ -473,34 +315,11 @@ int bdrv_commit(BlockDriverState *bs)
         }
     }
-    src = blk_new(BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL);
-    backing = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_ALL);
-    ret = blk_insert_bs(src, bs, &local_err);
-    if (ret < 0) {
-        error_report_err(local_err);
-        goto ro_cleanup;
-    }
-    /* Insert commit_top block node above backing, so we can write to it */
-    backing_file_bs = backing_bs(bs);
-    commit_top_bs = bdrv_new_open_driver(&bdrv_commit_top, NULL, BDRV_O_RDWR,
-                                         &local_err);
-    if (commit_top_bs == NULL) {
-        error_report_err(local_err);
-        goto ro_cleanup;
-    }
-    bdrv_set_aio_context(commit_top_bs, bdrv_get_aio_context(backing_file_bs));
-    bdrv_set_backing_hd(commit_top_bs, backing_file_bs, &error_abort);
-    bdrv_set_backing_hd(bs, commit_top_bs, &error_abort);
-    ret = blk_insert_bs(backing, backing_file_bs, &local_err);
-    if (ret < 0) {
-        error_report_err(local_err);
-        goto ro_cleanup;
-    }
+    src = blk_new();
+    blk_insert_bs(src, bs);
+    backing = blk_new();
+    blk_insert_bs(backing, bs->backing->bs);
     length = blk_getlength(src);
     if (length < 0) {
@@ -518,9 +337,8 @@ int bdrv_commit(BlockDriverState *bs)
      * grow the backing file image if possible. If not possible,
      * we must return an error */
     if (length > backing_length) {
-        ret = blk_truncate(backing, length, &local_err);
+        ret = blk_truncate(backing, length);
         if (ret < 0) {
-            error_report_err(local_err);
             goto ro_cleanup;
         }
     }
@@ -573,12 +391,8 @@ int bdrv_commit(BlockDriverState *bs)
 ro_cleanup:
     qemu_vfree(buf);
-    blk_unref(backing);
-    if (backing_file_bs) {
-        bdrv_set_backing_hd(bs, backing_file_bs, &error_abort);
-    }
-    bdrv_unref(commit_top_bs);
     blk_unref(src);
+    blk_unref(backing);
     if (ro) {
         /* ignoring error return here */


@@ -33,7 +33,6 @@
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_ALG "ivgen-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_IVGEN_HASH_ALG "ivgen-hash-alg"
 #define BLOCK_CRYPTO_OPT_LUKS_HASH_ALG "hash-alg"
-#define BLOCK_CRYPTO_OPT_LUKS_ITER_TIME "iter-time"
 typedef struct BlockCrypto BlockCrypto;
@@ -59,8 +58,8 @@ static ssize_t block_crypto_read_func(QCryptoBlock *block,
                                      size_t offset,
                                      uint8_t *buf,
                                      size_t buflen,
-                                     void *opaque,
-                                     Error **errp)
+                                     Error **errp,
+                                     void *opaque)
 {
     BlockDriverState *bs = opaque;
     ssize_t ret;
@@ -86,8 +85,8 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
                                       size_t offset,
                                       const uint8_t *buf,
                                       size_t buflen,
-                                      void *opaque,
-                                      Error **errp)
+                                      Error **errp,
+                                      void *opaque)
 {
     struct BlockCryptoCreateData *data = opaque;
     ssize_t ret;
@@ -103,8 +102,8 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
 static ssize_t block_crypto_init_func(QCryptoBlock *block,
                                       size_t headerlen,
-                                      void *opaque,
-                                      Error **errp)
+                                      Error **errp,
+                                      void *opaque)
 {
     struct BlockCryptoCreateData *data = opaque;
     int ret;
@@ -184,11 +183,6 @@ static QemuOptsList block_crypto_create_opts_luks = {
             .type = QEMU_OPT_STRING,
             .help = "Name of encryption hash algorithm",
         },
-        {
-            .name = BLOCK_CRYPTO_OPT_LUKS_ITER_TIME,
-            .type = QEMU_OPT_NUMBER,
-            .help = "Time to spend in PBKDF in milliseconds",
-        },
         { /* end of list */ }
     },
 };
@@ -300,12 +294,6 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
     QCryptoBlockOpenOptions *open_opts = NULL;
     unsigned int cflags = 0;
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
     opts = qemu_opts_create(opts_spec, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
     if (local_err) {
@@ -381,8 +369,7 @@ static int block_crypto_create_generic(QCryptoBlockFormat format,
         return ret;
     }
-static int block_crypto_truncate(BlockDriverState *bs, int64_t offset,
-                                 Error **errp)
+static int block_crypto_truncate(BlockDriverState *bs, int64_t offset)
 {
     BlockCrypto *crypto = bs->opaque;
     size_t payload_offset =
@@ -390,7 +377,7 @@ static int block_crypto_truncate(BlockDriverState *bs, int64_t offset,
     offset += payload_offset;
-    return bdrv_truncate(bs->file, offset, errp);
+    return bdrv_truncate(bs->file->bs, offset);
 }
 static void block_crypto_close(BlockDriverState *bs)
@@ -629,7 +616,6 @@ BlockDriver bdrv_crypto_luks = {
     .bdrv_probe = block_crypto_probe_luks,
     .bdrv_open = block_crypto_open_luks,
     .bdrv_close = block_crypto_close,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_create = block_crypto_create_luks,
     .bdrv_truncate = block_crypto_truncate,
     .create_opts = &block_crypto_create_opts_luks,


@@ -68,20 +68,25 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
 #endif
 #define PROTOCOLS (CURLPROTO_HTTP | CURLPROTO_HTTPS | \
-                   CURLPROTO_FTP | CURLPROTO_FTPS)
+                   CURLPROTO_FTP | CURLPROTO_FTPS | \
+                   CURLPROTO_TFTP)
 #define CURL_NUM_STATES 8
 #define CURL_NUM_ACB 8
+#define SECTOR_SIZE 512
 #define READ_AHEAD_DEFAULT (256 * 1024)
 #define CURL_TIMEOUT_DEFAULT 5
 #define CURL_TIMEOUT_MAX 10000
+#define FIND_RET_NONE 0
+#define FIND_RET_OK 1
+#define FIND_RET_WAIT 2
 #define CURL_BLOCK_OPT_URL "url"
 #define CURL_BLOCK_OPT_READAHEAD "readahead"
 #define CURL_BLOCK_OPT_SSLVERIFY "sslverify"
 #define CURL_BLOCK_OPT_TIMEOUT "timeout"
 #define CURL_BLOCK_OPT_COOKIE "cookie"
-#define CURL_BLOCK_OPT_COOKIE_SECRET "cookie-secret"
 #define CURL_BLOCK_OPT_USERNAME "username"
 #define CURL_BLOCK_OPT_PASSWORD_SECRET "password-secret"
 #define CURL_BLOCK_OPT_PROXY_USERNAME "proxy-username"
@@ -90,32 +95,25 @@ static CURLMcode __curl_multi_socket_action(CURLM *multi_handle,
 struct BDRVCURLState;
 typedef struct CURLAIOCB {
-    Coroutine *co;
+    BlockAIOCB common;
+    QEMUBH *bh;
     QEMUIOVector *qiov;
-    uint64_t offset;
-    uint64_t bytes;
-    int ret;
+    int64_t sector_num;
+    int nb_sectors;
     size_t start;
     size_t end;
-    QSIMPLEQ_ENTRY(CURLAIOCB) next;
 } CURLAIOCB;
-typedef struct CURLSocket {
-    int fd;
-    QLIST_ENTRY(CURLSocket) next;
-} CURLSocket;
 typedef struct CURLState
 {
     struct BDRVCURLState *s;
     CURLAIOCB *acb[CURL_NUM_ACB];
     CURL *curl;
-    QLIST_HEAD(, CURLSocket) sockets;
+    curl_socket_t sock_fd;
     char *orig_buf;
-    uint64_t buf_start;
+    size_t buf_start;
     size_t buf_off;
     size_t buf_len;
     char range[128];
@@ -126,7 +124,7 @@ typedef struct CURLState
 typedef struct BDRVCURLState {
     CURLM *multi;
     QEMUTimer timer;
-    uint64_t len;
+    size_t len;
     CURLState states[CURL_NUM_STATES];
     char *url;
     size_t readahead_size;
@@ -135,8 +133,6 @@ typedef struct BDRVCURLState {
     char *cookie;
     bool accept_range;
     AioContext *aio_context;
-    QemuMutex mutex;
-    QSIMPLEQ_HEAD(, CURLAIOCB) free_state_waitq;
     char *username;
     char *password;
     char *proxyusername;
@@ -148,7 +144,6 @@ static void curl_multi_do(void *arg);
 static void curl_multi_read(void *arg);
 #ifdef NEED_CURL_TIMER_CALLBACK
-/* Called from curl_multi_do_locked, with s->mutex held. */
 static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
 {
     BDRVCURLState *s = opaque;
@@ -165,57 +160,38 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
 }
 #endif
-/* Called from curl_multi_do_locked, with s->mutex held. */
 static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
                         void *userp, void *sp)
 {
     BDRVCURLState *s;
     CURLState *state = NULL;
-    CURLSocket *socket;
     curl_easy_getinfo(curl, CURLINFO_PRIVATE, (char **)&state);
+    state->sock_fd = fd;
     s = state->s;
-    QLIST_FOREACH(socket, &state->sockets, next) {
-        if (socket->fd == fd) {
-            if (action == CURL_POLL_REMOVE) {
-                QLIST_REMOVE(socket, next);
-                g_free(socket);
-            }
-            break;
-        }
-    }
-    if (!socket) {
-        socket = g_new0(CURLSocket, 1);
-        socket->fd = fd;
-        QLIST_INSERT_HEAD(&state->sockets, socket, next);
-    }
-    socket = NULL;
     DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, (int)fd);
     switch (action) {
         case CURL_POLL_IN:
             aio_set_fd_handler(s->aio_context, fd, false,
-                               curl_multi_read, NULL, NULL, state);
+                               curl_multi_read, NULL, state);
             break;
         case CURL_POLL_OUT:
             aio_set_fd_handler(s->aio_context, fd, false,
-                               NULL, curl_multi_do, NULL, state);
+                               NULL, curl_multi_do, state);
             break;
         case CURL_POLL_INOUT:
             aio_set_fd_handler(s->aio_context, fd, false,
-                               curl_multi_read, curl_multi_do, NULL, state);
+                               curl_multi_read, curl_multi_do, state);
             break;
         case CURL_POLL_REMOVE:
             aio_set_fd_handler(s->aio_context, fd, false,
-                               NULL, NULL, NULL, NULL);
+                               NULL, NULL, NULL);
             break;
     }
     return 0;
 }
-/* Called from curl_multi_do_locked, with s->mutex held. */
 static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
     BDRVCURLState *s = opaque;
@@ -230,7 +206,6 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
     return realsize;
 }
-/* Called from curl_multi_do_locked, with s->mutex held. */
 static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
     CURLState *s = ((CURLState*)opaque);
@@ -239,13 +214,12 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
     DPRINTF("CURL: Just reading %zd bytes\n", realsize);
-    if (!s || !s->orig_buf) {
-        goto read_end;
-    }
+    if (!s || !s->orig_buf)
+        return 0;
     if (s->buf_off >= s->buf_len) {
         /* buffer full, read nothing */
-        goto read_end;
+        return 0;
     }
     realsize = MIN(realsize, s->buf_len - s->buf_off);
     memcpy(s->orig_buf + s->buf_off, ptr, realsize);
@@ -258,43 +232,27 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
             continue;
         if ((s->buf_off >= acb->end)) {
-            size_t request_length = acb->bytes;
             qemu_iovec_from_buf(acb->qiov, 0, s->orig_buf + acb->start,
                                 acb->end - acb->start);
-            if (acb->end - acb->start < request_length) {
-                size_t offset = acb->end - acb->start;
-                qemu_iovec_memset(acb->qiov, offset, 0,
-                                  request_length - offset);
-            }
-            acb->ret = 0;
+            acb->common.cb(acb->common.opaque, 0);
+            qemu_aio_unref(acb);
             s->acb[i] = NULL;
-            qemu_mutex_unlock(&s->s->mutex);
-            aio_co_wake(acb->co);
-            qemu_mutex_lock(&s->s->mutex);
         }
     }
-read_end:
-    /* curl will error out if we do not return this value */
-    return size * nmemb;
+    return realsize;
 }
-/* Called with s->mutex held. */
-static bool curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len,
+static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
                          CURLAIOCB *acb)
 {
     int i;
-    uint64_t end = start + len;
-    uint64_t clamped_end = MIN(end, s->len);
-    uint64_t clamped_len = clamped_end - start;
+    size_t end = start + len;
     for (i=0; i<CURL_NUM_STATES; i++) {
         CURLState *state = &s->states[i];
-        uint64_t buf_end = (state->buf_start + state->buf_off);
-        uint64_t buf_fend = (state->buf_start + state->buf_len);
+        size_t buf_end = (state->buf_start + state->buf_off);
+        size_t buf_fend = (state->buf_start + state->buf_len);
         if (!state->orig_buf)
             continue;
@@ -304,44 +262,41 @@ static bool curl_find_buf(BDRVCURLState *s, uint64_t start, uint64_t len,
         // Does the existing buffer cover our section?
         if ((start >= state->buf_start) &&
             (start <= buf_end) &&
-            (clamped_end >= state->buf_start) &&
-            (clamped_end <= buf_end))
+            (end >= state->buf_start) &&
+            (end <= buf_end))
         {
             char *buf = state->orig_buf + (start - state->buf_start);
-            qemu_iovec_from_buf(acb->qiov, 0, buf, clamped_len);
-            if (clamped_len < len) {
-                qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len);
-            }
-            acb->ret = 0;
-            return true;
+            qemu_iovec_from_buf(acb->qiov, 0, buf, len);
+            acb->common.cb(acb->common.opaque, 0);
+            return FIND_RET_OK;
         }
         // Wait for unfinished chunks
         if (state->in_use &&
             (start >= state->buf_start) &&
             (start <= buf_fend) &&
-            (clamped_end >= state->buf_start) &&
-            (clamped_end <= buf_fend))
+            (end >= state->buf_start) &&
+            (end <= buf_fend))
         {
             int j;
             acb->start = start - state->buf_start;
-            acb->end = acb->start + clamped_len;
+            acb->end = acb->start + len;
             for (j=0; j<CURL_NUM_ACB; j++) {
                 if (!state->acb[j]) {
                     state->acb[j] = acb;
-                    return true;
+                    return FIND_RET_WAIT;
                 }
             }
         }
     }
-    return false;
+    return FIND_RET_NONE;
 }
-/* Called with s->mutex held. */
 static void curl_multi_check_completion(BDRVCURLState *s)
 {
     int msgs_in_queue;
@@ -383,11 +338,9 @@ static void curl_multi_check_completion(BDRVCURLState *s)
                 continue;
             }
-            acb->ret = -EIO;
+            acb->common.cb(acb->common.opaque, -EPROTO);
+            qemu_aio_unref(acb);
             state->acb[i] = NULL;
-            qemu_mutex_unlock(&s->mutex);
-            aio_co_wake(acb->co);
-            qemu_mutex_lock(&s->mutex);
         }
     }
@@ -397,10 +350,9 @@ static void curl_multi_check_completion(BDRVCURLState *s)
     }
 }
-/* Called with s->mutex held. */
-static void curl_multi_do_locked(CURLState *s)
+static void curl_multi_do(void *arg)
 {
-    CURLSocket *socket, *next_socket;
+    CURLState *s = (CURLState *)arg;
     int running;
     int r;
@@ -408,32 +360,18 @@ static void curl_multi_do_locked(CURLState *s)
         return;
     }
-    /* Need to use _SAFE because curl_multi_socket_action() may trigger
-     * curl_sock_cb() which might modify this list */
-    QLIST_FOREACH_SAFE(socket, &s->sockets, next, next_socket) {
     do {
-        r = curl_multi_socket_action(s->s->multi, socket->fd, 0, &running);
+        r = curl_multi_socket_action(s->s->multi, s->sock_fd, 0, &running);
     } while(r == CURLM_CALL_MULTI_PERFORM);
-    }
-}
-static void curl_multi_do(void *arg)
-{
-    CURLState *s = (CURLState *)arg;
-    qemu_mutex_lock(&s->s->mutex);
-    curl_multi_do_locked(s);
-    qemu_mutex_unlock(&s->s->mutex);
 }
 static void curl_multi_read(void *arg)
 {
     CURLState *s = (CURLState *)arg;
-    qemu_mutex_lock(&s->s->mutex);
-    curl_multi_do_locked(s);
+    curl_multi_do(arg);
     curl_multi_check_completion(s->s);
-    qemu_mutex_unlock(&s->s->mutex);
 }
 static void curl_multi_timeout_do(void *arg)
@@ -446,38 +384,40 @@ static void curl_multi_timeout_do(void *arg)
         return;
     }
-    qemu_mutex_lock(&s->mutex);
     curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
     curl_multi_check_completion(s);
-    qemu_mutex_unlock(&s->mutex);
 #else
     abort();
 #endif
 }
-/* Called with s->mutex held. */
-static CURLState *curl_find_state(BDRVCURLState *s)
+static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
 {
     CURLState *state = NULL;
-    int i;
+    int i, j;
+    do {
         for (i=0; i<CURL_NUM_STATES; i++) {
-            if (!s->states[i].in_use) {
+            for (j=0; j<CURL_NUM_ACB; j++)
+                if (s->states[i].acb[j])
+                    continue;
+            if (s->states[i].in_use)
+                continue;
             state = &s->states[i];
             state->in_use = 1;
             break;
         }
+        if (!state) {
+            aio_poll(bdrv_get_aio_context(bs), true);
         }
-    return state;
-}
-static int curl_init_state(BDRVCURLState *s, CURLState *state)
-{
+    } while(!state);
     if (!state->curl) {
         state->curl = curl_easy_init();
         if (!state->curl) {
-            return -EIO;
+            return NULL;
         }
         curl_easy_setopt(state->curl, CURLOPT_URL, s->url);
         curl_easy_setopt(state->curl, CURLOPT_SSL_VERIFYPEER,
@@ -527,46 +467,22 @@ static int curl_init_state(BDRVCURLState *s, CURLState *state)
 #endif
     }
-    QLIST_INIT(&state->sockets);
     state->s = s;
-    return 0;
+    return state;
 }
-/* Called with s->mutex held. */
 static void curl_clean_state(CURLState *s)
 {
-    CURLAIOCB *next;
-    int j;
-    for (j = 0; j < CURL_NUM_ACB; j++) {
-        assert(!s->acb[j]);
-    }
     if (s->s->multi)
         curl_multi_remove_handle(s->s->multi, s->curl);
-    while (!QLIST_EMPTY(&s->sockets)) {
-        CURLSocket *socket = QLIST_FIRST(&s->sockets);
-        QLIST_REMOVE(socket, next);
-        g_free(socket);
-    }
     s->in_use = 0;
-    next = QSIMPLEQ_FIRST(&s->s->free_state_waitq);
-    if (next) {
-        QSIMPLEQ_REMOVE_HEAD(&s->s->free_state_waitq, next);
-        qemu_mutex_unlock(&s->s->mutex);
-        aio_co_wake(next->co);
-        qemu_mutex_lock(&s->s->mutex);
-    }
 }
 static void curl_parse_filename(const char *filename, QDict *options,
                                 Error **errp)
 {
-    qdict_put_str(options, CURL_BLOCK_OPT_URL, filename);
+    qdict_put(options, CURL_BLOCK_OPT_URL, qstring_from_str(filename));
 }
 static void curl_detach_aio_context(BlockDriverState *bs)
@@ -574,7 +490,6 @@ static void curl_detach_aio_context(BlockDriverState *bs)
     BDRVCURLState *s = bs->opaque;
     int i;
-    qemu_mutex_lock(&s->mutex);
     for (i = 0; i < CURL_NUM_STATES; i++) {
         if (s->states[i].in_use) {
             curl_clean_state(&s->states[i]);
@@ -590,7 +505,6 @@ static void curl_detach_aio_context(BlockDriverState *bs)
         curl_multi_cleanup(s->multi);
         s->multi = NULL;
     }
-    qemu_mutex_unlock(&s->mutex);
     timer_del(&s->timer);
 }
@@ -643,11 +557,6 @@ static QemuOptsList runtime_opts = {
         .type = QEMU_OPT_STRING,
         .help = "Pass the cookie or list of cookies with each request"
     },
-    {
-        .name = CURL_BLOCK_OPT_COOKIE_SECRET,
-        .type = QEMU_OPT_STRING,
-        .help = "ID of secret used as cookie passed with each request"
-    },
     {
         .name = CURL_BLOCK_OPT_USERNAME,
         .type = QEMU_OPT_STRING,
@@ -682,10 +591,8 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
     Error *local_err = NULL;
     const char *file;
     const char *cookie;
-    const char *cookie_secret;
     double d;
     const char *secretid;
-    const char *protocol_delimiter;
     static int inited = 0;
@@ -694,7 +601,6 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
         return -EROFS;
     }
-    qemu_mutex_init(&s->mutex);
     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
     if (local_err) {
@@ -720,22 +626,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
     s->sslverify = qemu_opt_get_bool(opts, CURL_BLOCK_OPT_SSLVERIFY, true);
     cookie = qemu_opt_get(opts, CURL_BLOCK_OPT_COOKIE);
-    cookie_secret = qemu_opt_get(opts, CURL_BLOCK_OPT_COOKIE_SECRET);
-    if (cookie && cookie_secret) {
-        error_setg(errp,
-                   "curl driver cannot handle both cookie and cookie secret");
-        goto out_noclean;
-    }
-    if (cookie_secret) {
-        s->cookie = qcrypto_secret_lookup_as_utf8(cookie_secret, errp);
-        if (!s->cookie) {
-            goto out_noclean;
-        }
-    } else {
-        s->cookie = g_strdup(cookie);
-    }
+    s->cookie = g_strdup(cookie);
     file = qemu_opt_get(opts, CURL_BLOCK_OPT_URL);
     if (file == NULL) {
@@ -743,15 +634,6 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
         goto out_noclean;
     }
-    if (!strstart(file, bs->drv->protocol_name, &protocol_delimiter) ||
-        !strstart(protocol_delimiter, "://", NULL))
-    {
-        error_setg(errp, "%s curl driver cannot handle the URL '%s' (does not "
-                   "start with '%s://')", bs->drv->protocol_name, file,
-                   bs->drv->protocol_name);
-        goto out_noclean;
-    }
     s->username = g_strdup(qemu_opt_get(opts, CURL_BLOCK_OPT_USERNAME));
     secretid = qemu_opt_get(opts, CURL_BLOCK_OPT_PASSWORD_SECRET);
@@ -778,22 +660,14 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
     }
     DPRINTF("CURL: Opening %s\n", file);
-    QSIMPLEQ_INIT(&s->free_state_waitq);
     s->aio_context = bdrv_get_aio_context(bs);
     s->url = g_strdup(file);
-    qemu_mutex_lock(&s->mutex);
-    state = curl_find_state(s);
-    qemu_mutex_unlock(&s->mutex);
-    if (!state) {
+    state = curl_init_state(bs, s);
+    if (!state)
         goto out_noclean;
-    }
     // Get file size
-    if (curl_init_state(s, state) < 0) {
-        goto out;
-    }
     s->accept_range = false;
     curl_easy_setopt(state->curl, CURLOPT_NOBODY, 1);
     curl_easy_setopt(state->curl, CURLOPT_HEADERFUNCTION,
@@ -801,28 +675,11 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
     curl_easy_setopt(state->curl, CURLOPT_HEADERDATA, s);
     if (curl_easy_perform(state->curl))
         goto out;
-    if (curl_easy_getinfo(state->curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &d)) {
+    curl_easy_getinfo(state->curl, CURLINFO_CONTENT_LENGTH_DOWNLOAD, &d);
+    if (d)
+        s->len = (size_t)d;
+    else if(!s->len)
         goto out;
-    }
-    /* Prior CURL 7.19.4 return value of 0 could mean that the file size is not
-     * know or the size is zero. From 7.19.4 CURL returns -1 if size is not
-     * known and zero if it is realy zero-length file. */
-#if LIBCURL_VERSION_NUM >= 0x071304
-    if (d < 0) {
-        pstrcpy(state->errmsg, CURL_ERROR_SIZE,
-                "Server didn't report file size.");
-        goto out;
-    }
-#else
-    if (d <= 0) {
-        pstrcpy(state->errmsg, CURL_ERROR_SIZE,
-                "Unknown file size or zero-length file.");
-        goto out;
-    }
-#endif
-    s->len = d;
     if ((!strncasecmp(s->url, "http://", strlen("http://"))
          || !strncasecmp(s->url, "https://", strlen("https://")))
         && !s->accept_range) {
@@ -830,11 +687,9 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
"Server does not support 'range' (byte ranges)."); "Server does not support 'range' (byte ranges).");
goto out; goto out;
} }
DPRINTF("CURL: Size = %" PRIu64 "\n", s->len); DPRINTF("CURL: Size = %zd\n", s->len);
qemu_mutex_lock(&s->mutex);
curl_clean_state(state); curl_clean_state(state);
qemu_mutex_unlock(&s->mutex);
curl_easy_cleanup(state->curl); curl_easy_cleanup(state->curl);
state->curl = NULL; state->curl = NULL;
@@ -848,95 +703,94 @@ out:
curl_easy_cleanup(state->curl); curl_easy_cleanup(state->curl);
state->curl = NULL; state->curl = NULL;
out_noclean: out_noclean:
qemu_mutex_destroy(&s->mutex);
g_free(s->cookie); g_free(s->cookie);
g_free(s->url); g_free(s->url);
qemu_opts_del(opts); qemu_opts_del(opts);
return -EINVAL; return -EINVAL;
} }
-static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
+static const AIOCBInfo curl_aiocb_info = {
+    .aiocb_size = sizeof(CURLAIOCB),
+};
+
+static void curl_readv_bh_cb(void *p)
 {
     CURLState *state;
     int running;
-    BDRVCURLState *s = bs->opaque;
-    uint64_t start = acb->offset;
-    uint64_t end;
-    qemu_mutex_lock(&s->mutex);
+    CURLAIOCB *acb = p;
+    BDRVCURLState *s = acb->common.bs->opaque;
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+    size_t start = acb->sector_num * SECTOR_SIZE;
+    size_t end;
     // In case we have the requested data already (e.g. read-ahead),
     // we can just call the callback and be done.
-    if (curl_find_buf(s, start, acb->bytes, acb)) {
-        goto out;
+    switch (curl_find_buf(s, start, acb->nb_sectors * SECTOR_SIZE, acb)) {
+    case FIND_RET_OK:
+        qemu_aio_unref(acb);
+        // fall through
+    case FIND_RET_WAIT:
+        return;
+    default:
+        break;
     }
     // No cache found, so let's start a new request
-    for (;;) {
-        state = curl_find_state(s);
-        if (state) {
-            break;
-        }
-        QSIMPLEQ_INSERT_TAIL(&s->free_state_waitq, acb, next);
-        qemu_mutex_unlock(&s->mutex);
-        qemu_coroutine_yield();
-        qemu_mutex_lock(&s->mutex);
-    }
-    if (curl_init_state(s, state) < 0) {
-        curl_clean_state(state);
-        acb->ret = -EIO;
-        goto out;
+    state = curl_init_state(acb->common.bs, s);
+    if (!state) {
+        acb->common.cb(acb->common.opaque, -EIO);
+        qemu_aio_unref(acb);
+        return;
     }
     acb->start = 0;
-    acb->end = MIN(acb->bytes, s->len - start);
+    acb->end = (acb->nb_sectors * SECTOR_SIZE);
     state->buf_off = 0;
     g_free(state->orig_buf);
     state->buf_start = start;
-    state->buf_len = MIN(acb->end + s->readahead_size, s->len - start);
-    end = start + state->buf_len - 1;
+    state->buf_len = acb->end + s->readahead_size;
+    end = MIN(start + state->buf_len, s->len) - 1;
     state->orig_buf = g_try_malloc(state->buf_len);
     if (state->buf_len && state->orig_buf == NULL) {
         curl_clean_state(state);
-        acb->ret = -ENOMEM;
-        goto out;
+        acb->common.cb(acb->common.opaque, -ENOMEM);
+        qemu_aio_unref(acb);
+        return;
     }
     state->acb[0] = acb;
-    snprintf(state->range, 127, "%" PRIu64 "-%" PRIu64, start, end);
-    DPRINTF("CURL (AIO): Reading %" PRIu64 " at %" PRIu64 " (%s)\n",
-            acb->bytes, start, state->range);
+    snprintf(state->range, 127, "%zd-%zd", start, end);
+    DPRINTF("CURL (AIO): Reading %d at %zd (%s)\n",
+            (acb->nb_sectors * SECTOR_SIZE), start, state->range);
     curl_easy_setopt(state->curl, CURLOPT_RANGE, state->range);
     curl_multi_add_handle(s->multi, state->curl);
     /* Tell curl it needs to kick things off */
     curl_multi_socket_action(s->multi, CURL_SOCKET_TIMEOUT, 0, &running);
-out:
-    qemu_mutex_unlock(&s->mutex);
 }
-static int coroutine_fn curl_co_preadv(BlockDriverState *bs,
-        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+static BlockAIOCB *curl_aio_readv(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockCompletionFunc *cb, void *opaque)
 {
-    CURLAIOCB acb = {
-        .co = qemu_coroutine_self(),
-        .ret = -EINPROGRESS,
-        .qiov = qiov,
-        .offset = offset,
-        .bytes = bytes
-    };
+    CURLAIOCB *acb;
-    curl_setup_preadv(bs, &acb);
-    while (acb.ret == -EINPROGRESS) {
-        qemu_coroutine_yield();
-    }
-    return acb.ret;
+    acb = qemu_aio_get(&curl_aiocb_info, bs, cb, opaque);
+    acb->qiov = qiov;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->bh = aio_bh_new(bdrv_get_aio_context(bs), curl_readv_bh_cb, acb);
+    qemu_bh_schedule(acb->bh);
+    return &acb->common;
 }
 static void curl_close(BlockDriverState *bs)
@@ -945,7 +799,6 @@ static void curl_close(BlockDriverState *bs)
     DPRINTF("CURL: Close\n");
     curl_detach_aio_context(bs);
-    qemu_mutex_destroy(&s->mutex);
     g_free(s->cookie);
     g_free(s->url);
@@ -967,7 +820,7 @@ static BlockDriver bdrv_http = {
     .bdrv_close = curl_close,
     .bdrv_getlength = curl_getlength,
-    .bdrv_co_preadv = curl_co_preadv,
+    .bdrv_aio_readv = curl_aio_readv,
     .bdrv_detach_aio_context = curl_detach_aio_context,
     .bdrv_attach_aio_context = curl_attach_aio_context,
@@ -983,7 +836,7 @@ static BlockDriver bdrv_https = {
     .bdrv_close = curl_close,
     .bdrv_getlength = curl_getlength,
-    .bdrv_co_preadv = curl_co_preadv,
+    .bdrv_aio_readv = curl_aio_readv,
     .bdrv_detach_aio_context = curl_detach_aio_context,
     .bdrv_attach_aio_context = curl_attach_aio_context,
@@ -999,7 +852,7 @@ static BlockDriver bdrv_ftp = {
     .bdrv_close = curl_close,
     .bdrv_getlength = curl_getlength,
-    .bdrv_co_preadv = curl_co_preadv,
+    .bdrv_aio_readv = curl_aio_readv,
     .bdrv_detach_aio_context = curl_detach_aio_context,
     .bdrv_attach_aio_context = curl_attach_aio_context,
@@ -1015,7 +868,23 @@ static BlockDriver bdrv_ftps = {
     .bdrv_close = curl_close,
     .bdrv_getlength = curl_getlength,
-    .bdrv_co_preadv = curl_co_preadv,
+    .bdrv_aio_readv = curl_aio_readv,
+    .bdrv_detach_aio_context = curl_detach_aio_context,
+    .bdrv_attach_aio_context = curl_attach_aio_context,
+};
+
+static BlockDriver bdrv_tftp = {
+    .format_name = "tftp",
+    .protocol_name = "tftp",
+    .instance_size = sizeof(BDRVCURLState),
+    .bdrv_parse_filename = curl_parse_filename,
+    .bdrv_file_open = curl_open,
+    .bdrv_close = curl_close,
+    .bdrv_getlength = curl_getlength,
+    .bdrv_aio_readv = curl_aio_readv,
     .bdrv_detach_aio_context = curl_detach_aio_context,
     .bdrv_attach_aio_context = curl_attach_aio_context,
@@ -1027,6 +896,7 @@ static void curl_block_init(void)
     bdrv_register(&bdrv_https);
     bdrv_register(&bdrv_ftp);
     bdrv_register(&bdrv_ftps);
+    bdrv_register(&bdrv_tftp);
 }
 block_init(curl_block_init);

block/dictzip.c (new file, 586 lines added)

@@ -0,0 +1,586 @@
/*
* DictZip Block driver for dictzip enabled gzip files
*
* Use the "dictzip" tool from the "dictd" package to create gzip files that
* contain the extra DictZip headers.
*
* dictzip(1) is a compression program which creates compressed files in the
* gzip format (see RFC 1952). However, unlike gzip(1), dictzip(1) compresses
* the file in pieces and stores an index to the pieces in the gzip header.
* This allows random access to the file at the granularity of the compressed
* pieces (currently about 64kB) while maintaining good compression ratios
* (within 5% of the expected ratio for dictionary data).
* dictd(8) uses files stored in this format.
*
* For details on DictZip see http://dict.org/.
*
* Copyright (c) 2009 Alexander Graf <agraf@suse.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include <zlib.h>
// #define DEBUG
#ifdef DEBUG
#define dprintf(fmt, ...) do { printf("dzip: " fmt, ## __VA_ARGS__); } while (0)
#else
#define dprintf(fmt, ...) do { } while (0)
#endif
#define SECTOR_SIZE 512
#define Z_STREAM_COUNT 4
#define CACHE_COUNT 20
/* magic values */
#define GZ_MAGIC1 0x1f
#define GZ_MAGIC2 0x8b
#define DZ_MAGIC1 'R'
#define DZ_MAGIC2 'A'
#define GZ_FEXTRA 0x04 /* Optional field (random access index) */
#define GZ_FNAME 0x08 /* Original name */
#define GZ_COMMENT 0x10 /* Zero-terminated, human-readable comment */
#define GZ_FHCRC 0x02 /* Header CRC16 */
/* offsets */
#define GZ_ID 0 /* GZ_MAGIC (16bit) */
#define GZ_FLG 3 /* FLaGs (see above) */
#define GZ_XLEN 10 /* eXtra LENgth (16bit) */
#define GZ_SI 12 /* Subfield ID (16bit) */
#define GZ_VERSION 16 /* Version for subfield format */
#define GZ_CHUNKSIZE 18 /* Chunk size (16bit) */
#define GZ_CHUNKCNT 20 /* Number of chunks (16bit) */
#define GZ_RNDDATA 22 /* Random access data (16bit) */
#define GZ_99_CHUNKSIZE 18 /* Chunk size (32bit) */
#define GZ_99_CHUNKCNT 22 /* Number of chunks (32bit) */
#define GZ_99_FILESIZE 26 /* Size of unpacked file (64bit) */
#define GZ_99_RNDDATA 34 /* Random access data (32bit) */
struct BDRVDictZipState;
typedef struct DictZipAIOCB {
BlockAIOCB common;
struct BDRVDictZipState *s;
QEMUIOVector *qiov; /* QIOV of the original request */
QEMUIOVector *qiov_gz; /* QIOV of the gz subrequest */
QEMUBH *bh; /* BH for cache */
z_stream *zStream; /* stream to use for decoding */
int zStream_id; /* stream id of the above pointer */
size_t start; /* offset into the uncompressed file */
size_t len; /* uncompressed bytes to read */
uint8_t *gzipped; /* the gzipped data */
uint8_t *buf; /* cached result */
size_t gz_len; /* amount of gzip data */
size_t gz_start; /* uncompressed starting point of gzip data */
uint64_t offset; /* offset for "start" into the uncompressed chunk */
int chunks_len; /* amount of uncompressed data in all gzip data */
} DictZipAIOCB;
typedef struct dict_cache {
size_t start;
size_t len;
uint8_t *buf;
} DictCache;
typedef struct BDRVDictZipState {
BlockDriverState *hd;
z_stream zStream[Z_STREAM_COUNT];
DictCache cache[CACHE_COUNT];
int cache_index;
uint8_t stream_in_use;
uint64_t chunk_len;
uint32_t chunk_cnt;
uint16_t *chunks;
uint32_t *chunks32;
uint64_t *offsets;
int64_t file_len;
} BDRVDictZipState;
static int start_zStream(z_stream *zStream)
{
zStream->zalloc = NULL;
zStream->zfree = NULL;
zStream->opaque = NULL;
zStream->next_in = 0;
zStream->avail_in = 0;
zStream->next_out = NULL;
zStream->avail_out = 0;
return inflateInit2( zStream, -15 );
}
static QemuOptsList runtime_opts = {
.name = "dzip",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "filename",
.type = QEMU_OPT_STRING,
.help = "URL to the dictzip file",
},
{ /* end of list */ }
},
};
static int dictzip_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
{
BDRVDictZipState *s = bs->opaque;
const char *err = "Unknown (read error?)";
uint8_t magic[2];
char buf[100];
uint8_t header_flags;
uint16_t chunk_len16;
uint16_t chunk_cnt16;
uint32_t chunk_len32;
uint16_t header_ver;
uint16_t tmp_short;
uint64_t offset;
int chunks_len;
int headerLength = GZ_XLEN - 1;
int rnd_offs;
int ret;
int i;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
filename = qemu_opt_get(opts, "filename");
if (!strncmp(filename, "dzip://", 7))
filename += 7;
else if (!strncmp(filename, "dzip:", 5))
filename += 5;
s->hd = bdrv_open(filename, NULL, NULL, flags | BDRV_O_PROTOCOL, errp);
if (!s->hd) {
ret = -EINVAL;
qemu_opts_del(opts);
return ret;
}
/* initialize zlib streams */
for (i = 0; i < Z_STREAM_COUNT; i++) {
if (start_zStream( &s->zStream[i] ) != Z_OK) {
err = s->zStream[i].msg;
goto fail;
}
}
/* gzip header */
if (bdrv_pread(s->hd->file, GZ_ID, &magic, sizeof(magic)) != sizeof(magic))
goto fail;
if (!((magic[0] == GZ_MAGIC1) && (magic[1] == GZ_MAGIC2))) {
err = "No gzip file";
goto fail;
}
/* dzip header */
if (bdrv_pread(s->hd->file, GZ_FLG, &header_flags, 1) != 1)
goto fail;
if (!(header_flags & GZ_FEXTRA)) {
err = "Not a dictzip file (wrong flags)";
goto fail;
}
/* extra length */
if (bdrv_pread(s->hd->file, GZ_XLEN, &tmp_short, 2) != 2)
goto fail;
headerLength += le16_to_cpu(tmp_short) + 2;
/* DictZip magic */
if (bdrv_pread(s->hd->file, GZ_SI, &magic, 2) != 2)
goto fail;
if (magic[0] != DZ_MAGIC1 || magic[1] != DZ_MAGIC2) {
err = "Not a dictzip file (missing extra magic)";
goto fail;
}
/* DictZip version */
if (bdrv_pread(s->hd->file, GZ_VERSION, &header_ver, 2) != 2)
goto fail;
header_ver = le16_to_cpu(header_ver);
switch (header_ver) {
case 1: /* Normal DictZip */
/* number of chunks */
if (bdrv_pread(s->hd->file, GZ_CHUNKSIZE, &chunk_len16, 2) != 2)
goto fail;
s->chunk_len = le16_to_cpu(chunk_len16);
/* chunk count */
if (bdrv_pread(s->hd->file, GZ_CHUNKCNT, &chunk_cnt16, 2) != 2)
goto fail;
s->chunk_cnt = le16_to_cpu(chunk_cnt16);
chunks_len = sizeof(short) * s->chunk_cnt;
rnd_offs = GZ_RNDDATA;
break;
case 99: /* Special Alex pigz version */
/* number of chunks */
if (bdrv_pread(s->hd->file, GZ_99_CHUNKSIZE, &chunk_len32, 4) != 4)
goto fail;
dprintf("chunk len [%#x] = %d\n", GZ_99_CHUNKSIZE, chunk_len32);
s->chunk_len = le32_to_cpu(chunk_len32);
/* chunk count */
if (bdrv_pread(s->hd->file, GZ_99_CHUNKCNT, &s->chunk_cnt, 4) != 4)
goto fail;
s->chunk_cnt = le32_to_cpu(s->chunk_cnt);
dprintf("chunk len | count = %"PRId64" | %d\n", s->chunk_len, s->chunk_cnt);
/* file size */
if (bdrv_pread(s->hd->file, GZ_99_FILESIZE, &s->file_len, 8) != 8)
goto fail;
s->file_len = le64_to_cpu(s->file_len);
chunks_len = sizeof(int) * s->chunk_cnt;
rnd_offs = GZ_99_RNDDATA;
break;
default:
err = "Invalid DictZip version";
goto fail;
}
/* random access data */
s->chunks = g_malloc(chunks_len);
if (header_ver == 99)
s->chunks32 = (uint32_t *)s->chunks;
if (bdrv_pread(s->hd->file, rnd_offs, s->chunks, chunks_len) != chunks_len)
goto fail;
/* orig filename */
if (header_flags & GZ_FNAME) {
if (bdrv_pread(s->hd->file, headerLength + 1, buf, sizeof(buf)) != sizeof(buf))
goto fail;
buf[sizeof(buf) - 1] = '\0';
headerLength += strlen(buf) + 1;
if (strlen(buf) == sizeof(buf))
goto fail;
dprintf("filename: %s\n", buf);
}
/* comment field */
if (header_flags & GZ_COMMENT) {
if (bdrv_pread(s->hd->file, headerLength, buf, sizeof(buf)) != sizeof(buf))
goto fail;
buf[sizeof(buf) - 1] = '\0';
headerLength += strlen(buf) + 1;
if (strlen(buf) == sizeof(buf))
goto fail;
dprintf("comment: %s\n", buf);
}
if (header_flags & GZ_FHCRC)
headerLength += 2;
/* uncompressed file length*/
if (!s->file_len) {
uint32_t file_len;
if (bdrv_pread(s->hd->file, bdrv_getlength(s->hd) - 4, &file_len, 4) != 4)
goto fail;
s->file_len = le32_to_cpu(file_len);
}
/* compute offsets */
s->offsets = g_malloc(sizeof( *s->offsets ) * s->chunk_cnt);
for (offset = headerLength + 1, i = 0; i < s->chunk_cnt; i++) {
s->offsets[i] = offset;
switch (header_ver) {
case 1:
offset += le16_to_cpu(s->chunks[i]);
break;
case 99:
offset += le32_to_cpu(s->chunks32[i]);
break;
}
dprintf("chunk %#"PRIx64" - %#"PRIx64" = offset %#"PRIx64" -> %#"PRIx64"\n", i * s->chunk_len, (i+1) * s->chunk_len, s->offsets[i], offset);
}
qemu_opts_del(opts);
return 0;
fail:
fprintf(stderr, "DictZip: Error opening file: %s\n", err);
bdrv_unref(s->hd);
if (s->chunks)
g_free(s->chunks);
qemu_opts_del(opts);
return -EINVAL;
}
/* This callback gets invoked when we have the result in cache already */
static void dictzip_cache_cb(void *opaque)
{
DictZipAIOCB *acb = (DictZipAIOCB *)opaque;
qemu_iovec_from_buf(acb->qiov, 0, acb->buf, acb->len);
acb->common.cb(acb->common.opaque, 0);
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb);
}
/* This callback gets invoked by the underlying block reader when we have
* all compressed data. We uncompress in here. */
static void dictzip_read_cb(void *opaque, int ret)
{
DictZipAIOCB *acb = (DictZipAIOCB *)opaque;
struct BDRVDictZipState *s = acb->s;
uint8_t *buf;
DictCache *cache;
int r, i;
buf = g_malloc(acb->chunks_len);
/* try to find zlib stream for decoding */
do {
for (i = 0; i < Z_STREAM_COUNT; i++) {
if (!(s->stream_in_use & (1 << i))) {
s->stream_in_use |= (1 << i);
acb->zStream_id = i;
acb->zStream = &s->zStream[i];
break;
}
}
} while(!acb->zStream);
/* sure, we could handle more streams, but this callback should be single
threaded and when it's not, we really want to know! */
assert(i == 0);
/* uncompress the chunk */
acb->zStream->next_in = acb->gzipped;
acb->zStream->avail_in = acb->gz_len;
acb->zStream->next_out = buf;
acb->zStream->avail_out = acb->chunks_len;
r = inflate( acb->zStream, Z_PARTIAL_FLUSH );
if ( (r != Z_OK) && (r != Z_STREAM_END) )
fprintf(stderr, "Error inflating: [%d] %s\n", r, acb->zStream->msg);
if ( r == Z_STREAM_END )
inflateReset(acb->zStream);
dprintf("inflating [%d] left: %d | %d bytes\n", r, acb->zStream->avail_in, acb->zStream->avail_out);
s->stream_in_use &= ~(1 << acb->zStream_id);
/* notify the caller */
qemu_iovec_from_buf(acb->qiov, 0, buf + acb->offset, acb->len);
acb->common.cb(acb->common.opaque, 0);
/* fill the cache */
cache = &s->cache[s->cache_index];
s->cache_index++;
if (s->cache_index == CACHE_COUNT)
s->cache_index = 0;
cache->len = 0;
if (cache->buf)
g_free(cache->buf);
cache->start = acb->gz_start;
cache->buf = buf;
cache->len = acb->chunks_len;
/* free occupied resources */
g_free(acb->qiov_gz);
qemu_aio_unref(acb);
}
static const AIOCBInfo dictzip_aiocb_info = {
.aiocb_size = sizeof(DictZipAIOCB),
};
/* This is where we get a request from a caller to read something */
static BlockAIOCB *dictzip_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
{
BDRVDictZipState *s = bs->opaque;
DictZipAIOCB *acb;
QEMUIOVector *qiov_gz;
struct iovec *iov;
uint8_t *buf;
size_t start = sector_num * SECTOR_SIZE;
size_t len = nb_sectors * SECTOR_SIZE;
size_t end = start + len;
size_t gz_start;
size_t gz_len;
int64_t gz_sector_num;
int gz_nb_sectors;
int first_chunk, last_chunk;
int first_offset;
int i;
acb = qemu_aio_get(&dictzip_aiocb_info, bs, cb, opaque);
if (!acb)
return NULL;
/* Search Cache */
for (i = 0; i < CACHE_COUNT; i++) {
if (!s->cache[i].len)
continue;
if ((start >= s->cache[i].start) &&
(end <= (s->cache[i].start + s->cache[i].len))) {
acb->buf = s->cache[i].buf + (start - s->cache[i].start);
acb->len = len;
acb->qiov = qiov;
acb->bh = qemu_bh_new(dictzip_cache_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
}
/* No cache, so let's decode */
/* We need to read these chunks */
first_chunk = start / s->chunk_len;
first_offset = start - first_chunk * s->chunk_len;
last_chunk = end / s->chunk_len;
gz_start = s->offsets[first_chunk];
gz_len = 0;
for (i = first_chunk; i <= last_chunk; i++) {
if (s->chunks32)
gz_len += le32_to_cpu(s->chunks32[i]);
else
gz_len += le16_to_cpu(s->chunks[i]);
}
gz_sector_num = gz_start / SECTOR_SIZE;
gz_nb_sectors = (gz_len / SECTOR_SIZE);
/* account for tail and heads */
while ((gz_start + gz_len) > ((gz_sector_num + gz_nb_sectors) * SECTOR_SIZE))
gz_nb_sectors++;
/* Allocate qiov, iov and buf in one chunk so we only need to free qiov */
qiov_gz = g_malloc0(sizeof(QEMUIOVector) + sizeof(struct iovec) +
(gz_nb_sectors * SECTOR_SIZE));
iov = (struct iovec *)(((char *)qiov_gz) + sizeof(QEMUIOVector));
buf = ((uint8_t *)iov) + sizeof(struct iovec);
/* Kick off the read by the backing file, so we can start decompressing */
iov->iov_base = (void *)buf;
iov->iov_len = gz_nb_sectors * 512;
qemu_iovec_init_external(qiov_gz, iov, 1);
dprintf("read %zd - %zd => %zd - %zd\n", start, end, gz_start, gz_start + gz_len);
acb->s = s;
acb->qiov = qiov;
acb->qiov_gz = qiov_gz;
acb->start = start;
acb->len = len;
acb->gzipped = buf + (gz_start % SECTOR_SIZE);
acb->gz_len = gz_len;
acb->gz_start = first_chunk * s->chunk_len;
acb->offset = first_offset;
acb->chunks_len = (last_chunk - first_chunk + 1) * s->chunk_len;
return bdrv_aio_readv(s->hd->file, gz_sector_num, qiov_gz, gz_nb_sectors,
dictzip_read_cb, acb);
}
static void dictzip_close(BlockDriverState *bs)
{
BDRVDictZipState *s = bs->opaque;
int i;
for (i = 0; i < CACHE_COUNT; i++) {
if (!s->cache[i].len)
continue;
g_free(s->cache[i].buf);
}
for (i = 0; i < Z_STREAM_COUNT; i++) {
inflateEnd(&s->zStream[i]);
}
if (s->chunks)
g_free(s->chunks);
if (s->offsets)
g_free(s->offsets);
dprintf("Close\n");
}
static int64_t dictzip_getlength(BlockDriverState *bs)
{
BDRVDictZipState *s = bs->opaque;
dprintf("getlength -> %ld\n", s->file_len);
return s->file_len;
}
static BlockDriver bdrv_dictzip = {
.format_name = "dzip",
.protocol_name = "dzip",
.instance_size = sizeof(BDRVDictZipState),
.bdrv_file_open = dictzip_open,
.bdrv_close = dictzip_close,
.bdrv_getlength = dictzip_getlength,
.bdrv_aio_readv = dictzip_aio_readv,
};
static void dictzip_block_init(void)
{
bdrv_register(&bdrv_dictzip);
}
block_init(dictzip_block_init);
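The chunk-offset table built in dictzip_open() is the core of the random-access scheme: each chunk's position in the compressed file is the running sum of the compressed chunk lengths, starting just past the gzip header. A minimal standalone sketch of that computation (the function name and toy values below are hypothetical, and it assumes a v1 table already converted from little-endian):

```c
#include <stdint.h>

/* Mirror of the offset loop in dictzip_open(): offsets[i] is the file
 * position of compressed chunk i, accumulated from the 16-bit compressed
 * chunk lengths stored in the DictZip extra field. */
static void compute_offsets(const uint16_t *chunks, uint32_t chunk_cnt,
                            uint64_t header_len, uint64_t *offsets)
{
    uint64_t offset = header_len + 1;   /* first byte after the header */
    for (uint32_t i = 0; i < chunk_cnt; i++) {
        offsets[i] = offset;            /* where chunk i starts */
        offset += chunks[i];            /* advance by its compressed size */
    }
}
```

Reads then map an uncompressed range to `[first_chunk, last_chunk]` by dividing by the chunk length and seek directly to `offsets[first_chunk]`, as dictzip_aio_readv() does above.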


@@ -37,43 +37,14 @@
* or enabled. A frozen bitmap can only abdicate() or reclaim(). * or enabled. A frozen bitmap can only abdicate() or reclaim().
*/ */
struct BdrvDirtyBitmap { struct BdrvDirtyBitmap {
QemuMutex *mutex;
HBitmap *bitmap; /* Dirty sector bitmap implementation */ HBitmap *bitmap; /* Dirty sector bitmap implementation */
HBitmap *meta; /* Meta dirty bitmap */
BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */ BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
char *name; /* Optional non-empty unique ID */ char *name; /* Optional non-empty unique ID */
int64_t size; /* Size of the bitmap (Number of sectors) */ int64_t size; /* Size of the bitmap (Number of sectors) */
bool disabled; /* Bitmap is read-only */ bool disabled; /* Bitmap is read-only */
int active_iterators; /* How many iterators are active */
QLIST_ENTRY(BdrvDirtyBitmap) list; QLIST_ENTRY(BdrvDirtyBitmap) list;
}; };
struct BdrvDirtyBitmapIter {
HBitmapIter hbi;
BdrvDirtyBitmap *bitmap;
};
static inline void bdrv_dirty_bitmaps_lock(BlockDriverState *bs)
{
qemu_mutex_lock(&bs->dirty_bitmap_mutex);
}
static inline void bdrv_dirty_bitmaps_unlock(BlockDriverState *bs)
{
qemu_mutex_unlock(&bs->dirty_bitmap_mutex);
}
void bdrv_dirty_bitmap_lock(BdrvDirtyBitmap *bitmap)
{
qemu_mutex_lock(bitmap->mutex);
}
void bdrv_dirty_bitmap_unlock(BdrvDirtyBitmap *bitmap)
{
qemu_mutex_unlock(bitmap->mutex);
}
/* Called with BQL or dirty_bitmap lock taken. */
BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name) BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
{ {
BdrvDirtyBitmap *bm; BdrvDirtyBitmap *bm;
@@ -87,7 +58,6 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
return NULL; return NULL;
} }
/* Called with BQL taken. */
void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap) void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
{ {
assert(!bdrv_dirty_bitmap_frozen(bitmap)); assert(!bdrv_dirty_bitmap_frozen(bitmap));
@@ -95,7 +65,6 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
bitmap->name = NULL; bitmap->name = NULL;
} }
/* Called with BQL taken. */
BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
uint32_t granularity, uint32_t granularity,
const char *name, const char *name,
@@ -120,109 +89,24 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
return NULL; return NULL;
} }
bitmap = g_new0(BdrvDirtyBitmap, 1); bitmap = g_new0(BdrvDirtyBitmap, 1);
bitmap->mutex = &bs->dirty_bitmap_mutex;
bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity)); bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
bitmap->size = bitmap_size; bitmap->size = bitmap_size;
bitmap->name = g_strdup(name); bitmap->name = g_strdup(name);
bitmap->disabled = false; bitmap->disabled = false;
bdrv_dirty_bitmaps_lock(bs);
QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list); QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
bdrv_dirty_bitmaps_unlock(bs);
return bitmap; return bitmap;
} }
/* bdrv_create_meta_dirty_bitmap
*
* Create a meta dirty bitmap that tracks the changes of bits in @bitmap. I.e.
* when a dirty status bit in @bitmap is changed (either from reset to set or
* the other way around), its respective meta dirty bitmap bit will be marked
* dirty as well.
*
* @bitmap: the block dirty bitmap for which to create a meta dirty bitmap.
* @chunk_size: how many bytes of bitmap data does each bit in the meta bitmap
* track.
*/
void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap,
int chunk_size)
{
assert(!bitmap->meta);
qemu_mutex_lock(bitmap->mutex);
bitmap->meta = hbitmap_create_meta(bitmap->bitmap,
chunk_size * BITS_PER_BYTE);
qemu_mutex_unlock(bitmap->mutex);
}
void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
{
assert(bitmap->meta);
qemu_mutex_lock(bitmap->mutex);
hbitmap_free_meta(bitmap->bitmap);
bitmap->meta = NULL;
qemu_mutex_unlock(bitmap->mutex);
}
int bdrv_dirty_bitmap_get_meta_locked(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, int64_t sector,
int nb_sectors)
{
uint64_t i;
int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
/* To optimize: we can make hbitmap to internally check the range in a
* coarse level, or at least do it word by word. */
for (i = sector; i < sector + nb_sectors; i += sectors_per_bit) {
if (hbitmap_get(bitmap->meta, i)) {
return true;
}
}
return false;
}
int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, int64_t sector,
int nb_sectors)
{
bool dirty;
qemu_mutex_lock(bitmap->mutex);
dirty = bdrv_dirty_bitmap_get_meta_locked(bs, bitmap, sector, nb_sectors);
qemu_mutex_unlock(bitmap->mutex);
return dirty;
}
void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, int64_t sector,
int nb_sectors)
{
qemu_mutex_lock(bitmap->mutex);
hbitmap_reset(bitmap->meta, sector, nb_sectors);
qemu_mutex_unlock(bitmap->mutex);
}
int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
{
return bitmap->size;
}
const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap *bitmap)
{
return bitmap->name;
}
/* Called with BQL taken. */
bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap) bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
{ {
return bitmap->successor; return bitmap->successor;
} }
/* Called with BQL taken. */
bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap) bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
{ {
return !(bitmap->disabled || bitmap->successor); return !(bitmap->disabled || bitmap->successor);
} }
/* Called with BQL taken. */
DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap) DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap)
{ {
if (bdrv_dirty_bitmap_frozen(bitmap)) { if (bdrv_dirty_bitmap_frozen(bitmap)) {
@@ -237,7 +121,6 @@ DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap)
/** /**
* Create a successor bitmap destined to replace this bitmap after an operation. * Create a successor bitmap destined to replace this bitmap after an operation.
* Requires that the bitmap is not frozen and has no successor. * Requires that the bitmap is not frozen and has no successor.
* Called with BQL taken.
*/ */
int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs, int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, Error **errp) BdrvDirtyBitmap *bitmap, Error **errp)
@@ -270,7 +153,6 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
/** /**
* For a bitmap with a successor, yield our name to the successor, * For a bitmap with a successor, yield our name to the successor,
* delete the old bitmap, and return a handle to the new bitmap. * delete the old bitmap, and return a handle to the new bitmap.
* Called with BQL taken.
*/ */
BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs, BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, BdrvDirtyBitmap *bitmap,
@@ -298,7 +180,6 @@ BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
* In cases of failure where we can no longer safely delete the parent, * In cases of failure where we can no longer safely delete the parent,
* we may wish to re-join the parent and child/successor. * we may wish to re-join the parent and child/successor.
* The merged parent will be un-frozen, but not explicitly re-enabled. * The merged parent will be un-frozen, but not explicitly re-enabled.
* Called with BQL taken.
*/ */
BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
BdrvDirtyBitmap *parent, BdrvDirtyBitmap *parent,
@@ -323,54 +204,39 @@ BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 /**
  * Truncates _all_ bitmaps attached to a BDS.
- * Called with BQL taken.
  */
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 {
     BdrvDirtyBitmap *bitmap;
     uint64_t size = bdrv_nb_sectors(bs);

-    bdrv_dirty_bitmaps_lock(bs);
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
         assert(!bdrv_dirty_bitmap_frozen(bitmap));
-        assert(!bitmap->active_iterators);
         hbitmap_truncate(bitmap->bitmap, size);
         bitmap->size = size;
     }
-    bdrv_dirty_bitmaps_unlock(bs);
 }

-/* Called with BQL taken. */
 static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs,
                                                   BdrvDirtyBitmap *bitmap,
                                                   bool only_named)
 {
     BdrvDirtyBitmap *bm, *next;

-    bdrv_dirty_bitmaps_lock(bs);
     QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
         if ((!bitmap || bm == bitmap) && (!only_named || bm->name)) {
-            assert(!bm->active_iterators);
             assert(!bdrv_dirty_bitmap_frozen(bm));
-            assert(!bm->meta);
             QLIST_REMOVE(bm, list);
             hbitmap_free(bm->bitmap);
             g_free(bm->name);
             g_free(bm);

             if (bitmap) {
-                goto out;
+                return;
             }
         }
     }
-
-    if (bitmap) {
-        abort();
-    }
-
-out:
-    bdrv_dirty_bitmaps_unlock(bs);
 }

-/* Called with BQL taken. */
 void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
 {
     bdrv_do_release_matching_dirty_bitmap(bs, bitmap, false);
@@ -379,21 +245,18 @@ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
 /**
  * Release all named dirty bitmaps attached to a BDS (for use in bdrv_close()).
  * There must not be any frozen bitmaps attached.
- * Called with BQL taken.
  */
 void bdrv_release_named_dirty_bitmaps(BlockDriverState *bs)
 {
     bdrv_do_release_matching_dirty_bitmap(bs, NULL, true);
 }

-/* Called with BQL taken. */
 void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
     assert(!bdrv_dirty_bitmap_frozen(bitmap));
     bitmap->disabled = true;
 }

-/* Called with BQL taken. */
 void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
     assert(!bdrv_dirty_bitmap_frozen(bitmap));
@@ -406,7 +269,6 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
     BlockDirtyInfoList *list = NULL;
     BlockDirtyInfoList **plist = &list;

-    bdrv_dirty_bitmaps_lock(bs);
     QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
         BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
         BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
@@ -419,13 +281,11 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
         *plist = entry;
         plist = &entry->next;
     }
-    bdrv_dirty_bitmaps_unlock(bs);

     return list;
 }

-/* Called within bdrv_dirty_bitmap_lock..unlock */
-int bdrv_get_dirty_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-                          int64_t sector)
+int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+                   int64_t sector)
 {
     if (bitmap) {
@@ -460,81 +320,28 @@ uint32_t bdrv_dirty_bitmap_granularity(BdrvDirtyBitmap *bitmap)
     return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->bitmap);
 }

-uint32_t bdrv_dirty_bitmap_meta_granularity(BdrvDirtyBitmap *bitmap)
+void bdrv_dirty_iter_init(BdrvDirtyBitmap *bitmap, HBitmapIter *hbi)
 {
-    return BDRV_SECTOR_SIZE << hbitmap_granularity(bitmap->meta);
+    hbitmap_iter_init(hbi, bitmap->bitmap, 0);
 }

-BdrvDirtyBitmapIter *bdrv_dirty_iter_new(BdrvDirtyBitmap *bitmap,
-                                         uint64_t first_sector)
-{
-    BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
-    hbitmap_iter_init(&iter->hbi, bitmap->bitmap, first_sector);
-    iter->bitmap = bitmap;
-    bitmap->active_iterators++;
-    return iter;
-}
-
-BdrvDirtyBitmapIter *bdrv_dirty_meta_iter_new(BdrvDirtyBitmap *bitmap)
-{
-    BdrvDirtyBitmapIter *iter = g_new(BdrvDirtyBitmapIter, 1);
-    hbitmap_iter_init(&iter->hbi, bitmap->meta, 0);
-    iter->bitmap = bitmap;
-    bitmap->active_iterators++;
-    return iter;
-}
-
-void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
-{
-    if (!iter) {
-        return;
-    }
-    assert(iter->bitmap->active_iterators > 0);
-    iter->bitmap->active_iterators--;
-    g_free(iter);
-}
-
-int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
-{
-    return hbitmap_iter_next(&iter->hbi);
-}
-
-/* Called within bdrv_dirty_bitmap_lock..unlock */
-void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
+void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
                            int64_t cur_sector, int64_t nr_sectors)
 {
     assert(bdrv_dirty_bitmap_enabled(bitmap));
     hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
 }

-void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-                           int64_t cur_sector, int64_t nr_sectors)
-{
-    bdrv_dirty_bitmap_lock(bitmap);
-    bdrv_set_dirty_bitmap_locked(bitmap, cur_sector, nr_sectors);
-    bdrv_dirty_bitmap_unlock(bitmap);
-}
-
-/* Called within bdrv_dirty_bitmap_lock..unlock */
-void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
+void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
                              int64_t cur_sector, int64_t nr_sectors)
 {
     assert(bdrv_dirty_bitmap_enabled(bitmap));
     hbitmap_reset(bitmap->bitmap, cur_sector, nr_sectors);
 }

-void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-                             int64_t cur_sector, int64_t nr_sectors)
-{
-    bdrv_dirty_bitmap_lock(bitmap);
-    bdrv_reset_dirty_bitmap_locked(bitmap, cur_sector, nr_sectors);
-    bdrv_dirty_bitmap_unlock(bitmap);
-}
-
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
 {
     assert(bdrv_dirty_bitmap_enabled(bitmap));
-    bdrv_dirty_bitmap_lock(bitmap);
     if (!out) {
         hbitmap_reset_all(bitmap->bitmap);
     } else {
@@ -543,7 +350,6 @@ void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out)
                            hbitmap_granularity(backup));
         *out = backup;
     }
-    bdrv_dirty_bitmap_unlock(bitmap);
 }

 void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
@@ -554,76 +360,28 @@ void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in)
     hbitmap_free(tmp);
 }

-uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
-                                              uint64_t start, uint64_t count)
-{
-    return hbitmap_serialization_size(bitmap->bitmap, start, count);
-}
-
-uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap)
-{
-    return hbitmap_serialization_granularity(bitmap->bitmap);
-}
-
-void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
-                                      uint8_t *buf, uint64_t start,
-                                      uint64_t count)
-{
-    hbitmap_serialize_part(bitmap->bitmap, buf, start, count);
-}
-
-void bdrv_dirty_bitmap_deserialize_part(BdrvDirtyBitmap *bitmap,
-                                        uint8_t *buf, uint64_t start,
-                                        uint64_t count, bool finish)
-{
-    hbitmap_deserialize_part(bitmap->bitmap, buf, start, count, finish);
-}
-
-void bdrv_dirty_bitmap_deserialize_zeroes(BdrvDirtyBitmap *bitmap,
-                                          uint64_t start, uint64_t count,
-                                          bool finish)
-{
-    hbitmap_deserialize_zeroes(bitmap->bitmap, start, count, finish);
-}
-
-void bdrv_dirty_bitmap_deserialize_finish(BdrvDirtyBitmap *bitmap)
-{
-    hbitmap_deserialize_finish(bitmap->bitmap);
-}
-
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
                     int64_t nr_sectors)
 {
     BdrvDirtyBitmap *bitmap;
-
-    if (QLIST_EMPTY(&bs->dirty_bitmaps)) {
-        return;
-    }
-
-    bdrv_dirty_bitmaps_lock(bs);
     QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
         if (!bdrv_dirty_bitmap_enabled(bitmap)) {
             continue;
         }
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
     }
-    bdrv_dirty_bitmaps_unlock(bs);
 }

 /**
- * Advance a BdrvDirtyBitmapIter to an arbitrary offset.
+ * Advance an HBitmapIter to an arbitrary offset.
  */
-void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *iter, int64_t sector_num)
+void bdrv_set_dirty_iter(HBitmapIter *hbi, int64_t offset)
 {
-    hbitmap_iter_init(&iter->hbi, iter->hbi.hb, sector_num);
+    assert(hbi->hb);
+    hbitmap_iter_init(hbi, hbi->hb, offset);
 }

 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap)
 {
     return hbitmap_count(bitmap->bitmap);
 }
-
-int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap)
-{
-    return hbitmap_count(bitmap->meta);
-}
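The removed `bdrv_dirty_iter_new()`/`bdrv_dirty_iter_free()` pair above pins a bitmap with an `active_iterators` counter so it cannot be truncated or released while an iterator still points into it (the `assert(!bitmap->active_iterators)` checks in the truncate and release paths enforce this). A minimal standalone sketch of that ref-counting pattern, with illustrative names rather than QEMU's actual types:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins for BdrvDirtyBitmap / BdrvDirtyBitmapIter. */
typedef struct Bitmap {
    int active_iterators;   /* number of live iterators over this bitmap */
} Bitmap;

typedef struct BitmapIter {
    Bitmap *bitmap;
} BitmapIter;

static BitmapIter *iter_new(Bitmap *bm)
{
    BitmapIter *it = malloc(sizeof(*it));
    it->bitmap = bm;
    bm->active_iterators++;   /* pin: owner must not free/truncate the bitmap */
    return it;
}

static void iter_free(BitmapIter *it)
{
    if (!it) {
        return;               /* freeing NULL is a no-op, as in the original */
    }
    assert(it->bitmap->active_iterators > 0);
    it->bitmap->active_iterators--;
    free(it);
}
```

The counter makes the lifetime rule checkable: any destructive operation on the bitmap can simply assert the counter is zero.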

--- a/block/dmg-bz2.c
+++ /dev/null

@@ -1,61 +0,0 @@
-/*
- * DMG bzip2 uncompression
- *
- * Copyright (c) 2004 Johannes E. Schindelin
- * Copyright (c) 2016 Red Hat, Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "dmg.h"
-#include <bzlib.h>
-
-static int dmg_uncompress_bz2_do(char *next_in, unsigned int avail_in,
-                                 char *next_out, unsigned int avail_out)
-{
-    int ret;
-    uint64_t total_out;
-    bz_stream bzstream = {};
-
-    ret = BZ2_bzDecompressInit(&bzstream, 0, 0);
-    if (ret != BZ_OK) {
-        return -1;
-    }
-    bzstream.next_in = next_in;
-    bzstream.avail_in = avail_in;
-    bzstream.next_out = next_out;
-    bzstream.avail_out = avail_out;
-    ret = BZ2_bzDecompress(&bzstream);
-    total_out = ((uint64_t)bzstream.total_out_hi32 << 32) +
-                bzstream.total_out_lo32;
-    BZ2_bzDecompressEnd(&bzstream);
-    if (ret != BZ_STREAM_END ||
-        total_out != avail_out) {
-        return -1;
-    }
-    return 0;
-}
-
-__attribute__((constructor))
-static void dmg_bz2_init(void)
-{
-    assert(!dmg_uncompress_bz2);
-    dmg_uncompress_bz2 = dmg_uncompress_bz2_do;
-}

--- a/block/dmg.c
+++ b/block/dmg.c

@@ -28,10 +28,10 @@
 #include "qemu/bswap.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
-#include "dmg.h"
-
-int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
-                          char *next_out, unsigned int avail_out);
+#include <zlib.h>
+#ifdef CONFIG_BZIP2
+#include <bzlib.h>
+#endif

 enum {
     /* Limit chunk sizes to prevent unreasonable amounts of memory being used
@@ -41,6 +41,31 @@
     DMG_SECTORCOUNTS_MAX = DMG_LENGTHS_MAX / 512,
 };

+typedef struct BDRVDMGState {
+    CoMutex lock;
+    /* each chunk contains a certain number of sectors,
+     * offsets[i] is the offset in the .dmg file,
+     * lengths[i] is the length of the compressed chunk,
+     * sectors[i] is the sector beginning at offsets[i],
+     * sectorcounts[i] is the number of sectors in that chunk,
+     * the sectors array is ordered
+     * 0<=i<n_chunks */
+
+    uint32_t n_chunks;
+    uint32_t* types;
+    uint64_t* offsets;
+    uint64_t* lengths;
+    uint64_t* sectors;
+    uint64_t* sectorcounts;
+    uint32_t current_chunk;
+    uint8_t *compressed_chunk;
+    uint8_t *uncompressed_chunk;
+    z_stream zstream;
+#ifdef CONFIG_BZIP2
+    bz_stream bzstream;
+#endif
+} BDRVDMGState;
+
 static int dmg_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
     int len;
@@ -185,9 +210,10 @@ static bool dmg_is_known_block_type(uint32_t entry_type)
     case 0x00000001:    /* uncompressed */
     case 0x00000002:    /* zeroes */
     case 0x80000005:    /* zlib */
-        return true;
+#ifdef CONFIG_BZIP2
     case 0x80000006:    /* bzip2 */
-        return !!dmg_uncompress_bz2;
+#endif
+        return true;
     default:
         return false;
     }
@@ -413,18 +439,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     int64_t offset;
     int ret;

-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
-
-    ret = bdrv_set_read_only(bs, true, errp);
-    if (ret < 0) {
-        return ret;
-    }
-
-    block_module_load_one("dmg-bz2");
+    bs->read_only = true;

     s->n_chunks = 0;
     s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
@@ -572,6 +587,9 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
     if (!is_sector_in_chunk(s, s->current_chunk, sector_num)) {
         int ret;
         uint32_t chunk = search_chunk(s, sector_num);
+#ifdef CONFIG_BZIP2
+        uint64_t total_out;
+#endif

         if (chunk >= s->n_chunks) {
             return -1;
@@ -602,10 +620,8 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                 return -1;
             }
             break; }
+#ifdef CONFIG_BZIP2
         case 0x80000006: /* bzip2 compressed */
-            if (!dmg_uncompress_bz2) {
-                break;
-            }
             /* we need to buffer, because only the chunk as whole can be
              * inflated. */
             ret = bdrv_pread(bs->file, s->offsets[chunk],
@@ -614,15 +630,24 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                 return -1;
             }

-            ret = dmg_uncompress_bz2((char *)s->compressed_chunk,
-                                     (unsigned int) s->lengths[chunk],
-                                     (char *)s->uncompressed_chunk,
-                                     (unsigned int)
-                                         (512 * s->sectorcounts[chunk]));
-            if (ret < 0) {
-                return ret;
+            ret = BZ2_bzDecompressInit(&s->bzstream, 0, 0);
+            if (ret != BZ_OK) {
+                return -1;
+            }
+            s->bzstream.next_in = (char *)s->compressed_chunk;
+            s->bzstream.avail_in = (unsigned int) s->lengths[chunk];
+            s->bzstream.next_out = (char *)s->uncompressed_chunk;
+            s->bzstream.avail_out = (unsigned int) 512 * s->sectorcounts[chunk];
+            ret = BZ2_bzDecompress(&s->bzstream);
+            total_out = ((uint64_t)s->bzstream.total_out_hi32 << 32) +
+                        s->bzstream.total_out_lo32;
+            BZ2_bzDecompressEnd(&s->bzstream);
+            if (ret != BZ_STREAM_END ||
+                total_out != 512 * s->sectorcounts[chunk]) {
+                return -1;
             }
             break;
+#endif /* CONFIG_BZIP2 */
         case 1: /* copy */
             ret = bdrv_pread(bs->file, s->offsets[chunk],
                              s->uncompressed_chunk, s->lengths[chunk]);
@@ -701,7 +726,6 @@ static BlockDriver bdrv_dmg = {
     .bdrv_probe     = dmg_probe,
     .bdrv_open      = dmg_open,
     .bdrv_refresh_limits = dmg_refresh_limits,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_co_preadv = dmg_co_preadv,
     .bdrv_close     = dmg_close,
 };

--- a/block/dmg.h
+++ /dev/null

@@ -1,59 +0,0 @@
-/*
- * Header for DMG driver
- *
- * Copyright (c) 2004-2006 Fabrice Bellard
- * Copyright (c) 2016 Red hat, Inc.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#ifndef BLOCK_DMG_H
-#define BLOCK_DMG_H
-
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "block/block_int.h"
-#include <zlib.h>
-
-typedef struct BDRVDMGState {
-    CoMutex lock;
-    /* each chunk contains a certain number of sectors,
-     * offsets[i] is the offset in the .dmg file,
-     * lengths[i] is the length of the compressed chunk,
-     * sectors[i] is the sector beginning at offsets[i],
-     * sectorcounts[i] is the number of sectors in that chunk,
-     * the sectors array is ordered
-     * 0<=i<n_chunks */
-
-    uint32_t n_chunks;
-    uint32_t *types;
-    uint64_t *offsets;
-    uint64_t *lengths;
-    uint64_t *sectors;
-    uint64_t *sectorcounts;
-    uint32_t current_chunk;
-    uint8_t *compressed_chunk;
-    uint8_t *uncompressed_chunk;
-    z_stream zstream;
-} BDRVDMGState;
-
-extern int (*dmg_uncompress_bz2)(char *next_in, unsigned int avail_in,
-                                 char *next_out, unsigned int avail_out);
-
-#endif

--- a/block/gluster.c
+++ b/block/gluster.c

@@ -12,10 +12,8 @@
 #include "block/block_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/util.h"
 #include "qemu/uri.h"
 #include "qemu/error-report.h"
-#include "qemu/cutils.h"

 #define GLUSTER_OPT_FILENAME        "filename"
 #define GLUSTER_OPT_VOLUME          "volume"
@@ -32,14 +30,13 @@
 #define GLUSTER_DEFAULT_PORT        24007
 #define GLUSTER_DEBUG_DEFAULT       4
 #define GLUSTER_DEBUG_MAX           9
-#define GLUSTER_OPT_LOGFILE         "logfile"
-#define GLUSTER_LOGFILE_DEFAULT     "-" /* handled in libgfapi as /dev/stderr */

 #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"

 typedef struct GlusterAIOCB {
     int64_t size;
     int ret;
+    QEMUBH *bh;
     Coroutine *coroutine;
     AioContext *aio_context;
 } GlusterAIOCB;
@@ -47,9 +44,8 @@ typedef struct GlusterAIOCB {
 typedef struct BDRVGlusterState {
     struct glfs *glfs;
     struct glfs_fd *fd;
-    char *logfile;
     bool supports_seek_data;
-    int debug;
+    int debug_level;
 } BDRVGlusterState;

 typedef struct BDRVGlusterReopenState {
@@ -58,19 +54,6 @@ typedef struct BDRVGlusterReopenState {
 } BDRVGlusterReopenState;

-typedef struct GlfsPreopened {
-    char *volume;
-    glfs_t *fs;
-    int ref;
-} GlfsPreopened;
-
-typedef struct ListElement {
-    QLIST_ENTRY(ListElement) list;
-    GlfsPreopened saved;
-} ListElement;
-
-static QLIST_HEAD(glfs_list, ListElement) glfs_list;
-
 static QemuOptsList qemu_gluster_create_opts = {
     .name = "qemu-gluster-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(qemu_gluster_create_opts.head),
@@ -90,11 +73,6 @@ static QemuOptsList qemu_gluster_create_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "Gluster log level, valid range is 0-9",
         },
-        {
-            .name = GLUSTER_OPT_LOGFILE,
-            .type = QEMU_OPT_STRING,
-            .help = "Logfile path of libgfapi",
-        },
         { /* end of list */ }
     }
 };
@@ -113,11 +91,6 @@ static QemuOptsList runtime_opts = {
             .type = QEMU_OPT_NUMBER,
             .help = "Gluster log level, valid range is 0-9",
         },
-        {
-            .name = GLUSTER_OPT_LOGFILE,
-            .type = QEMU_OPT_STRING,
-            .help = "Logfile path of libgfapi",
-        },
         { /* end of list */ }
     },
 };
@@ -152,7 +125,7 @@ static QemuOptsList runtime_type_opts = {
         {
             .name = GLUSTER_OPT_TYPE,
             .type = QEMU_OPT_STRING,
-            .help = "inet|unix",
+            .help = "tcp|unix",
         },
         { /* end of list */ }
     },
@@ -171,14 +144,14 @@ static QemuOptsList runtime_unix_opts = {
     },
 };

-static QemuOptsList runtime_inet_opts = {
-    .name = "gluster_inet",
-    .head = QTAILQ_HEAD_INITIALIZER(runtime_inet_opts.head),
+static QemuOptsList runtime_tcp_opts = {
+    .name = "gluster_tcp",
+    .head = QTAILQ_HEAD_INITIALIZER(runtime_tcp_opts.head),
     .desc = {
         {
             .name = GLUSTER_OPT_TYPE,
             .type = QEMU_OPT_STRING,
-            .help = "inet|unix",
+            .help = "tcp|unix",
         },
         {
             .name = GLUSTER_OPT_HOST,
@@ -187,7 +160,7 @@ static QemuOptsList runtime_inet_opts = {
         },
         {
             .name = GLUSTER_OPT_PORT,
-            .type = QEMU_OPT_STRING,
+            .type = QEMU_OPT_NUMBER,
             .help = "port number on which glusterd is listening (default 24007)",
         },
         {
@@ -209,58 +182,6 @@ static QemuOptsList runtime_inet_opts = {
     },
 };

-static void glfs_set_preopened(const char *volume, glfs_t *fs)
-{
-    ListElement *entry = NULL;
-
-    entry = g_new(ListElement, 1);
-    entry->saved.volume = g_strdup(volume);
-    entry->saved.fs = fs;
-    entry->saved.ref = 1;
-
-    QLIST_INSERT_HEAD(&glfs_list, entry, list);
-}
-
-static glfs_t *glfs_find_preopened(const char *volume)
-{
-    ListElement *entry = NULL;
-
-    QLIST_FOREACH(entry, &glfs_list, list) {
-        if (strcmp(entry->saved.volume, volume) == 0) {
-            entry->saved.ref++;
-            return entry->saved.fs;
-        }
-    }
-
-    return NULL;
-}
-
-static void glfs_clear_preopened(glfs_t *fs)
-{
-    ListElement *entry = NULL;
-    ListElement *next;
-
-    if (fs == NULL) {
-        return;
-    }
-
-    QLIST_FOREACH_SAFE(entry, &glfs_list, list, next) {
-        if (entry->saved.fs == fs) {
-            if (--entry->saved.ref) {
-                return;
-            }
-
-            QLIST_REMOVE(entry, list);
-
-            glfs_fini(entry->saved.fs);
-            g_free(entry->saved.volume);
-            g_free(entry);
-        }
-    }
-}
-
 static int parse_volume_options(BlockdevOptionsGluster *gconf, char *path)
 {
     char *p, *q;
@@ -321,7 +242,7 @@ static int parse_volume_options(BlockdevOptionsGluster *gconf, char *path)
 static int qemu_gluster_parse_uri(BlockdevOptionsGluster *gconf,
                                   const char *filename)
 {
-    SocketAddress *gsconf;
+    GlusterServer *gsconf;
     URI *uri;
     QueryParams *qp = NULL;
     bool is_unix = false;
@@ -332,19 +253,19 @@ static int qemu_gluster_parse_uri(BlockdevOptionsGluster *gconf,
         return -EINVAL;
     }

-    gconf->server = g_new0(SocketAddressList, 1);
-    gconf->server->value = gsconf = g_new0(SocketAddress, 1);
+    gconf->server = g_new0(GlusterServerList, 1);
+    gconf->server->value = gsconf = g_new0(GlusterServer, 1);

     /* transport */
     if (!uri->scheme || !strcmp(uri->scheme, "gluster")) {
-        gsconf->type = SOCKET_ADDRESS_TYPE_INET;
+        gsconf->type = GLUSTER_TRANSPORT_TCP;
     } else if (!strcmp(uri->scheme, "gluster+tcp")) {
-        gsconf->type = SOCKET_ADDRESS_TYPE_INET;
+        gsconf->type = GLUSTER_TRANSPORT_TCP;
     } else if (!strcmp(uri->scheme, "gluster+unix")) {
-        gsconf->type = SOCKET_ADDRESS_TYPE_UNIX;
+        gsconf->type = GLUSTER_TRANSPORT_UNIX;
         is_unix = true;
     } else if (!strcmp(uri->scheme, "gluster+rdma")) {
-        gsconf->type = SOCKET_ADDRESS_TYPE_INET;
+        gsconf->type = GLUSTER_TRANSPORT_TCP;
         error_report("Warning: rdma feature is not supported, falling "
                      "back to tcp");
     } else {
@@ -374,11 +295,11 @@ static int qemu_gluster_parse_uri(BlockdevOptionsGluster *gconf,
         }
         gsconf->u.q_unix.path = g_strdup(qp->p[0].value);
     } else {
-        gsconf->u.inet.host = g_strdup(uri->server ? uri->server : "localhost");
+        gsconf->u.tcp.host = g_strdup(uri->server ? uri->server : "localhost");
         if (uri->port) {
-            gsconf->u.inet.port = g_strdup_printf("%d", uri->port);
+            gsconf->u.tcp.port = g_strdup_printf("%d", uri->port);
         } else {
-            gsconf->u.inet.port = g_strdup_printf("%d", GLUSTER_DEFAULT_PORT);
+            gsconf->u.tcp.port = g_strdup_printf("%d", GLUSTER_DEFAULT_PORT);
         }
     }
@@ -396,43 +317,23 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
     struct glfs *glfs;
     int ret;
     int old_errno;
-    SocketAddressList *server;
-    unsigned long long port;
-
-    glfs = glfs_find_preopened(gconf->volume);
-    if (glfs) {
-        return glfs;
-    }
+    GlusterServerList *server;

     glfs = glfs_new(gconf->volume);
     if (!glfs) {
         goto out;
     }

-    glfs_set_preopened(gconf->volume, glfs);
-
     for (server = gconf->server; server; server = server->next) {
-        switch (server->value->type) {
-        case SOCKET_ADDRESS_TYPE_UNIX:
-            ret = glfs_set_volfile_server(glfs, "unix",
-                                   server->value->u.q_unix.path, 0);
-            break;
-        case SOCKET_ADDRESS_TYPE_INET:
-            if (parse_uint_full(server->value->u.inet.port, &port, 10) < 0 ||
-                port > 65535) {
-                error_setg(errp, "'%s' is not a valid port number",
-                           server->value->u.inet.port);
-                errno = EINVAL;
-                goto out;
-            }
-            ret = glfs_set_volfile_server(glfs, "tcp",
-                                   server->value->u.inet.host,
-                                   (int)port);
-            break;
-        case SOCKET_ADDRESS_TYPE_VSOCK:
-        case SOCKET_ADDRESS_TYPE_FD:
-        default:
-            abort();
+        if (server->value->type == GLUSTER_TRANSPORT_UNIX) {
+            ret = glfs_set_volfile_server(glfs,
+                                   GlusterTransport_lookup[server->value->type],
+                                   server->value->u.q_unix.path, 0);
+        } else {
+            ret = glfs_set_volfile_server(glfs,
+                                   GlusterTransport_lookup[server->value->type],
+                                   server->value->u.tcp.host,
+                                   atoi(server->value->u.tcp.port));
         }

         if (ret < 0) {
@@ -440,7 +341,7 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
         }
     }

-    ret = glfs_set_logging(glfs, gconf->logfile, gconf->debug);
+    ret = glfs_set_logging(glfs, "-", gconf->debug_level);
     if (ret < 0) {
         goto out;
     }
@@ -450,13 +351,13 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
         error_setg(errp, "Gluster connection for volume %s, path %s failed"
                          " to connect", gconf->volume, gconf->path);
         for (server = gconf->server; server; server = server->next) {
-            if (server->value->type == SOCKET_ADDRESS_TYPE_UNIX) {
+            if (server->value->type == GLUSTER_TRANSPORT_UNIX) {
                 error_append_hint(errp, "hint: failed on socket %s ",
                                   server->value->u.q_unix.path);
             } else {
                 error_append_hint(errp, "hint: failed on host %s and port %s ",
-                                  server->value->u.inet.host,
-                                  server->value->u.inet.port);
+                                  server->value->u.tcp.host,
+                                  server->value->u.tcp.port);
             }
         }
@@ -474,12 +375,29 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 out:
     if (glfs) {
         old_errno = errno;
-        glfs_clear_preopened(glfs);
+        glfs_fini(glfs);
         errno = old_errno;
     }
     return NULL;
 }

+static int qapi_enum_parse(const char *opt)
+{
+    int i;
+
+    if (!opt) {
+        return GLUSTER_TRANSPORT__MAX;
+    }
+
+    for (i = 0; i < GLUSTER_TRANSPORT__MAX; i++) {
+        if (!strcmp(opt, GlusterTransport_lookup[i])) {
+            return i;
+        }
+    }
+
+    return i;
+}
+
 /*
  * Convert the json formatted command line into qapi.
  */
@@ -487,13 +405,14 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
                                    QDict *options, Error **errp)
 {
     QemuOpts *opts;
-    SocketAddress *gsconf = NULL;
-    SocketAddressList *curr = NULL;
+    GlusterServer *gsconf;
+    GlusterServerList *curr = NULL;
     QDict *backing_options = NULL;
     Error *local_err = NULL;
     char *str = NULL;
     const char *ptr;
-    int i, type, num_servers;
+    size_t num_servers;
+    int i;

     /* create opts info from runtime_json_opts list */
     opts = qemu_opts_create(&runtime_json_opts, NULL, 0, &error_abort);
@@ -535,32 +454,25 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
         }

         ptr = qemu_opt_get(opts, GLUSTER_OPT_TYPE);
+        gsconf = g_new0(GlusterServer, 1);
+        gsconf->type = qapi_enum_parse(ptr);
         if (!ptr) {
             error_setg(&local_err, QERR_MISSING_PARAMETER, GLUSTER_OPT_TYPE);
             error_append_hint(&local_err, GERR_INDEX_HINT, i);
             goto out;
         }
-        gsconf = g_new0(SocketAddress, 1);
-        if (!strcmp(ptr, "tcp")) {
-            ptr = "inet";       /* accept legacy "tcp" */
-        }
-        type = qapi_enum_parse(SocketAddressType_lookup, ptr,
-                               SOCKET_ADDRESS_TYPE__MAX, -1, NULL);
-        if (type != SOCKET_ADDRESS_TYPE_INET
-            && type != SOCKET_ADDRESS_TYPE_UNIX) {
-            error_setg(&local_err,
-                       "Parameter '%s' may be 'inet' or 'unix'",
-                       GLUSTER_OPT_TYPE);
+        if (gsconf->type == GLUSTER_TRANSPORT__MAX) {
+            error_setg(&local_err, QERR_INVALID_PARAMETER_VALUE,
+                       GLUSTER_OPT_TYPE, "tcp or unix");
             error_append_hint(&local_err, GERR_INDEX_HINT, i);
             goto out;
         }
-        gsconf->type = type;
         qemu_opts_del(opts);

-        if (gsconf->type == SOCKET_ADDRESS_TYPE_INET) {
-            /* create opts info from runtime_inet_opts list */
-            opts = qemu_opts_create(&runtime_inet_opts, NULL, 0, &error_abort);
+        if (gsconf->type == GLUSTER_TRANSPORT_TCP) {
+            /* create opts info from runtime_tcp_opts list */
+            opts = qemu_opts_create(&runtime_tcp_opts, NULL, 0, &error_abort);
             qemu_opts_absorb_qdict(opts, backing_options, &local_err);
             if (local_err) {
                 goto out;
@@ -573,7 +485,7 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
                 error_append_hint(&local_err, GERR_INDEX_HINT, i);
                 goto out;
             }
-            gsconf->u.inet.host = g_strdup(ptr);
+            gsconf->u.tcp.host = g_strdup(ptr);
             ptr = qemu_opt_get(opts, GLUSTER_OPT_PORT);
             if (!ptr) {
                 error_setg(&local_err, QERR_MISSING_PARAMETER,
@@ -581,28 +493,28 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
                 error_append_hint(&local_err, GERR_INDEX_HINT, i);
                 goto out;
             }
-            gsconf->u.inet.port = g_strdup(ptr);
+            gsconf->u.tcp.port = g_strdup(ptr);

             /* defend for unsupported fields in InetSocketAddress,
              * i.e. @ipv4, @ipv6 and @to
              */
             ptr = qemu_opt_get(opts, GLUSTER_OPT_TO);
             if (ptr) {
-                gsconf->u.inet.has_to = true;
+                gsconf->u.tcp.has_to = true;
             }
             ptr = qemu_opt_get(opts, GLUSTER_OPT_IPV4);
             if (ptr) {
-                gsconf->u.inet.has_ipv4 = true;
+                gsconf->u.tcp.has_ipv4 = true;
             }
             ptr = qemu_opt_get(opts, GLUSTER_OPT_IPV6);
             if (ptr) {
-                gsconf->u.inet.has_ipv6 = true;
+                gsconf->u.tcp.has_ipv6 = true;
             }
-            if (gsconf->u.inet.has_to) {
+            if (gsconf->u.tcp.has_to) {
                 error_setg(&local_err, "Parameter 'to' not supported");
                 goto out;
             }
-            if (gsconf->u.inet.has_ipv4 || gsconf->u.inet.has_ipv6) {
+            if (gsconf->u.tcp.has_ipv4 || gsconf->u.tcp.has_ipv6) {
                 error_setg(&local_err, "Parameters 'ipv4/ipv6' not supported");
                 goto out;
             }
@@ -627,18 +539,16 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
     }

     if (gconf->server == NULL) {
-        gconf->server = g_new0(SocketAddressList, 1);
+        gconf->server = g_new0(GlusterServerList, 1);
gconf->server->value = gsconf; gconf->server->value = gsconf;
curr = gconf->server; curr = gconf->server;
} else { } else {
curr->next = g_new0(SocketAddressList, 1); curr->next = g_new0(GlusterServerList, 1);
curr->next->value = gsconf; curr->next->value = gsconf;
curr = curr->next; curr = curr->next;
} }
gsconf = NULL;
QDECREF(backing_options); qdict_del(backing_options, str);
backing_options = NULL;
g_free(str); g_free(str);
str = NULL; str = NULL;
} }
@@ -647,10 +557,11 @@ static int qemu_gluster_parse_json(BlockdevOptionsGluster *gconf,
out: out:
error_propagate(errp, local_err); error_propagate(errp, local_err);
qapi_free_SocketAddress(gsconf);
qemu_opts_del(opts); qemu_opts_del(opts);
if (str) {
qdict_del(backing_options, str);
g_free(str); g_free(str);
QDECREF(backing_options); }
errno = EINVAL; errno = EINVAL;
return -errno; return -errno;
} }
@@ -665,9 +576,7 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
if (ret < 0) { if (ret < 0) {
error_setg(errp, "invalid URI"); error_setg(errp, "invalid URI");
error_append_hint(errp, "Usage: file=gluster[+transport]://" error_append_hint(errp, "Usage: file=gluster[+transport]://"
"[host[:port]]volume/path[?socket=...]" "[host[:port]]/volume/path[?socket=...]\n");
"[,file.debug=N]"
"[,file.logfile=/path/filename.log]\n");
errno = -ret; errno = -ret;
return NULL; return NULL;
} }
@@ -677,9 +586,7 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
error_append_hint(errp, "Usage: " error_append_hint(errp, "Usage: "
"-drive driver=qcow2,file.driver=gluster," "-drive driver=qcow2,file.driver=gluster,"
"file.volume=testvol,file.path=/path/a.qcow2" "file.volume=testvol,file.path=/path/a.qcow2"
"[,file.debug=9]" "[,file.debug=9],file.server.0.type=tcp,"
"[,file.logfile=/path/filename.log],"
"file.server.0.type=inet,"
"file.server.0.host=1.2.3.4," "file.server.0.host=1.2.3.4,"
"file.server.0.port=24007," "file.server.0.port=24007,"
"file.server.1.transport=unix," "file.server.1.transport=unix,"
@@ -694,6 +601,15 @@ static struct glfs *qemu_gluster_init(BlockdevOptionsGluster *gconf,
     return qemu_gluster_glfs_init(gconf, errp);
 }

+static void qemu_gluster_complete_aio(void *opaque)
+{
+    GlusterAIOCB *acb = (GlusterAIOCB *)opaque;
+
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+    qemu_coroutine_enter(acb->coroutine);
+}
+
 /*
  * AIO callback routine called from GlusterFS thread.
  */
@@ -709,7 +625,8 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
         acb->ret = -EIO; /* Partial read/write - fail it */
     }

-    aio_co_schedule(acb->aio_context, acb->coroutine);
+    acb->bh = aio_bh_new(acb->aio_context, qemu_gluster_complete_aio, acb);
+    qemu_bh_schedule(acb->bh);
 }
 static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
@@ -738,10 +655,7 @@ static void qemu_gluster_parse_flags(int bdrv_flags, int *open_flags)
  */
 static bool qemu_gluster_test_seek(struct glfs_fd *fd)
 {
-    off_t ret = 0;
-
-#if defined SEEK_HOLE && defined SEEK_DATA
-    off_t eof;
+    off_t ret, eof;

     eof = glfs_lseek(fd, 0, SEEK_END);
     if (eof < 0) {
@@ -751,8 +665,6 @@ static bool qemu_gluster_test_seek(struct glfs_fd *fd)
     /* this should always fail with ENXIO if SEEK_DATA is supported */
     ret = glfs_lseek(fd, eof, SEEK_DATA);
-#endif

     return (ret < 0) && (errno == ENXIO);
 }
@@ -765,7 +677,7 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
     BlockdevOptionsGluster *gconf = NULL;
     QemuOpts *opts;
     Error *local_err = NULL;
-    const char *filename, *logfile;
+    const char *filename;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -777,24 +689,17 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options,
     filename = qemu_opt_get(opts, GLUSTER_OPT_FILENAME);

-    s->debug = qemu_opt_get_number(opts, GLUSTER_OPT_DEBUG,
-                                   GLUSTER_DEBUG_DEFAULT);
-    if (s->debug < 0) {
-        s->debug = 0;
-    } else if (s->debug > GLUSTER_DEBUG_MAX) {
-        s->debug = GLUSTER_DEBUG_MAX;
+    s->debug_level = qemu_opt_get_number(opts, GLUSTER_OPT_DEBUG,
+                                         GLUSTER_DEBUG_DEFAULT);
+    if (s->debug_level < 0) {
+        s->debug_level = 0;
+    } else if (s->debug_level > GLUSTER_DEBUG_MAX) {
+        s->debug_level = GLUSTER_DEBUG_MAX;
     }

     gconf = g_new0(BlockdevOptionsGluster, 1);
-    gconf->debug = s->debug;
-    gconf->has_debug = true;
-    logfile = qemu_opt_get(opts, GLUSTER_OPT_LOGFILE);
-    s->logfile = g_strdup(logfile ? logfile : GLUSTER_LOGFILE_DEFAULT);
-    gconf->logfile = g_strdup(s->logfile);
-    gconf->has_logfile = true;
+    gconf->debug_level = s->debug_level;
+    gconf->has_debug_level = true;

     s->glfs = qemu_gluster_init(gconf, filename, options, errp);
     if (!s->glfs) {
         ret = -errno;
@@ -833,13 +738,12 @@ out:
     if (!ret) {
         return ret;
     }
-    g_free(s->logfile);
     if (s->fd) {
         glfs_close(s->fd);
     }
-    glfs_clear_preopened(s->glfs);
+    if (s->glfs) {
+        glfs_fini(s->glfs);
+    }

     return ret;
 }
@@ -863,10 +767,8 @@ static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
     qemu_gluster_parse_flags(state->flags, &open_flags);

     gconf = g_new0(BlockdevOptionsGluster, 1);
-    gconf->debug = s->debug;
-    gconf->has_debug = true;
-    gconf->logfile = g_strdup(s->logfile);
-    gconf->has_logfile = true;
+    gconf->debug_level = s->debug_level;
+    gconf->has_debug_level = true;
     reop_s->glfs = qemu_gluster_init(gconf, state->bs->filename, NULL, errp);
     if (reop_s->glfs == NULL) {
         ret = -errno;
@@ -906,8 +808,9 @@ static void qemu_gluster_reopen_commit(BDRVReopenState *state)
     if (s->fd) {
         glfs_close(s->fd);
     }
-    glfs_clear_preopened(s->glfs);
+    if (s->glfs) {
+        glfs_fini(s->glfs);
+    }

     /* use the newly opened image / connection */
     s->fd = reop_s->fd;
@@ -932,7 +835,9 @@ static void qemu_gluster_reopen_abort(BDRVReopenState *state)
         glfs_close(reop_s->fd);
     }

-    glfs_clear_preopened(reop_s->glfs);
+    if (reop_s->glfs) {
+        glfs_fini(reop_s->glfs);
+    }

     g_free(state->opaque);
     state->opaque = NULL;
@@ -963,6 +868,29 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
     qemu_coroutine_yield();
     return acb.ret;
 }
+
+static inline bool gluster_supports_zerofill(void)
+{
+    return 1;
+}
+
+static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
+                                        int64_t size)
+{
+    return glfs_zerofill(fd, offset, size);
+}
+
+#else
+static inline bool gluster_supports_zerofill(void)
+{
+    return 0;
+}
+
+static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset,
+                                        int64_t size)
+{
+    return 0;
+}
+#endif
 #endif

 static int qemu_gluster_create(const char *filename,
@@ -972,26 +900,19 @@ static int qemu_gluster_create(const char *filename,
     struct glfs *glfs;
     struct glfs_fd *fd;
     int ret = 0;
-    PreallocMode prealloc;
+    int prealloc = 0;
     int64_t total_size = 0;
     char *tmp = NULL;
-    Error *local_err = NULL;

     gconf = g_new0(BlockdevOptionsGluster, 1);
-    gconf->debug = qemu_opt_get_number_del(opts, GLUSTER_OPT_DEBUG,
-                                           GLUSTER_DEBUG_DEFAULT);
-    if (gconf->debug < 0) {
-        gconf->debug = 0;
-    } else if (gconf->debug > GLUSTER_DEBUG_MAX) {
-        gconf->debug = GLUSTER_DEBUG_MAX;
+    gconf->debug_level = qemu_opt_get_number_del(opts, GLUSTER_OPT_DEBUG,
+                                                 GLUSTER_DEBUG_DEFAULT);
+    if (gconf->debug_level < 0) {
+        gconf->debug_level = 0;
+    } else if (gconf->debug_level > GLUSTER_DEBUG_MAX) {
+        gconf->debug_level = GLUSTER_DEBUG_MAX;
     }
-    gconf->has_debug = true;
-    gconf->logfile = qemu_opt_get_del(opts, GLUSTER_OPT_LOGFILE);
-    if (!gconf->logfile) {
-        gconf->logfile = g_strdup(GLUSTER_LOGFILE_DEFAULT);
-    }
-    gconf->has_logfile = true;
+    gconf->has_debug_level = true;

     glfs = qemu_gluster_init(gconf, filename, NULL, errp);
     if (!glfs) {
@@ -1003,12 +924,13 @@ static int qemu_gluster_create(const char *filename,
                            BDRV_SECTOR_SIZE);

     tmp = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
-    prealloc = qapi_enum_parse(PreallocMode_lookup, tmp,
-                               PREALLOC_MODE__MAX, PREALLOC_MODE_OFF,
-                               &local_err);
-    g_free(tmp);
-    if (local_err) {
-        error_propagate(errp, local_err);
+    if (!tmp || !strcmp(tmp, "off")) {
+        prealloc = 0;
+    } else if (!strcmp(tmp, "full") && gluster_supports_zerofill()) {
+        prealloc = 1;
+    } else {
+        error_setg(errp, "Invalid preallocation mode: '%s'"
+                         " or GlusterFS doesn't support zerofill API", tmp);
         ret = -EINVAL;
         goto out;
     }
@@ -1017,50 +939,25 @@ static int qemu_gluster_create(const char *filename,
                     O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR | S_IWUSR);
     if (!fd) {
         ret = -errno;
-        goto out;
-    }
-
-    switch (prealloc) {
-#ifdef CONFIG_GLUSTERFS_FALLOCATE
-    case PREALLOC_MODE_FALLOC:
-        if (glfs_fallocate(fd, 0, 0, total_size)) {
-            error_setg(errp, "Could not preallocate data for the new file");
-            ret = -errno;
-        }
-        break;
-#endif /* CONFIG_GLUSTERFS_FALLOCATE */
-#ifdef CONFIG_GLUSTERFS_ZEROFILL
-    case PREALLOC_MODE_FULL:
+    } else {
         if (!glfs_ftruncate(fd, total_size)) {
-            if (glfs_zerofill(fd, 0, total_size)) {
-                error_setg(errp, "Could not zerofill the new file");
+            if (prealloc && qemu_gluster_zerofill(fd, 0, total_size)) {
                 ret = -errno;
             }
         } else {
-            error_setg(errp, "Could not resize file");
             ret = -errno;
         }
-        break;
-#endif /* CONFIG_GLUSTERFS_ZEROFILL */
-    case PREALLOC_MODE_OFF:
-        if (glfs_ftruncate(fd, total_size) != 0) {
-            ret = -errno;
-            error_setg(errp, "Could not resize file");
-        }
-        break;
-    default:
-        ret = -EINVAL;
-        error_setg(errp, "Unsupported preallocation mode: %s",
-                   PreallocMode_lookup[prealloc]);
-        break;
-    }

-    if (glfs_close(fd) != 0) {
-        ret = -errno;
+        if (glfs_close(fd) != 0) {
+            ret = -errno;
+        }
     }
 out:
-    g_free(tmp);
     qapi_free_BlockdevOptionsGluster(gconf);
-    glfs_clear_preopened(glfs);
+    if (glfs) {
+        glfs_fini(glfs);
+    }
     return ret;
 }
@@ -1095,17 +992,14 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
     return acb.ret;
 }

-static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset,
-                                 Error **errp)
+static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset)
 {
     int ret;
     BDRVGlusterState *s = bs->opaque;

     ret = glfs_ftruncate(s->fd, offset);
     if (ret < 0) {
-        ret = -errno;
-        error_setg_errno(errp, -ret, "Failed to truncate file");
-        return ret;
+        return -errno;
     }

     return 0;
@@ -1131,12 +1025,11 @@ static void qemu_gluster_close(BlockDriverState *bs)
 {
     BDRVGlusterState *s = bs->opaque;

-    g_free(s->logfile);
     if (s->fd) {
         glfs_close(s->fd);
         s->fd = NULL;
     }
-    glfs_clear_preopened(s->glfs);
+    glfs_fini(s->glfs);
 }
 static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
@@ -1249,20 +1142,18 @@ static int qemu_gluster_has_zero_init(BlockDriverState *bs)
  * If @start is in a trailing hole or beyond EOF, return -ENXIO.
  * If we can't find out, return a negative errno other than -ENXIO.
  *
- * (Shamefully copied from file-posix.c, only miniscule adaptions.)
+ * (Shamefully copied from raw-posix.c, only miniscule adaptions.)
  */
 static int find_allocation(BlockDriverState *bs, off_t start,
                            off_t *data, off_t *hole)
 {
     BDRVGlusterState *s = bs->opaque;
+    off_t offs;

     if (!s->supports_seek_data) {
-        goto exit;
+        return -ENOTSUP;
     }

-#if defined SEEK_HOLE && defined SEEK_DATA
-    off_t offs;
-
     /*
      * SEEK_DATA cases:
      * D1. offs == start: start is in data
@@ -1278,14 +1169,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
     if (offs < 0) {
         return -errno;          /* D3 or D4 */
     }
-
-    if (offs < start) {
-        /* This is not a valid return by lseek().  We are safe to just return
-         * -EIO in this case, and we'll treat it like D4. Unfortunately some
-         * versions of gluster server will return offs < start, so an assert
-         * here will unnecessarily abort QEMU. */
-        return -EIO;
-    }
+    assert(offs >= start);

     if (offs > start) {
         /* D2: in hole, next data at offs */
@@ -1317,14 +1201,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
     if (offs < 0) {
         return -errno;          /* D1 and (H3 or H4) */
     }
-
-    if (offs < start) {
-        /* This is not a valid return by lseek().  We are safe to just return
-         * -EIO in this case, and we'll treat it like H4. Unfortunately some
-         * versions of gluster server will return offs < start, so an assert
-         * here will unnecessarily abort QEMU. */
-        return -EIO;
-    }
+    assert(offs >= start);

     if (offs > start) {
         /*
@@ -1340,10 +1217,6 @@ static int find_allocation(BlockDriverState *bs, off_t start,
     /* D1 and H1 */
     return -EBUSY;
 #endif
-
-exit:
-    return -ENOTSUP;
 }

 /*
@@ -1359,7 +1232,7 @@ exit:
  * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
  * beyond the end of the disk image it will be clamped.
  *
- * (Based on raw_co_get_block_status() from file-posix.c.)
+ * (Based on raw_co_get_block_status() from raw-posix.c.)
  */
 static int64_t coroutine_fn qemu_gluster_co_get_block_status(
         BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,

File diff suppressed because it is too large


@@ -1,69 +0,0 @@
/*
* QEMU Block driver for iSCSI images (static options)
*
* Copyright (c) 2017 Peter Lieven <pl@kamp.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "qemu/config-file.h"
static QemuOptsList qemu_iscsi_opts = {
.name = "iscsi",
.head = QTAILQ_HEAD_INITIALIZER(qemu_iscsi_opts.head),
.desc = {
{
.name = "user",
.type = QEMU_OPT_STRING,
.help = "username for CHAP authentication to target",
},{
.name = "password",
.type = QEMU_OPT_STRING,
.help = "password for CHAP authentication to target",
},{
.name = "password-secret",
.type = QEMU_OPT_STRING,
.help = "ID of the secret providing password for CHAP "
"authentication to target",
},{
.name = "header-digest",
.type = QEMU_OPT_STRING,
.help = "HeaderDigest setting. "
"{CRC32C|CRC32C-NONE|NONE-CRC32C|NONE}",
},{
.name = "initiator-name",
.type = QEMU_OPT_STRING,
.help = "Initiator iqn name to use when connecting",
},{
.name = "timeout",
.type = QEMU_OPT_NUMBER,
.help = "Request timeout in seconds (default 0 = no timeout)",
},
{ /* end of list */ }
},
};
static void iscsi_block_opts_init(void)
{
qemu_add_opts(&qemu_iscsi_opts);
}
block_init(iscsi_block_opts_init);

File diff suppressed because it is too large


@@ -54,11 +54,12 @@ struct LinuxAioState {
     io_context_t ctx;
     EventNotifier e;

-    /* io queue for submit at batch.  Protected by AioContext lock. */
+    /* io queue for submit at batch */
     LaioQueue io_q;

-    /* I/O completion processing.  Only runs in I/O thread. */
+    /* I/O completion processing */
     QEMUBH *completion_bh;
+    struct io_event events[MAX_EVENTS];
     int event_idx;
     int event_max;
 };
@@ -94,159 +95,64 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
     laiocb->ret = ret;
     if (laiocb->co) {
-        /* If the coroutine is already entered it must be in ioq_submit() and
-         * will notice laio->ret has been filled in when it eventually runs
-         * later.  Coroutines cannot be entered recursively so avoid doing
-         * that!
-         */
-        if (!qemu_coroutine_entered(laiocb->co)) {
-            aio_co_wake(laiocb->co);
-        }
+        qemu_coroutine_enter(laiocb->co);
     } else {
         laiocb->common.cb(laiocb->common.opaque, ret);
         qemu_aio_unref(laiocb);
     }
 }
-/**
- * aio_ring buffer which is shared between userspace and kernel.
+/* The completion BH fetches completed I/O requests and invokes their
+ * callbacks.
  *
- * This copied from linux/fs/aio.c, common header does not exist
- * but AIO exists for ages so we assume ABI is stable.
- */
-struct aio_ring {
-    unsigned    id;    /* kernel internal index number */
-    unsigned    nr;    /* number of io_events */
-    unsigned    head;  /* Written to by userland or by kernel. */
-    unsigned    tail;
-
-    unsigned    magic;
-    unsigned    compat_features;
-    unsigned    incompat_features;
-    unsigned    header_length;  /* size of aio_ring */
-
-    struct io_event io_events[0];
-};
-
-/**
- * io_getevents_peek:
- * @ctx: AIO context
- * @events: pointer on events array, output value
-
- * Returns the number of completed events and sets a pointer
- * on events array.  This function does not update the internal
- * ring buffer, only reads head and tail.  When @events has been
- * processed io_getevents_commit() must be called.
- */
-static inline unsigned int io_getevents_peek(io_context_t ctx,
-                                             struct io_event **events)
-{
-    struct aio_ring *ring = (struct aio_ring *)ctx;
-    unsigned int head = ring->head, tail = ring->tail;
-    unsigned int nr;
-
-    nr = tail >= head ? tail - head : ring->nr - head;
-    *events = ring->io_events + head;
-
-    /* To avoid speculative loads of s->events[i] before observing tail.
-       Paired with smp_wmb() inside linux/fs/aio.c: aio_complete(). */
-    smp_rmb();
-
-    return nr;
-}
-
-/**
- * io_getevents_commit:
- * @ctx: AIO context
- * @nr: the number of events on which head should be advanced
- *
- * Advances head of a ring buffer.
- */
-static inline void io_getevents_commit(io_context_t ctx, unsigned int nr)
-{
-    struct aio_ring *ring = (struct aio_ring *)ctx;
-
-    if (nr) {
-        ring->head = (ring->head + nr) % ring->nr;
-    }
-}
-
-/**
- * io_getevents_advance_and_peek:
- * @ctx: AIO context
- * @events: pointer on events array, output value
- * @nr: the number of events on which head should be advanced
- *
- * Advances head of a ring buffer and returns number of elements left.
- */
-static inline unsigned int
-io_getevents_advance_and_peek(io_context_t ctx,
-                              struct io_event **events,
-                              unsigned int nr)
-{
-    io_getevents_commit(ctx, nr);
-    return io_getevents_peek(ctx, events);
-}
-
-/**
- * qemu_laio_process_completions:
- * @s: AIO state
- *
- * Fetches completed I/O requests and invokes their callbacks.
  *
  * The function is somewhat tricky because it supports nested event loops, for
  * example when a request callback invokes aio_poll().  In order to do this,
- * indices are kept in LinuxAioState.  Function schedules BH completion so it
- * can be called again in a nested event loop.  When there are no events left
- * to complete the BH is being canceled.
+ * the completion events array and index are kept in LinuxAioState.  The BH
+ * reschedules itself as long as there are completions pending so it will
+ * either be called again in a nested event loop or will be called after all
+ * events have been completed.  When there are no events left to complete, the
+ * BH returns without rescheduling.
  */
-static void qemu_laio_process_completions(LinuxAioState *s)
-{
-    struct io_event *events;
-
-    /* Reschedule so nested event loops see currently pending completions */
-    qemu_bh_schedule(s->completion_bh);
-
-    while ((s->event_max = io_getevents_advance_and_peek(s->ctx, &events,
-                                                         s->event_idx))) {
-        for (s->event_idx = 0; s->event_idx < s->event_max; ) {
-            struct iocb *iocb = events[s->event_idx].obj;
-            struct qemu_laiocb *laiocb =
-                container_of(iocb, struct qemu_laiocb, iocb);
-
-            laiocb->ret = io_event_ret(&events[s->event_idx]);
-
-            /* Change counters one-by-one because we can be nested. */
-            s->io_q.in_flight--;
-            s->event_idx++;
-            qemu_laio_process_completion(laiocb);
-        }
-    }
-
-    qemu_bh_cancel(s->completion_bh);
-
-    /* If we are nested we have to notify the level above that we are done
-     * by setting event_max to zero, upper level will then jump out of it's
-     * own `for` loop.  If we are the last all counters droped to zero. */
-    s->event_max = 0;
-    s->event_idx = 0;
-}
-
-static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
-{
-    qemu_laio_process_completions(s);
-
-    aio_context_acquire(s->aio_context);
-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
-        ioq_submit(s);
-    }
-    aio_context_release(s->aio_context);
-}
-
 static void qemu_laio_completion_bh(void *opaque)
 {
     LinuxAioState *s = opaque;

-    qemu_laio_process_completions_and_submit(s);
+    /* Fetch more completion events when empty */
+    if (s->event_idx == s->event_max) {
+        do {
+            struct timespec ts = { 0 };
+            s->event_max = io_getevents(s->ctx, MAX_EVENTS, MAX_EVENTS,
+                                        s->events, &ts);
+        } while (s->event_max == -EINTR);
+
+        s->event_idx = 0;
+        if (s->event_max <= 0) {
+            s->event_max = 0;
+            return; /* no more events */
+        }
+        s->io_q.in_flight -= s->event_max;
+    }
+
+    /* Reschedule so nested event loops see currently pending completions */
+    qemu_bh_schedule(s->completion_bh);
+
+    /* Process completion events */
+    while (s->event_idx < s->event_max) {
+        struct iocb *iocb = s->events[s->event_idx].obj;
+        struct qemu_laiocb *laiocb =
+                container_of(iocb, struct qemu_laiocb, iocb);
+
+        laiocb->ret = io_event_ret(&s->events[s->event_idx]);
+        s->event_idx++;
+
+        qemu_laio_process_completion(laiocb);
+    }
+
+    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+        ioq_submit(s);
+    }
+
+    qemu_bh_cancel(s->completion_bh);
 }
 static void qemu_laio_completion_cb(EventNotifier *e)
@@ -254,24 +160,10 @@ static void qemu_laio_completion_cb(EventNotifier *e)
     LinuxAioState *s = container_of(e, LinuxAioState, e);

     if (event_notifier_test_and_clear(&s->e)) {
-        qemu_laio_process_completions_and_submit(s);
+        qemu_laio_completion_bh(s);
     }
 }

-static bool qemu_laio_poll_cb(void *opaque)
-{
-    EventNotifier *e = opaque;
-    LinuxAioState *s = container_of(e, LinuxAioState, e);
-    struct io_event *events;
-
-    if (!io_getevents_peek(s->ctx, &events)) {
-        return false;
-    }
-
-    qemu_laio_process_completions_and_submit(s);
-    return true;
-}
-
 static void laio_cancel(BlockAIOCB *blockacb)
 {
     struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
@@ -344,19 +236,6 @@ static void ioq_submit(LinuxAioState *s)
         QSIMPLEQ_SPLIT_AFTER(&s->io_q.pending, aiocb, next, &completed);
     } while (ret == len && !QSIMPLEQ_EMPTY(&s->io_q.pending));
     s->io_q.blocked = (s->io_q.in_queue > 0);
-
-    if (s->io_q.in_flight) {
-        /* We can try to complete something just right away if there are
-         * still requests in-flight. */
-        qemu_laio_process_completions(s);
-        /*
-         * Even we have completed everything (in_flight == 0), the queue can
-         * have still pended requests (in_queue > 0).  We do not attempt to
-         * repeat submission to avoid IO hang.  The reason is simple: s->e is
-         * still set and completion callback will be called shortly and all
-         * pended requests will be submitted from there.
-         */
-    }
 }

 void laio_io_plug(BlockDriverState *bs, LinuxAioState *s)
@@ -414,7 +293,6 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
         .co         = qemu_coroutine_self(),
         .nbytes     = qiov->size,
         .ctx        = s,
-        .ret        = -EINPROGRESS,
         .is_read    = (type == QEMU_AIO_READ),
         .qiov       = qiov,
     };
@@ -424,9 +302,7 @@ int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
         return ret;
     }

-    if (laiocb.ret == -EINPROGRESS) {
-        qemu_coroutine_yield();
-    }
+    qemu_coroutine_yield();
     return laiocb.ret;
 }
@@ -456,9 +332,8 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context)
 {
-    aio_set_event_notifier(old_context, &s->e, false, NULL, NULL);
+    aio_set_event_notifier(old_context, &s->e, false, NULL);
     qemu_bh_delete(s->completion_bh);
-    s->aio_context = NULL;
 }

 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
@@ -466,8 +341,7 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
     aio_set_event_notifier(new_context, &s->e, false,
-                           qemu_laio_completion_cb,
-                           qemu_laio_poll_cb);
+                           qemu_laio_completion_cb);
 }

 LinuxAioState *laio_init(void)


@@ -12,9 +12,8 @@
  */

 #include "qemu/osdep.h"
-#include "qemu/cutils.h"
 #include "trace.h"
-#include "block/blockjob_int.h"
+#include "block/blockjob.h"
 #include "block/block_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
@@ -39,10 +38,7 @@ typedef struct MirrorBlockJob {
     BlockJob common;
     RateLimit limit;
     BlockBackend *target;
-    BlockDriverState *mirror_top_bs;
-    BlockDriverState *source;
     BlockDriverState *base;
     /* The name of the graph node to replace */
     char *replaces;
     /* The BDS to replace */
@@ -59,7 +55,7 @@ typedef struct MirrorBlockJob {
     int64_t bdev_length;
     unsigned long *cow_bitmap;
     BdrvDirtyBitmap *dirty_bitmap;
-    BdrvDirtyBitmapIter *dbi;
+    HBitmapIter hbi;
     uint8_t *buf;
     QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
     int buf_free_count;
@@ -73,7 +69,6 @@ typedef struct MirrorBlockJob {
     bool waiting_for_io;
     int target_cluster_sectors;
     int max_iov;
-    bool initial_zeroing_ongoing;
 } MirrorBlockJob;

 typedef struct MirrorOp {
@@ -122,10 +117,9 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
         if (s->cow_bitmap) {
             bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
         }
-        if (!s->initial_zeroing_ongoing) {
-            s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
-        }
+        s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
     }

     qemu_iovec_destroy(&op->qiov);
     g_free(op);
@@ -138,8 +132,6 @@ static void mirror_write_complete(void *opaque, int ret)
 {
     MirrorOp *op = opaque;
     MirrorBlockJob *s = op->s;
-
-    aio_context_acquire(blk_get_aio_context(s->common.blk));
     if (ret < 0) {
         BlockErrorAction action;

@@ -150,15 +142,12 @@ static void mirror_write_complete(void *opaque, int ret)
         }
     }
     mirror_iteration_done(op, ret);
-    aio_context_release(blk_get_aio_context(s->common.blk));
 }

 static void mirror_read_complete(void *opaque, int ret)
 {
     MirrorOp *op = opaque;
     MirrorBlockJob *s = op->s;
-
-    aio_context_acquire(blk_get_aio_context(s->common.blk));
     if (ret < 0) {
         BlockErrorAction action;

@@ -169,12 +158,11 @@ static void mirror_read_complete(void *opaque, int ret)
         }

         mirror_iteration_done(op, ret);
-    } else {
-        blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
-                        0, mirror_write_complete, op);
+        return;
     }
-    aio_context_release(blk_get_aio_context(s->common.blk));
+    blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, &op->qiov,
+                    0, mirror_write_complete, op);
 }
static inline void mirror_clip_sectors(MirrorBlockJob *s, static inline void mirror_clip_sectors(MirrorBlockJob *s,
int64_t sector_num, int64_t sector_num,
@@ -331,7 +319,7 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
-    BlockDriverState *source = s->source;
+    BlockDriverState *source = blk_bs(s->common.blk);
     int64_t sector_num, first_chunk;
     uint64_t delay_ns = 0;
     /* At least the first dirty chunk is mirrored in one iteration. */
@@ -342,15 +330,13 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     int max_io_sectors = MAX((s->buf_size >> BDRV_SECTOR_BITS) / MAX_IN_FLIGHT,
                              MAX_IO_SECTORS);
 
-    bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-    sector_num = bdrv_dirty_iter_next(s->dbi);
+    sector_num = hbitmap_iter_next(&s->hbi);
     if (sector_num < 0) {
-        bdrv_set_dirty_iter(s->dbi, 0);
-        sector_num = bdrv_dirty_iter_next(s->dbi);
+        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
+        sector_num = hbitmap_iter_next(&s->hbi);
         trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
         assert(sector_num >= 0);
     }
-    bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
 
     first_chunk = sector_num / sectors_per_chunk;
     while (test_bit(first_chunk, s->in_flight_bitmap)) {
@@ -362,26 +348,25 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     /* Find the number of consective dirty chunks following the first dirty
      * one, and wait for in flight requests in them. */
-    bdrv_dirty_bitmap_lock(s->dirty_bitmap);
     while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
-        int64_t next_dirty;
+        int64_t hbitmap_next;
         int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
         int64_t next_chunk = next_sector / sectors_per_chunk;
         if (next_sector >= end ||
-            !bdrv_get_dirty_locked(source, s->dirty_bitmap, next_sector)) {
+            !bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
             break;
         }
         if (test_bit(next_chunk, s->in_flight_bitmap)) {
             break;
         }
 
-        next_dirty = bdrv_dirty_iter_next(s->dbi);
-        if (next_dirty > next_sector || next_dirty < 0) {
+        hbitmap_next = hbitmap_iter_next(&s->hbi);
+        if (hbitmap_next > next_sector || hbitmap_next < 0) {
             /* The bitmap iterator's cache is stale, refresh it */
-            bdrv_set_dirty_iter(s->dbi, next_sector);
-            next_dirty = bdrv_dirty_iter_next(s->dbi);
+            bdrv_set_dirty_iter(&s->hbi, next_sector);
+            hbitmap_next = hbitmap_iter_next(&s->hbi);
         }
-        assert(next_dirty == next_sector);
+        assert(hbitmap_next == next_sector);
         nb_chunks++;
     }
@@ -389,13 +374,11 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
      * calling bdrv_get_block_status_above could yield - if some blocks are
      * marked dirty in this window, we need to know.
      */
-    bdrv_reset_dirty_bitmap_locked(s->dirty_bitmap, sector_num,
-                                   nb_chunks * sectors_per_chunk);
-    bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
+    bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num,
+                            nb_chunks * sectors_per_chunk);
 
     bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
     while (nb_chunks > 0 && sector_num < end) {
-        int64_t ret;
+        int ret;
         int io_sectors, io_sectors_acct;
         BlockDriverState *file;
         enum MirrorMethod {
@@ -486,11 +469,7 @@ static void mirror_free_init(MirrorBlockJob *s)
     }
 }
 
-/* This is also used for the .pause callback. There is no matching
- * mirror_resume() because mirror_run() will begin iterating again
- * when the job is resumed.
- */
-static void mirror_wait_for_all_io(MirrorBlockJob *s)
+static void mirror_drain(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
         mirror_wait_for_io(s);
@@ -506,44 +485,12 @@ static void mirror_exit(BlockJob *job, void *opaque)
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
     MirrorExitData *data = opaque;
     AioContext *replace_aio_context = NULL;
-    BlockDriverState *src = s->source;
+    BlockDriverState *src = blk_bs(s->common.blk);
     BlockDriverState *target_bs = blk_bs(s->target);
-    BlockDriverState *mirror_top_bs = s->mirror_top_bs;
-    Error *local_err = NULL;
-
-    bdrv_release_dirty_bitmap(src, s->dirty_bitmap);
 
     /* Make sure that the source BDS doesn't go away before we called
      * block_job_completed(). */
     bdrv_ref(src);
-    bdrv_ref(mirror_top_bs);
-    bdrv_ref(target_bs);
-
-    /* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
-     * inserting target_bs at s->to_replace, where we might not be able to get
-     * these permissions.
-     *
-     * Note that blk_unref() alone doesn't necessarily drop permissions because
-     * we might be running nested inside mirror_drain(), which takes an extra
-     * reference, so use an explicit blk_set_perm() first. */
-    blk_set_perm(s->target, 0, BLK_PERM_ALL, &error_abort);
-    blk_unref(s->target);
-    s->target = NULL;
-
-    /* We don't access the source any more. Dropping any WRITE/RESIZE is
-     * required before it could become a backing file of target_bs. */
-    bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-                            &error_abort);
-
-    if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
-        BlockDriverState *backing = s->is_none_mode ? src : s->base;
-        if (backing_bs(target_bs) != backing) {
-            bdrv_set_backing_hd(target_bs, backing, &local_err);
-            if (local_err) {
-                error_report_err(local_err);
-                data->ret = -EPERM;
-            }
-        }
-    }
 
     if (s->to_replace) {
         replace_aio_context = bdrv_get_aio_context(s->to_replace);
@@ -563,12 +510,12 @@ static void mirror_exit(BlockJob *job, void *opaque)
         /* The mirror job has no requests in flight any more, but we need to
          * drain potential other users of the BDS before changing the graph. */
         bdrv_drained_begin(target_bs);
-        bdrv_replace_node(to_replace, target_bs, &local_err);
+        bdrv_replace_in_backing_chain(to_replace, target_bs);
         bdrv_drained_end(target_bs);
-        if (local_err) {
-            error_report_err(local_err);
-            data->ret = -EPERM;
-        }
+
+        /* We just changed the BDS the job BB refers to */
+        blk_remove_bs(job->blk);
+        blk_insert_bs(job->blk, src);
     }
     if (s->to_replace) {
         bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
@@ -579,29 +526,11 @@ static void mirror_exit(BlockJob *job, void *opaque)
         aio_context_release(replace_aio_context);
     }
     g_free(s->replaces);
-    bdrv_unref(target_bs);
-
-    /* Remove the mirror filter driver from the graph. Before this, get rid of
-     * the blockers on the intermediate nodes so that the resulting state is
-     * valid. Also give up permissions on mirror_top_bs->backing, which might
-     * block the removal. */
-    block_job_remove_all_bdrv(job);
-    bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-                            &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
-
-    /* We just changed the BDS the job BB refers to (with either or both of the
-     * bdrv_replace_node() calls), so switch the BB back so the cleanup does
-     * the right thing. We don't need any permissions any more now. */
-    blk_remove_bs(job->blk);
-    blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
-    blk_insert_bs(job->blk, mirror_top_bs, &error_abort);
+    bdrv_op_unblock_all(target_bs, s->common.blocker);
+    blk_unref(s->target);
 
     block_job_completed(&s->common, data->ret);
     g_free(data);
     bdrv_drained_end(src);
-    bdrv_unref(mirror_top_bs);
     bdrv_unref(src);
 }
@@ -621,7 +550,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
     int64_t sector_num, end;
     BlockDriverState *base = s->base;
-    BlockDriverState *bs = s->source;
+    BlockDriverState *bs = blk_bs(s->common.blk);
     BlockDriverState *target_bs = blk_bs(s->target);
     int ret, n;
@@ -633,7 +562,6 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             return 0;
         }
 
-        s->initial_zeroing_ongoing = true;
         for (sector_num = 0; sector_num < end; ) {
             int nb_sectors = MIN(end - sector_num,
                 QEMU_ALIGN_DOWN(INT_MAX, s->granularity) >> BDRV_SECTOR_BITS);
@@ -641,13 +569,11 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             mirror_throttle(s);
 
             if (block_job_is_cancelled(&s->common)) {
-                s->initial_zeroing_ongoing = false;
                 return 0;
             }
 
             if (s->in_flight >= MAX_IN_FLIGHT) {
-                trace_mirror_yield(s, UINT64_MAX, s->buf_free_count,
-                                   s->in_flight);
+                trace_mirror_yield(s, s->in_flight, s->buf_free_count, -1);
                 mirror_wait_for_io(s);
                 continue;
             }
@@ -656,8 +582,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             sector_num += nb_sectors;
         }
 
-        mirror_wait_for_all_io(s);
-        s->initial_zeroing_ongoing = false;
+        mirror_drain(s);
     }
 
     /* First part, loop on the sectors and initialize the dirty bitmap. */
@@ -686,27 +611,12 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
     return 0;
 }
 
-/* Called when going out of the streaming phase to flush the bulk of the
- * data to the medium, or just before completing.
- */
-static int mirror_flush(MirrorBlockJob *s)
-{
-    int ret = blk_flush(s->target);
-    if (ret < 0) {
-        if (mirror_error_action(s, false, -ret) == BLOCK_ERROR_ACTION_REPORT) {
-            s->ret = ret;
-        }
-    }
-    return ret;
-}
-
 static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     MirrorExitData *data;
-    BlockDriverState *bs = s->source;
+    BlockDriverState *bs = blk_bs(s->common.blk);
     BlockDriverState *target_bs = blk_bs(s->target);
-    bool need_drain = true;
     int64_t length;
     BlockDriverInfo bdi;
     char backing_filename[2]; /* we only need 2 characters because we are only
@@ -722,28 +632,7 @@ static void coroutine_fn mirror_run(void *opaque)
     if (s->bdev_length < 0) {
         ret = s->bdev_length;
         goto immediate_exit;
-    }
-
-    /* Active commit must resize the base image if its size differs from the
-     * active layer. */
-    if (s->base == blk_bs(s->target)) {
-        int64_t base_length;
-
-        base_length = blk_getlength(s->target);
-        if (base_length < 0) {
-            ret = base_length;
-            goto immediate_exit;
-        }
-
-        if (s->bdev_length > base_length) {
-            ret = blk_truncate(s->target, s->bdev_length, NULL);
-            if (ret < 0) {
-                goto immediate_exit;
-            }
-        }
-    }
-
-    if (s->bdev_length == 0) {
+    } else if (s->bdev_length == 0) {
         /* Report BLOCK_JOB_READY and wait for complete. */
         block_job_event_ready(&s->common);
         s->synced = true;
@@ -790,8 +679,7 @@ static void coroutine_fn mirror_run(void *opaque)
         }
     }
 
-    assert(!s->dbi);
-    s->dbi = bdrv_dirty_iter_new(s->dirty_bitmap, 0);
+    bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
     for (;;) {
         uint64_t delay_ns = 0;
         int64_t cnt, delta;
@@ -822,7 +710,7 @@ static void coroutine_fn mirror_run(void *opaque)
             s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
             if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
                 (cnt == 0 && s->in_flight > 0)) {
-                trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
+                trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
                 mirror_wait_for_io(s);
                 continue;
             } else if (cnt != 0) {
@@ -833,16 +721,19 @@ static void coroutine_fn mirror_run(void *opaque)
         should_complete = false;
         if (s->in_flight == 0 && cnt == 0) {
             trace_mirror_before_flush(s);
-            if (!s->synced) {
-                if (mirror_flush(s) < 0) {
-                    /* Go check s->ret. */
-                    continue;
+            ret = blk_flush(s->target);
+            if (ret < 0) {
+                if (mirror_error_action(s, false, -ret) ==
+                    BLOCK_ERROR_ACTION_REPORT) {
+                    goto immediate_exit;
                 }
+            } else {
                 /* We're out of the streaming phase. From now on, if the job
                  * is cancelled we will actually complete all pending I/O and
                  * report completion. This way, block-job-cancel will leave
                  * the target in a consistent state.
                  */
+                if (!s->synced) {
                     block_job_event_ready(&s->common);
                     s->synced = true;
                 }
@@ -851,6 +742,7 @@ static void coroutine_fn mirror_run(void *opaque)
                     block_job_is_cancelled(&s->common);
                 cnt = bdrv_get_dirty_count(s->dirty_bitmap);
             }
+            }
 
         if (cnt == 0 && should_complete) {
             /* The dirty bitmap is not updated while operations are pending.
@@ -859,26 +751,11 @@ static void coroutine_fn mirror_run(void *opaque)
              * source has dirty data to copy!
              *
              * Note that I/O can be submitted by the guest while
-             * mirror_populate runs, so pause it now. Before deciding
-             * whether to switch to target check one last time if I/O has
-             * come in the meanwhile, and if not flush the data to disk.
+             * mirror_populate runs.
              */
             trace_mirror_before_drain(s, cnt);
-
-            bdrv_drained_begin(bs);
+            bdrv_co_drain(bs);
             cnt = bdrv_get_dirty_count(s->dirty_bitmap);
-            if (cnt > 0 || mirror_flush(s) < 0) {
-                bdrv_drained_end(bs);
-                continue;
-            }
-
-            /* The two disks are in sync. Exit and report successful
-             * completion.
-             */
-            assert(QLIST_EMPTY(&bs->tracked_requests));
-            s->common.cancelled = false;
-            need_drain = false;
-            break;
         }
 
         ret = 0;
@@ -891,6 +768,13 @@ static void coroutine_fn mirror_run(void *opaque)
         } else if (!should_complete) {
             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
             block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+        } else if (cnt == 0) {
+            /* The two disks are in sync. Exit and report successful
+             * completion.
+             */
+            assert(QLIST_EMPTY(&bs->tracked_requests));
+            s->common.cancelled = false;
+            break;
         }
         s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     }
@@ -902,22 +786,20 @@ immediate_exit:
          * the target is a copy of the source.
          */
         assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
-        assert(need_drain);
-        mirror_wait_for_all_io(s);
+        mirror_drain(s);
     }
 
     assert(s->in_flight == 0);
     qemu_vfree(s->buf);
     g_free(s->cow_bitmap);
     g_free(s->in_flight_bitmap);
-    bdrv_dirty_iter_free(s->dbi);
+    bdrv_release_dirty_bitmap(bs, s->dirty_bitmap);
 
     data = g_malloc(sizeof(*data));
     data->ret = ret;
-
-    if (need_drain) {
-        bdrv_drained_begin(bs);
-    }
+    /* Before we switch to target in mirror_exit, make sure data doesn't
+     * change. */
+    bdrv_drained_begin(bs);
     block_job_defer_to_main_loop(&s->common, mirror_exit, data);
 }
@@ -935,8 +817,9 @@ static void mirror_set_speed(BlockJob *job, int64_t speed, Error **errp)
 static void mirror_complete(BlockJob *job, Error **errp)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
-    BlockDriverState *target;
+    BlockDriverState *src, *target;
 
+    src = blk_bs(job->blk);
     target = blk_bs(s->target);
 
     if (!s->synced) {
@@ -968,10 +851,6 @@ static void mirror_complete(BlockJob *job, Error **errp)
         replace_aio_context = bdrv_get_aio_context(s->to_replace);
         aio_context_acquire(replace_aio_context);
 
-        /* TODO Translate this into permission system. Current definition of
-         * GRAPH_MOD would require to request it for the parents; they might
-         * not even be BlockDriverStates, however, so a BdrvChild can't address
-         * them. May need redefinition of GRAPH_MOD. */
         error_setg(&s->replace_blocker,
                    "block device is in use by block-job-complete");
         bdrv_op_block_all(s->to_replace, s->replace_blocker);
@@ -980,15 +859,25 @@ static void mirror_complete(BlockJob *job, Error **errp)
         aio_context_release(replace_aio_context);
     }
 
+    if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
+        BlockDriverState *backing = s->is_none_mode ? src : s->base;
+        if (backing_bs(target) != backing) {
+            bdrv_set_backing_hd(target, backing);
+        }
+    }
+
     s->should_complete = true;
     block_job_enter(&s->common);
 }
 
-static void mirror_pause(BlockJob *job)
+/* There is no matching mirror_resume() because mirror_run() will begin
+ * iterating again when the job is resumed.
+ */
+static void coroutine_fn mirror_pause(BlockJob *job)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
-    mirror_wait_for_all_io(s);
+    mirror_drain(s);
 }
 
 static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context) static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
@@ -998,143 +887,38 @@ static void mirror_attached_aio_context(BlockJob *job, AioContext *new_context)
blk_set_aio_context(s->target, new_context); blk_set_aio_context(s->target, new_context);
} }
static void mirror_drain(BlockJob *job)
{
MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
/* Need to keep a reference in case blk_drain triggers execution
* of mirror_complete...
*/
if (s->target) {
BlockBackend *target = s->target;
blk_ref(target);
blk_drain(target);
blk_unref(target);
}
}
static const BlockJobDriver mirror_job_driver = { static const BlockJobDriver mirror_job_driver = {
.instance_size = sizeof(MirrorBlockJob), .instance_size = sizeof(MirrorBlockJob),
.job_type = BLOCK_JOB_TYPE_MIRROR, .job_type = BLOCK_JOB_TYPE_MIRROR,
.set_speed = mirror_set_speed, .set_speed = mirror_set_speed,
.start = mirror_run,
.complete = mirror_complete, .complete = mirror_complete,
.pause = mirror_pause, .pause = mirror_pause,
.attached_aio_context = mirror_attached_aio_context, .attached_aio_context = mirror_attached_aio_context,
.drain = mirror_drain,
}; };
static const BlockJobDriver commit_active_job_driver = { static const BlockJobDriver commit_active_job_driver = {
.instance_size = sizeof(MirrorBlockJob), .instance_size = sizeof(MirrorBlockJob),
.job_type = BLOCK_JOB_TYPE_COMMIT, .job_type = BLOCK_JOB_TYPE_COMMIT,
.set_speed = mirror_set_speed, .set_speed = mirror_set_speed,
.start = mirror_run,
.complete = mirror_complete, .complete = mirror_complete,
.pause = mirror_pause, .pause = mirror_pause,
.attached_aio_context = mirror_attached_aio_context, .attached_aio_context = mirror_attached_aio_context,
.drain = mirror_drain,
};
static int coroutine_fn bdrv_mirror_top_preadv(BlockDriverState *bs,
uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
{
return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
}
static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
{
return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
}
static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
{
return bdrv_co_flush(bs->backing->bs);
}
static int64_t coroutine_fn bdrv_mirror_top_get_block_status(
BlockDriverState *bs, int64_t sector_num, int nb_sectors, int *pnum,
BlockDriverState **file)
{
*pnum = nb_sectors;
*file = bs->backing->bs;
return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
(sector_num << BDRV_SECTOR_BITS);
}
static int coroutine_fn bdrv_mirror_top_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int count, BdrvRequestFlags flags)
{
return bdrv_co_pwrite_zeroes(bs->backing, offset, count, flags);
}
static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
int64_t offset, int count)
{
return bdrv_co_pdiscard(bs->backing->bs, offset, count);
}
static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
{
bdrv_refresh_filename(bs->backing->bs);
pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
bs->backing->bs->filename);
}
static void bdrv_mirror_top_close(BlockDriverState *bs)
{
}
static void bdrv_mirror_top_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
/* Must be able to forward guest writes to the real image */
*nperm = 0;
if (perm & BLK_PERM_WRITE) {
*nperm |= BLK_PERM_WRITE;
}
*nshared = BLK_PERM_ALL;
}
/* Dummy node that provides consistent read to its users without requiring it
* from its backing file and that allows writes on the backing file chain. */
static BlockDriver bdrv_mirror_top = {
.format_name = "mirror_top",
.bdrv_co_preadv = bdrv_mirror_top_preadv,
.bdrv_co_pwritev = bdrv_mirror_top_pwritev,
.bdrv_co_pwrite_zeroes = bdrv_mirror_top_pwrite_zeroes,
.bdrv_co_pdiscard = bdrv_mirror_top_pdiscard,
.bdrv_co_flush = bdrv_mirror_top_flush,
.bdrv_co_get_block_status = bdrv_mirror_top_get_block_status,
.bdrv_refresh_filename = bdrv_mirror_top_refresh_filename,
.bdrv_close = bdrv_mirror_top_close,
.bdrv_child_perm = bdrv_mirror_top_child_perm,
}; };
 static void mirror_start_job(const char *job_id, BlockDriverState *bs,
-                             int creation_flags, BlockDriverState *target,
-                             const char *replaces, int64_t speed,
-                             uint32_t granularity, int64_t buf_size,
+                             BlockDriverState *target, const char *replaces,
+                             int64_t speed, uint32_t granularity,
+                             int64_t buf_size,
                              BlockMirrorBackingMode backing_mode,
                              BlockdevOnError on_source_error,
                              BlockdevOnError on_target_error,
                              bool unmap,
                              BlockCompletionFunc *cb,
-                             void *opaque,
+                             void *opaque, Error **errp,
                              const BlockJobDriver *driver,
-                             bool is_none_mode, BlockDriverState *base,
-                             bool auto_complete, const char *filter_node_name,
-                             Error **errp)
+                             bool is_none_mode, BlockDriverState *base)
 {
     MirrorBlockJob *s;
-    BlockDriverState *mirror_top_bs;
-    bool target_graph_mod;
-    bool target_is_backing;
-    Error *local_err = NULL;
-    int ret;
 
     if (granularity == 0) {
         granularity = bdrv_get_default_bitmap_granularity(target);
@@ -1151,65 +935,13 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         buf_size = DEFAULT_MIRROR_BUF_SIZE;
     }
 
-    /* In the case of active commit, add dummy driver to provide consistent
-     * reads on the top, while disabling it in the intermediate nodes, and make
-     * the backing chain writable. */
-    mirror_top_bs = bdrv_new_open_driver(&bdrv_mirror_top, filter_node_name,
-                                         BDRV_O_RDWR, errp);
-    if (mirror_top_bs == NULL) {
-        return;
-    }
-    mirror_top_bs->total_sectors = bs->total_sectors;
-    bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
-
-    /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
-     * it alive until block_job_create() succeeds even if bs has no parent. */
-    bdrv_ref(mirror_top_bs);
-    bdrv_drained_begin(bs);
-    bdrv_append(mirror_top_bs, bs, &local_err);
-    bdrv_drained_end(bs);
-
-    if (local_err) {
-        bdrv_unref(mirror_top_bs);
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    /* Make sure that the source is not resized while the job is running */
-    s = block_job_create(job_id, driver, mirror_top_bs,
-                         BLK_PERM_CONSISTENT_READ,
-                         BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
-                         BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD, speed,
-                         creation_flags, cb, opaque, errp);
+    s = block_job_create(job_id, driver, bs, speed, cb, opaque, errp);
     if (!s) {
-        goto fail;
+        return;
     }
-    /* The block job now has a reference to this node */
-    bdrv_unref(mirror_top_bs);
 
-    s->source = bs;
-    s->mirror_top_bs = mirror_top_bs;
-
-    /* No resize for the target either; while the mirror is still running, a
-     * consistent read isn't necessarily possible. We could possibly allow
-     * writes and graph modifications, though it would likely defeat the
-     * purpose of a mirror, so leave them blocked for now.
-     *
-     * In the case of active commit, things look a bit different, though,
-     * because the target is an already populated backing file in active use.
-     * We can allow anything except resize there.*/
-    target_is_backing = bdrv_chain_contains(bs, target);
-    target_graph_mod = (backing_mode != MIRROR_LEAVE_BACKING_CHAIN);
-    s->target = blk_new(BLK_PERM_WRITE | BLK_PERM_RESIZE |
-                        (target_graph_mod ? BLK_PERM_GRAPH_MOD : 0),
-                        BLK_PERM_WRITE_UNCHANGED |
-                        (target_is_backing ? BLK_PERM_CONSISTENT_READ |
-                                             BLK_PERM_WRITE |
-                                             BLK_PERM_GRAPH_MOD : 0));
-    ret = blk_insert_bs(s->target, target, errp);
-    if (ret < 0) {
-        goto fail;
-    }
+    s->target = blk_new();
+    blk_insert_bs(s->target, target);
 
     s->replaces = g_strdup(replaces);
     s->on_source_error = on_source_error;
@@ -1220,57 +952,20 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     s->granularity = granularity;
     s->buf_size = ROUND_UP(buf_size, granularity);
     s->unmap = unmap;
-    if (auto_complete) {
-        s->should_complete = true;
-    }
 
     s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
     if (!s->dirty_bitmap) {
-        goto fail;
-    }
-
-    /* Required permissions are already taken with blk_new() */
-    block_job_add_bdrv(&s->common, "target", target, 0, BLK_PERM_ALL,
-                       &error_abort);
-
-    /* In commit_active_start() all intermediate nodes disappear, so
-     * any jobs in them must be blocked */
-    if (target_is_backing) {
-        BlockDriverState *iter;
-        for (iter = backing_bs(bs); iter != target; iter = backing_bs(iter)) {
-            /* XXX BLK_PERM_WRITE needs to be allowed so we don't block
-             * ourselves at s->base (if writes are blocked for a node, they are
-             * also blocked for its backing file). The other options would be a
-             * second filter driver above s->base (== target). */
-            ret = block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
-                                     BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE,
-                                     errp);
-            if (ret < 0) {
-                goto fail;
-            }
-        }
-    }
-
-    trace_mirror_start(bs, s, opaque);
-    block_job_start(&s->common);
-    return;
-
-fail:
-    if (s) {
-        /* Make sure this BDS does not go away until we have completed the graph
-         * changes below */
-        bdrv_ref(mirror_top_bs);
-
         g_free(s->replaces);
         blk_unref(s->target);
-        block_job_early_fail(&s->common);
+        block_job_unref(&s->common);
+        return;
     }
 
-    bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-                            &error_abort);
-    bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), &error_abort);
+    bdrv_op_block_all(target, s->common.blocker);
 
-    bdrv_unref(mirror_top_bs);
+    s->common.co = qemu_coroutine_create(mirror_run, s);
+    trace_mirror_start(bs, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co);
 }
 void mirror_start(const char *job_id, BlockDriverState *bs,
@@ -1279,7 +974,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
-                  bool unmap, const char *filter_node_name, Error **errp)
+                  bool unmap,
+                  BlockCompletionFunc *cb,
+                  void *opaque, Error **errp)
 {
     bool is_none_mode;
     BlockDriverState *base;
@@ -1290,21 +987,21 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
     }
     is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
     base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
-    mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
-                     speed, granularity, buf_size, backing_mode,
-                     on_source_error, on_target_error, unmap, NULL, NULL,
-                     &mirror_job_driver, is_none_mode, base, false,
-                     filter_node_name, errp);
+    mirror_start_job(job_id, bs, target, replaces,
+                     speed, granularity, buf_size, backing_mode,
+                     on_source_error, on_target_error, unmap, cb, opaque, errp,
+                     &mirror_job_driver, is_none_mode, base);
 }
 void commit_active_start(const char *job_id, BlockDriverState *bs,
-                         BlockDriverState *base, int creation_flags,
-                         int64_t speed, BlockdevOnError on_error,
-                         const char *filter_node_name,
-                         BlockCompletionFunc *cb, void *opaque,
-                         bool auto_complete, Error **errp)
+                         BlockDriverState *base, int64_t speed,
+                         BlockdevOnError on_error,
+                         BlockCompletionFunc *cb,
+                         void *opaque, Error **errp)
 {
+    int64_t length, base_length;
     int orig_base_flags;
+    int ret;
     Error *local_err = NULL;
 
     orig_base_flags = bdrv_get_flags(base);
@@ -1313,11 +1010,35 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    mirror_start_job(job_id, bs, creation_flags, base, NULL, speed, 0, 0,
-                     MIRROR_LEAVE_BACKING_CHAIN,
-                     on_error, on_error, true, cb, opaque,
-                     &commit_active_job_driver, false, base, auto_complete,
-                     filter_node_name, &local_err);
+    length = bdrv_getlength(bs);
+    if (length < 0) {
+        error_setg_errno(errp, -length,
+                         "Unable to determine length of %s", bs->filename);
+        goto error_restore_flags;
+    }
+
+    base_length = bdrv_getlength(base);
+    if (base_length < 0) {
+        error_setg_errno(errp, -base_length,
+                         "Unable to determine length of %s", base->filename);
+        goto error_restore_flags;
+    }
+
+    if (length > base_length) {
+        ret = bdrv_truncate(base, length);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "Top image %s is larger than base image %s, and "
+                             "resize of base image failed",
+                             bs->filename, base->filename);
+            goto error_restore_flags;
+        }
+    }
+
+    mirror_start_job(job_id, bs, base, NULL, speed, 0, 0,
+                     MIRROR_LEAVE_BACKING_CHAIN,
+                     on_error, on_error, false, cb, opaque, &local_err,
+                     &commit_active_job_driver, false, base);
     if (local_err) {
         error_propagate(errp, local_err);
         goto error_restore_flags;


@@ -1,7 +1,6 @@
 /*
  * QEMU Block driver for NBD
  *
- * Copyright (C) 2016 Red Hat, Inc.
  * Copyright (C) 2008 Bull S.A.S.
  *     Author: Laurent Vivier <Laurent.Vivier@bull.net>
  *
@@ -28,26 +27,25 @@
 */
 
 #include "qemu/osdep.h"
-#include "qapi/error.h"
 #include "nbd-client.h"
 
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
 #define INDEX_TO_HANDLE(bs, index)  ((index)  ^ ((uint64_t)(intptr_t)bs))
 
-static void nbd_recv_coroutines_enter_all(NBDClientSession *s)
+static void nbd_recv_coroutines_enter_all(NbdClientSession *s)
 {
     int i;
 
     for (i = 0; i < MAX_NBD_REQUESTS; i++) {
         if (s->recv_coroutine[i]) {
-            aio_co_wake(s->recv_coroutine[i]);
+            qemu_coroutine_enter(s->recv_coroutine[i]);
         }
     }
 }
 
 static void nbd_teardown_connection(BlockDriverState *bs)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
+    NbdClientSession *client = nbd_get_client_session(bs);
 
     if (!client->ioc) { /* Already closed */
         return;
@@ -57,7 +55,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     qio_channel_shutdown(client->ioc,
                          QIO_CHANNEL_SHUTDOWN_BOTH,
                          NULL);
-    BDRV_POLL_WHILE(bs, client->read_reply_co);
+    nbd_recv_coroutines_enter_all(client);
 
     nbd_client_detach_aio_context(bs);
     object_unref(OBJECT(client->sioc));
@@ -66,63 +64,65 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     client->ioc = NULL;
 }
 
-static coroutine_fn void nbd_read_reply_entry(void *opaque)
+static void nbd_reply_ready(void *opaque)
 {
-    NBDClientSession *s = opaque;
+    BlockDriverState *bs = opaque;
+    NbdClientSession *s = nbd_get_client_session(bs);
     uint64_t i;
     int ret;
-    Error *local_err = NULL;
 
-    for (;;) {
-        assert(s->reply.handle == 0);
-        ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
-        if (ret < 0) {
-            error_report_err(local_err);
-        }
-        if (ret <= 0) {
-            break;
-        }
+    if (!s->ioc) { /* Already closed */
+        return;
+    }
+
+    if (s->reply.handle == 0) {
+        /* No reply already in flight.  Fetch a header.  It is possible
+         * that another thread has done the same thing in parallel, so
+         * the socket is not readable anymore.
+         */
+        ret = nbd_receive_reply(s->ioc, &s->reply);
+        if (ret == -EAGAIN) {
+            return;
+        }
+        if (ret < 0) {
+            s->reply.handle = 0;
+            goto fail;
+        }
+    }
 
-        /* There's no need for a mutex on the receive side, because the
-         * handler acts as a synchronization point and ensures that only
-         * one coroutine is called until the reply finishes.
-         */
-        i = HANDLE_TO_INDEX(s, s->reply.handle);
-        if (i >= MAX_NBD_REQUESTS || !s->recv_coroutine[i]) {
-            break;
-        }
+    /* There's no need for a mutex on the receive side, because the
+     * handler acts as a synchronization point and ensures that only
+     * one coroutine is called until the reply finishes. */
+    i = HANDLE_TO_INDEX(s, s->reply.handle);
+    if (i >= MAX_NBD_REQUESTS) {
+        goto fail;
+    }
 
-        /* We're woken up by the recv_coroutine itself.  Note that there
-         * is no race between yielding and reentering read_reply_co.  This
-         * is because:
-         *
-         * - if recv_coroutine[i] runs on the same AioContext, it is only
-         *   entered after we yield
-         *
-         * - if recv_coroutine[i] runs on a different AioContext, reentering
-         *   read_reply_co happens through a bottom half, which can only
-         *   run after we yield.
-         */
-        aio_co_wake(s->recv_coroutine[i]);
-        qemu_coroutine_yield();
+    if (s->recv_coroutine[i]) {
+        qemu_coroutine_enter(s->recv_coroutine[i]);
+        return;
     }
 
-    nbd_recv_coroutines_enter_all(s);
-    s->read_reply_co = NULL;
+fail:
+    nbd_teardown_connection(bs);
+}
+
+static void nbd_restart_write(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+
+    qemu_coroutine_enter(nbd_get_client_session(bs)->send_coroutine);
 }
 static int nbd_co_send_request(BlockDriverState *bs,
-                               NBDRequest *request,
+                               struct nbd_request *request,
                                QEMUIOVector *qiov)
 {
-    NBDClientSession *s = nbd_get_client_session(bs);
+    NbdClientSession *s = nbd_get_client_session(bs);
+    AioContext *aio_context;
     int rc, ret, i;
 
     qemu_co_mutex_lock(&s->send_mutex);
-    while (s->in_flight == MAX_NBD_REQUESTS) {
-        qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
-    }
-    s->in_flight++;
 
     for (i = 0; i < MAX_NBD_REQUESTS; i++) {
         if (s->recv_coroutine[i] == NULL) {
@@ -140,12 +140,17 @@ static int nbd_co_send_request(BlockDriverState *bs,
         return -EPIPE;
     }
 
+    s->send_coroutine = qemu_coroutine_self();
+    aio_context = bdrv_get_aio_context(bs);
+
+    aio_set_fd_handler(aio_context, s->sioc->fd, false,
+                       nbd_reply_ready, nbd_restart_write, bs);
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
         if (rc >= 0) {
-            ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
-                          NULL);
+            ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
+                               false);
             if (ret != request->len) {
                 rc = -EIO;
             }
@@ -154,18 +159,22 @@ static int nbd_co_send_request(BlockDriverState *bs,
     } else {
         rc = nbd_send_request(s->ioc, request);
     }
+    aio_set_fd_handler(aio_context, s->sioc->fd, false,
+                       nbd_reply_ready, NULL, bs);
+    s->send_coroutine = NULL;
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
 }
-static void nbd_co_receive_reply(NBDClientSession *s,
-                                 NBDRequest *request,
-                                 NBDReply *reply,
+static void nbd_co_receive_reply(NbdClientSession *s,
+                                 struct nbd_request *request,
+                                 struct nbd_reply *reply,
                                  QEMUIOVector *qiov)
 {
     int ret;
 
-    /* Wait until we're woken up by nbd_read_reply_entry.  */
+    /* Wait until we're woken up by the read handler.  TODO: perhaps
+     * peek at the next reply and avoid yielding if it's ours?  */
     qemu_coroutine_yield();
     *reply = s->reply;
     if (reply->handle != request->handle ||
@@ -173,8 +182,8 @@ static void nbd_co_receive_reply(NBDClientSession *s,
         reply->error = EIO;
     } else {
         if (qiov && reply->error == 0) {
-            ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, true,
-                          NULL);
+            ret = nbd_wr_syncv(s->ioc, qiov->iov, qiov->niov, request->len,
+                               true);
             if (ret != request->len) {
                 reply->error = EIO;
             }
@@ -185,118 +194,91 @@ static void nbd_co_receive_reply(NbdClientSession *s,
         }
     }
 }
 
-static void nbd_coroutine_end(BlockDriverState *bs,
-                              NBDRequest *request)
+static void nbd_coroutine_start(NbdClientSession *s,
+                                struct nbd_request *request)
 {
-    NBDClientSession *s = nbd_get_client_session(bs);
-    int i = HANDLE_TO_INDEX(s, request->handle);
+    /* Poor man semaphore.  The free_sema is locked when no other request
+     * can be accepted, and unlocked after receiving one reply.  */
+    if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
+        qemu_co_mutex_lock(&s->free_sema);
+        assert(s->in_flight < MAX_NBD_REQUESTS);
+    }
+    s->in_flight++;
+
+    /* s->recv_coroutine[i] is set as soon as we get the send_lock.  */
+}
 
-    s->recv_coroutine[i] = NULL;
-
-    /* Kick the read_reply_co to get the next reply.  */
-    if (s->read_reply_co) {
-        aio_co_wake(s->read_reply_co);
-    }
-
-    qemu_co_mutex_lock(&s->send_mutex);
-    s->in_flight--;
-    qemu_co_queue_next(&s->free_sema);
-    qemu_co_mutex_unlock(&s->send_mutex);
+static void nbd_coroutine_end(NbdClientSession *s,
+                              struct nbd_request *request)
+{
+    int i = HANDLE_TO_INDEX(s, request->handle);
+    s->recv_coroutine[i] = NULL;
+    if (s->in_flight-- == MAX_NBD_REQUESTS) {
+        qemu_co_mutex_unlock(&s->free_sema);
+    }
 }
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    NBDRequest request = {
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = {
         .type = NBD_CMD_READ,
         .from = offset,
         .len = bytes,
     };
-    NBDReply reply;
+    struct nbd_reply reply;
     ssize_t ret;
 
     assert(bytes <= NBD_MAX_BUFFER_SIZE);
     assert(!flags);
 
+    nbd_coroutine_start(client, &request);
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
         reply.error = -ret;
     } else {
         nbd_co_receive_reply(client, &request, &reply, qiov);
     }
-    nbd_coroutine_end(bs, &request);
+    nbd_coroutine_end(client, &request);
     return -reply.error;
 }
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
                           uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    NBDRequest request = {
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = {
         .type = NBD_CMD_WRITE,
         .from = offset,
         .len = bytes,
     };
-    NBDReply reply;
+    struct nbd_reply reply;
     ssize_t ret;
 
     if (flags & BDRV_REQ_FUA) {
         assert(client->nbdflags & NBD_FLAG_SEND_FUA);
-        request.flags |= NBD_CMD_FLAG_FUA;
+        request.type |= NBD_CMD_FLAG_FUA;
     }
 
     assert(bytes <= NBD_MAX_BUFFER_SIZE);
 
+    nbd_coroutine_start(client, &request);
     ret = nbd_co_send_request(bs, &request, qiov);
     if (ret < 0) {
         reply.error = -ret;
     } else {
         nbd_co_receive_reply(client, &request, &reply, NULL);
     }
-    nbd_coroutine_end(bs, &request);
+    nbd_coroutine_end(client, &request);
     return -reply.error;
 }
 
-int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
-                                int count, BdrvRequestFlags flags)
-{
-    ssize_t ret;
-    NBDClientSession *client = nbd_get_client_session(bs);
-    NBDRequest request = {
-        .type = NBD_CMD_WRITE_ZEROES,
-        .from = offset,
-        .len = count,
-    };
-    NBDReply reply;
-
-    if (!(client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES)) {
-        return -ENOTSUP;
-    }
-
-    if (flags & BDRV_REQ_FUA) {
-        assert(client->nbdflags & NBD_FLAG_SEND_FUA);
-        request.flags |= NBD_CMD_FLAG_FUA;
-    }
-    if (!(flags & BDRV_REQ_MAY_UNMAP)) {
-        request.flags |= NBD_CMD_FLAG_NO_HOLE;
-    }
-
-    ret = nbd_co_send_request(bs, &request, NULL);
-    if (ret < 0) {
-        reply.error = -ret;
-    } else {
-        nbd_co_receive_reply(client, &request, &reply, NULL);
-    }
-    nbd_coroutine_end(bs, &request);
-    return -reply.error;
-}
 int nbd_client_co_flush(BlockDriverState *bs)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    NBDRequest request = { .type = NBD_CMD_FLUSH };
-    NBDReply reply;
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = { .type = NBD_CMD_FLUSH };
+    struct nbd_reply reply;
     ssize_t ret;
 
     if (!(client->nbdflags & NBD_FLAG_SEND_FLUSH)) {
@@ -306,60 +288,66 @@ int nbd_client_co_flush(BlockDriverState *bs)
     request.from = 0;
     request.len = 0;
 
+    nbd_coroutine_start(client, &request);
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
         reply.error = -ret;
     } else {
         nbd_co_receive_reply(client, &request, &reply, NULL);
     }
-    nbd_coroutine_end(bs, &request);
+    nbd_coroutine_end(client, &request);
     return -reply.error;
 }
 int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    NBDRequest request = {
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = {
         .type = NBD_CMD_TRIM,
         .from = offset,
         .len = count,
     };
-    NBDReply reply;
+    struct nbd_reply reply;
     ssize_t ret;
 
     if (!(client->nbdflags & NBD_FLAG_SEND_TRIM)) {
         return 0;
     }
 
+    nbd_coroutine_start(client, &request);
     ret = nbd_co_send_request(bs, &request, NULL);
     if (ret < 0) {
         reply.error = -ret;
     } else {
         nbd_co_receive_reply(client, &request, &reply, NULL);
     }
-    nbd_coroutine_end(bs, &request);
+    nbd_coroutine_end(client, &request);
     return -reply.error;
 }
 void nbd_client_detach_aio_context(BlockDriverState *bs)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    qio_channel_detach_aio_context(QIO_CHANNEL(client->sioc));
+    aio_set_fd_handler(bdrv_get_aio_context(bs),
+                       nbd_get_client_session(bs)->sioc->fd,
+                       false, NULL, NULL, NULL);
 }
 
 void nbd_client_attach_aio_context(BlockDriverState *bs,
                                    AioContext *new_context)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    qio_channel_attach_aio_context(QIO_CHANNEL(client->sioc), new_context);
-    aio_co_schedule(new_context, client->read_reply_co);
+    aio_set_fd_handler(new_context, nbd_get_client_session(bs)->sioc->fd,
+                       false, nbd_reply_ready, NULL, bs);
 }
 void nbd_client_close(BlockDriverState *bs)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
-    NBDRequest request = { .type = NBD_CMD_DISC };
+    NbdClientSession *client = nbd_get_client_session(bs);
+    struct nbd_request request = {
+        .type = NBD_CMD_DISC,
+        .from = 0,
+        .len = 0
+    };
 
     if (client->ioc == NULL) {
         return;
@@ -377,7 +365,7 @@ int nbd_client_init(BlockDriverState *bs,
                     const char *hostname,
                     Error **errp)
 {
-    NBDClientSession *client = nbd_get_client_session(bs);
+    NbdClientSession *client = nbd_get_client_session(bs);
     int ret;
 
     /* NBD handshake */
@@ -395,14 +383,10 @@ int nbd_client_init(BlockDriverState *bs,
     }
     if (client->nbdflags & NBD_FLAG_SEND_FUA) {
         bs->supported_write_flags = BDRV_REQ_FUA;
-        bs->supported_zero_flags |= BDRV_REQ_FUA;
-    }
-    if (client->nbdflags & NBD_FLAG_SEND_WRITE_ZEROES) {
-        bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
     }
 
     qemu_co_mutex_init(&client->send_mutex);
-    qemu_co_queue_init(&client->free_sema);
+    qemu_co_mutex_init(&client->free_sema);
     client->sioc = sioc;
     object_ref(OBJECT(client->sioc));
 
@@ -414,7 +398,7 @@ int nbd_client_init(BlockDriverState *bs,
     /* Now that we're connected, set the socket to be non-blocking and
      * kick the reply mechanism.  */
     qio_channel_set_blocking(QIO_CHANNEL(sioc), false, NULL);
-    client->read_reply_co = qemu_coroutine_create(nbd_read_reply_entry, client);
     nbd_client_attach_aio_context(bs, bdrv_get_aio_context(bs));
 
     logout("Established connection with NBD server\n");


@@ -17,22 +17,24 @@
 
 #define MAX_NBD_REQUESTS    16
 
-typedef struct NBDClientSession {
+typedef struct NbdClientSession {
     QIOChannelSocket *sioc; /* The master data channel */
     QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
     uint16_t nbdflags;
     off_t size;
 
     CoMutex send_mutex;
-    CoQueue free_sema;
-    Coroutine *read_reply_co;
+    CoMutex free_sema;
+    Coroutine *send_coroutine;
     int in_flight;
 
     Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
-    NBDReply reply;
-} NBDClientSession;
+    struct nbd_reply reply;
 
-NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
+    bool is_unix;
+} NbdClientSession;
+
+NbdClientSession *nbd_get_client_session(BlockDriverState *bs);
 
 int nbd_client_init(BlockDriverState *bs,
                     QIOChannelSocket *sock,
@@ -46,8 +48,6 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count);
 int nbd_client_co_flush(BlockDriverState *bs);
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
                           uint64_t bytes, QEMUIOVector *qiov, int flags);
-int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
-                                int count, BdrvRequestFlags flags);
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags);


@@ -32,9 +32,6 @@
 #include "qemu/uri.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
-#include "qapi-visit.h"
-#include "qapi/qobject-input-visitor.h"
-#include "qapi/qobject-output-visitor.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qint.h"
@@ -44,11 +41,10 @@
 #define EN_OPTSTR ":exportname="
 
 typedef struct BDRVNBDState {
-    NBDClientSession client;
+    NbdClientSession client;
 
     /* For nbd_refresh_filename() */
-    SocketAddress *saddr;
-    char *export, *tlscredsid;
+    char *path, *host, *port, *export, *tlscredsid;
 } BDRVNBDState;
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -79,7 +75,7 @@ static int nbd_parse_uri(const char *filename, QDict *options)
     p = uri->path ? uri->path : "/";
     p += strspn(p, "/");
     if (p[0]) {
-        qdict_put_str(options, "export", p);
+        qdict_put(options, "export", qstring_from_str(p));
     }
 
     qp = query_params_parse(uri->query);
@@ -94,12 +90,9 @@ static int nbd_parse_uri(const char *filename, QDict *options)
             ret = -EINVAL;
             goto out;
         }
-        qdict_put_str(options, "server.type", "unix");
-        qdict_put_str(options, "server.path", qp->p[0].value);
+        qdict_put(options, "path", qstring_from_str(qp->p[0].value));
     } else {
         QString *host;
-        char *port_str;
 
         /* nbd[+tcp]://host[:port]/export */
         if (!uri->server) {
             ret = -EINVAL;
@@ -114,13 +107,13 @@ static int nbd_parse_uri(const char *filename, QDict *options)
             host = qstring_from_str(uri->server);
         }
 
-        qdict_put_str(options, "server.type", "inet");
-        qdict_put(options, "server.host", host);
+        qdict_put(options, "host", host);
 
-        port_str = g_strdup_printf("%d", uri->port ?: NBD_DEFAULT_PORT);
-        qdict_put_str(options, "server.port", port_str);
-        g_free(port_str);
+        if (uri->port) {
+            char* port_str = g_strdup_printf("%d", uri->port);
+            qdict_put(options, "port", qstring_from_str(port_str));
+            g_free(port_str);
+        }
     }
 
 out:
     if (qp) {
@@ -130,26 +123,6 @@ out:
     return ret;
 }
 
-static bool nbd_has_filename_options_conflict(QDict *options, Error **errp)
-{
-    const QDictEntry *e;
-
-    for (e = qdict_first(options); e; e = qdict_next(options, e)) {
-        if (!strcmp(e->key, "host") ||
-            !strcmp(e->key, "port") ||
-            !strcmp(e->key, "path") ||
-            !strcmp(e->key, "export") ||
-            strstart(e->key, "server.", NULL))
-        {
-            error_setg(errp, "Option '%s' cannot be used with a file name",
-                       e->key);
-            return true;
-        }
-    }
-
-    return false;
-}
-
 static void nbd_parse_filename(const char *filename, QDict *options,
                                Error **errp)
 {
@@ -158,7 +131,12 @@ static void nbd_parse_filename(const char *filename, QDict *options,
     const char *host_spec;
     const char *unixpath;
 
-    if (nbd_has_filename_options_conflict(options, errp)) {
+    if (qdict_haskey(options, "host")
+        || qdict_haskey(options, "port")
+        || qdict_haskey(options, "path"))
+    {
+        error_setg(errp, "host/port/path and a file name may not be specified "
+                   "at the same time");
         return;
     }
@@ -180,7 +158,7 @@ static void nbd_parse_filename(const char *filename, QDict *options,
         export_name[0] = 0; /* truncate 'file' */
         export_name += strlen(EN_OPTSTR);
 
-        qdict_put_str(options, "export", export_name);
+        qdict_put(options, "export", qstring_from_str(export_name));
     }
 
     /* extract the host_spec - fail if it's not nbd:... */
@@ -195,19 +173,17 @@ static void nbd_parse_filename(const char *filename, QDict *options,
 
     /* are we a UNIX or TCP socket? */
     if (strstart(host_spec, "unix:", &unixpath)) {
-        qdict_put_str(options, "server.type", "unix");
-        qdict_put_str(options, "server.path", unixpath);
+        qdict_put(options, "path", qstring_from_str(unixpath));
     } else {
-        InetSocketAddress *addr = g_new(InetSocketAddress, 1);
+        InetSocketAddress *addr = NULL;
 
-        if (inet_parse(addr, host_spec, errp)) {
-            goto out_inet;
+        addr = inet_parse(host_spec, errp);
+        if (!addr) {
+            goto out;
         }
 
-        qdict_put_str(options, "server.type", "inet");
-        qdict_put_str(options, "server.host", addr->host);
-        qdict_put_str(options, "server.port", addr->port);
-    out_inet:
+        qdict_put(options, "host", qstring_from_str(addr->host));
+        qdict_put(options, "port", qstring_from_str(addr->port));
         qapi_free_InetSocketAddress(addr);
     }
 
@@ -215,92 +191,51 @@ out:
     g_free(file);
 }
-static bool nbd_process_legacy_socket_options(QDict *output_options,
-                                              QemuOpts *legacy_opts,
-                                              Error **errp)
+static SocketAddress *nbd_config(BDRVNBDState *s, QemuOpts *opts, Error **errp)
 {
-    const char *path = qemu_opt_get(legacy_opts, "path");
-    const char *host = qemu_opt_get(legacy_opts, "host");
-    const char *port = qemu_opt_get(legacy_opts, "port");
-    const QDictEntry *e;
+    SocketAddress *saddr;
 
-    if (!path && !host && !port) {
-        return true;
-    }
+    s->path = g_strdup(qemu_opt_get(opts, "path"));
+    s->host = g_strdup(qemu_opt_get(opts, "host"));
+    if (!s->path == !s->host) {
+        if (s->path) {
+            error_setg(errp, "path and host may not be used at the same time.");
+        } else {
+            error_setg(errp, "one of path and host must be specified.");
+        }
+        return NULL;
+    }
 
-    for (e = qdict_first(output_options); e; e = qdict_next(output_options, e))
-    {
-        if (strstart(e->key, "server.", NULL)) {
-            error_setg(errp, "Cannot use 'server' and path/host/port at the "
-                       "same time");
-            return false;
-        }
-    }
+    saddr = g_new0(SocketAddress, 1);
 
-    if (path && host) {
-        error_setg(errp, "path and host may not be used at the same time");
-        return false;
-    } else if (path) {
-        if (port) {
-            error_setg(errp, "port may not be used without host");
-            return false;
-        }
-        qdict_put_str(output_options, "server.type", "unix");
-        qdict_put_str(output_options, "server.path", path);
-    } else if (host) {
-        qdict_put_str(output_options, "server.type", "inet");
-        qdict_put_str(output_options, "server.host", host);
-        qdict_put_str(output_options, "server.port",
-                      port ?: stringify(NBD_DEFAULT_PORT));
-    }
+    if (s->path) {
+        UnixSocketAddress *q_unix;
+        saddr->type = SOCKET_ADDRESS_KIND_UNIX;
+        q_unix = saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
+        q_unix->path = g_strdup(s->path);
+    } else {
+        InetSocketAddress *inet;
+        s->port = g_strdup(qemu_opt_get(opts, "port"));
+        saddr->type = SOCKET_ADDRESS_KIND_INET;
+        inet = saddr->u.inet.data = g_new0(InetSocketAddress, 1);
+        inet->host = g_strdup(s->host);
+        inet->port = g_strdup(s->port);
+        if (!inet->port) {
+            inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
+        }
+    }
 
-    return true;
-}
+    s->client.is_unix = saddr->type == SOCKET_ADDRESS_KIND_UNIX;
+    s->export = g_strdup(qemu_opt_get(opts, "export"));
 
-static SocketAddress *nbd_config(BDRVNBDState *s, QDict *options,
-                                 Error **errp)
-{
-    SocketAddress *saddr = NULL;
-    QDict *addr = NULL;
-    QObject *crumpled_addr = NULL;
-    Visitor *iv = NULL;
-    Error *local_err = NULL;
-
-    qdict_extract_subqdict(options, &addr, "server.");
-    if (!qdict_size(addr)) {
-        error_setg(errp, "NBD server address missing");
-        goto done;
-    }
-
-    crumpled_addr = qdict_crumple(addr, errp);
-    if (!crumpled_addr) {
-        goto done;
-    }
-
-    /*
-     * FIXME .numeric, .to, .ipv4 or .ipv6 don't work with -drive
-     * server.type=inet.  .to doesn't matter, it's ignored anyway.
-     * That's because when @options come from -blockdev or
-     * blockdev_add, members are typed according to the QAPI schema,
-     * but when they come from -drive, they're all QString.  The
-     * visitor expects the former.
-     */
-    iv = qobject_input_visitor_new(crumpled_addr);
-    visit_type_SocketAddress(iv, NULL, &saddr, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        goto done;
-    }
-
-done:
-    QDECREF(addr);
-    qobject_decref(crumpled_addr);
-    visit_free(iv);
     return saddr;
 }
-NBDClientSession *nbd_get_client_session(BlockDriverState *bs)
+NbdClientSession *nbd_get_client_session(BlockDriverState *bs)
 {
     BDRVNBDState *s = bs->opaque;
     return &s->client;
@@ -313,13 +248,11 @@ static QIOChannelSocket *nbd_establish_connection(SocketAddress *saddr,
     Error *local_err = NULL;
 
     sioc = qio_channel_socket_new();
-    qio_channel_set_name(QIO_CHANNEL(sioc), "nbd-client");
 
     qio_channel_socket_connect_sync(sioc,
                                     saddr,
                                     &local_err);
     if (local_err) {
-        object_unref(OBJECT(sioc));
         error_propagate(errp, local_err);
         return NULL;
     }
@@ -399,6 +332,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     QemuOpts *opts = NULL;
     Error *local_err = NULL;
     QIOChannelSocket *sioc = NULL;
+    SocketAddress *saddr = NULL;
     QCryptoTLSCreds *tlscreds = NULL;
     const char *hostname = NULL;
     int ret = -EINVAL;
@@ -410,19 +344,12 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
         goto error;
     }
 
-    /* Translate @host, @port, and @path to a SocketAddress */
-    if (!nbd_process_legacy_socket_options(options, opts, errp)) {
-        goto error;
-    }
-
     /* Pop the config into our state object. Exit if invalid. */
-    s->saddr = nbd_config(s, options, errp);
-    if (!s->saddr) {
+    saddr = nbd_config(s, opts, errp);
+    if (!saddr) {
         goto error;
     }
 
-    s->export = g_strdup(qemu_opt_get(opts, "export"));
-
     s->tlscredsid = g_strdup(qemu_opt_get(opts, "tls-creds"));
     if (s->tlscredsid) {
         tlscreds = nbd_get_tls_creds(s->tlscredsid, errp);
@@ -430,18 +357,17 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
             goto error;
         }
 
-        /* TODO SOCKET_ADDRESS_KIND_FD where fd has AF_INET or AF_INET6 */
-        if (s->saddr->type != SOCKET_ADDRESS_TYPE_INET) {
+        if (saddr->type != SOCKET_ADDRESS_KIND_INET) {
             error_setg(errp, "TLS only supported over IP sockets");
             goto error;
         }
-        hostname = s->saddr->u.inet.host;
+        hostname = saddr->u.inet.data->host;
     }
 
     /* establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
-    sioc = nbd_establish_connection(s->saddr, errp);
+    sioc = nbd_establish_connection(saddr, errp);
     if (!sioc) {
         ret = -ECONNREFUSED;
         goto error;
@@ -458,10 +384,13 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
         object_unref(OBJECT(tlscreds));
     }
     if (ret < 0) {
-        qapi_free_SocketAddress(s->saddr);
+        g_free(s->path);
+        g_free(s->host);
+        g_free(s->port);
         g_free(s->export);
         g_free(s->tlscredsid);
     }
+    qapi_free_SocketAddress(saddr);
     qemu_opts_del(opts);
     return ret;
 }
@@ -474,7 +403,6 @@ static int nbd_co_flush(BlockDriverState *bs)
static void nbd_refresh_limits(BlockDriverState *bs, Error **errp) static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
{ {
bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE; bs->bl.max_pdiscard = NBD_MAX_BUFFER_SIZE;
bs->bl.max_pwrite_zeroes = NBD_MAX_BUFFER_SIZE;
bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE; bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
} }
@@ -484,7 +412,9 @@ static void nbd_close(BlockDriverState *bs)
nbd_client_close(bs); nbd_client_close(bs);
qapi_free_SocketAddress(s->saddr); g_free(s->path);
g_free(s->host);
g_free(s->port);
g_free(s->export); g_free(s->export);
g_free(s->tlscredsid); g_free(s->tlscredsid);
} }
@@ -511,50 +441,45 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
{ {
BDRVNBDState *s = bs->opaque; BDRVNBDState *s = bs->opaque;
QDict *opts = qdict_new(); QDict *opts = qdict_new();
QObject *saddr_qdict;
Visitor *ov;
const char *host = NULL, *port = NULL, *path = NULL;
if (s->saddr->type == SOCKET_ADDRESS_TYPE_INET) { qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("nbd")));
const InetSocketAddress *inet = &s->saddr->u.inet;
if (!inet->has_ipv4 && !inet->has_ipv6 && !inet->has_to) {
host = inet->host;
port = inet->port;
}
} else if (s->saddr->type == SOCKET_ADDRESS_TYPE_UNIX) {
path = s->saddr->u.q_unix.path;
} /* else can't represent as pseudo-filename */
qdict_put_str(opts, "driver", "nbd"); if (s->path && s->export) {
if (path && s->export) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename), snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd+unix:///%s?socket=%s", s->export, path); "nbd+unix:///%s?socket=%s", s->export, s->path);
} else if (path && !s->export) { } else if (s->path && !s->export) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename), snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd+unix://?socket=%s", path); "nbd+unix://?socket=%s", s->path);
} else if (host && s->export) { } else if (!s->path && s->export && s->port) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename), snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd://%s:%s/%s", host, port, s->export); "nbd://%s:%s/%s", s->host, s->port, s->export);
} else if (host && !s->export) { } else if (!s->path && s->export && !s->port) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename), snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd://%s:%s", host, port); "nbd://%s/%s", s->host, s->export);
} else if (!s->path && !s->export && s->port) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd://%s:%s", s->host, s->port);
} else if (!s->path && !s->export && !s->port) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"nbd://%s", s->host);
} }
ov = qobject_output_visitor_new(&saddr_qdict); if (s->path) {
visit_type_SocketAddress(ov, NULL, &s->saddr, &error_abort); qdict_put_obj(opts, "path", QOBJECT(qstring_from_str(s->path)));
visit_complete(ov, &saddr_qdict); } else if (s->port) {
visit_free(ov); qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(s->host)));
qdict_put_obj(opts, "server", saddr_qdict); qdict_put_obj(opts, "port", QOBJECT(qstring_from_str(s->port)));
} else {
qdict_put_obj(opts, "host", QOBJECT(qstring_from_str(s->host)));
}
if (s->export) { if (s->export) {
qdict_put_str(opts, "export", s->export); qdict_put_obj(opts, "export", QOBJECT(qstring_from_str(s->export)));
} }
if (s->tlscredsid) { if (s->tlscredsid) {
qdict_put_str(opts, "tls-creds", s->tlscredsid); qdict_put_obj(opts, "tls-creds",
QOBJECT(qstring_from_str(s->tlscredsid)));
} }
qdict_flatten(opts);
bs->full_open_options = opts; bs->full_open_options = opts;
} }
@@ -566,7 +491,6 @@ static BlockDriver bdrv_nbd = {
.bdrv_file_open = nbd_open, .bdrv_file_open = nbd_open,
.bdrv_co_preadv = nbd_client_co_preadv, .bdrv_co_preadv = nbd_client_co_preadv,
.bdrv_co_pwritev = nbd_client_co_pwritev, .bdrv_co_pwritev = nbd_client_co_pwritev,
.bdrv_co_pwrite_zeroes = nbd_client_co_pwrite_zeroes,
.bdrv_close = nbd_close, .bdrv_close = nbd_close,
.bdrv_co_flush_to_os = nbd_co_flush, .bdrv_co_flush_to_os = nbd_co_flush,
.bdrv_co_pdiscard = nbd_client_co_pdiscard, .bdrv_co_pdiscard = nbd_client_co_pdiscard,
@@ -585,7 +509,6 @@ static BlockDriver bdrv_nbd_tcp = {
.bdrv_file_open = nbd_open, .bdrv_file_open = nbd_open,
.bdrv_co_preadv = nbd_client_co_preadv, .bdrv_co_preadv = nbd_client_co_preadv,
.bdrv_co_pwritev = nbd_client_co_pwritev, .bdrv_co_pwritev = nbd_client_co_pwritev,
.bdrv_co_pwrite_zeroes = nbd_client_co_pwrite_zeroes,
.bdrv_close = nbd_close, .bdrv_close = nbd_close,
.bdrv_co_flush_to_os = nbd_co_flush, .bdrv_co_flush_to_os = nbd_co_flush,
.bdrv_co_pdiscard = nbd_client_co_pdiscard, .bdrv_co_pdiscard = nbd_client_co_pdiscard,
@@ -604,7 +527,6 @@ static BlockDriver bdrv_nbd_unix = {
.bdrv_file_open = nbd_open, .bdrv_file_open = nbd_open,
.bdrv_co_preadv = nbd_client_co_preadv, .bdrv_co_preadv = nbd_client_co_preadv,
.bdrv_co_pwritev = nbd_client_co_pwritev, .bdrv_co_pwritev = nbd_client_co_pwritev,
.bdrv_co_pwrite_zeroes = nbd_client_co_pwrite_zeroes,
.bdrv_close = nbd_close, .bdrv_close = nbd_close,
.bdrv_co_flush_to_os = nbd_co_flush, .bdrv_co_flush_to_os = nbd_co_flush,
.bdrv_co_pdiscard = nbd_client_co_pdiscard, .bdrv_co_pdiscard = nbd_client_co_pdiscard,
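The `nbd_refresh_filename()` hunk above rebuilds an NBD pseudo-filename from whichever of the Unix socket path, host, port, and export name are set. As a readability aid, the branch precedence can be sketched in Python (a hypothetical helper for illustration, not part of QEMU):

```python
def nbd_pseudo_filename(path=None, host=None, port=None, export=None):
    """Rebuild an NBD pseudo-filename following the branch order in the
    diff above: a Unix socket path selects the nbd+unix:// form, otherwise
    the host-based nbd:// forms are used, appending port and export
    when present."""
    if path and export:
        return "nbd+unix:///%s?socket=%s" % (export, path)
    if path:
        return "nbd+unix://?socket=%s" % path
    if export and port:
        return "nbd://%s:%s/%s" % (host, port, export)
    if export:
        return "nbd://%s/%s" % (host, export)
    if port:
        return "nbd://%s:%s" % (host, port)
    return "nbd://%s" % host
```

For example, a TCP connection with an export named `disk0` yields `nbd://localhost:10809/disk0`, while a Unix socket connection yields the `nbd+unix://` form with the socket path in the query string.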


@@ -35,15 +35,8 @@
 #include "qemu/uri.h"
 #include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
-#include "qapi/qmp/qdict.h"
-#include "qapi/qmp/qint.h"
-#include "qapi/qmp/qstring.h"
-#include "qapi-visit.h"
-#include "qapi/qobject-input-visitor.h"
-#include "qapi/qobject-output-visitor.h"
 #include <nfsc/libnfs.h>
 #define QEMU_NFS_MAX_READAHEAD_SIZE 1048576
 #define QEMU_NFS_MAX_PAGECACHE_SIZE (8388608 / NFS_BLKSIZE)
 #define QEMU_NFS_MAX_DEBUG_LEVEL 2
@@ -54,139 +47,23 @@ typedef struct NFSClient {
     int events;
     bool has_zero_init;
     AioContext *aio_context;
-    QemuMutex mutex;
     blkcnt_t st_blocks;
     bool cache_used;
-    NFSServer *server;
-    char *path;
-    int64_t uid, gid, tcp_syncnt, readahead, pagecache, debug;
 } NFSClient;
 typedef struct NFSRPC {
-    BlockDriverState *bs;
     int ret;
     int complete;
     QEMUIOVector *iov;
     struct stat *st;
     Coroutine *co;
+    QEMUBH *bh;
     NFSClient *client;
 } NFSRPC;
-static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
-{
-    URI *uri = NULL;
-    QueryParams *qp = NULL;
-    int ret = -EINVAL, i;
-    uri = uri_parse(filename);
-    if (!uri) {
-        error_setg(errp, "Invalid URI specified");
-        goto out;
-    }
-    if (strcmp(uri->scheme, "nfs") != 0) {
-        error_setg(errp, "URI scheme must be 'nfs'");
-        goto out;
-    }
-    if (!uri->server) {
-        error_setg(errp, "missing hostname in URI");
-        goto out;
-    }
-    if (!uri->path) {
-        error_setg(errp, "missing file path in URI");
-        goto out;
-    }
-    qp = query_params_parse(uri->query);
-    if (!qp) {
-        error_setg(errp, "could not parse query parameters");
-        goto out;
-    }
-    qdict_put_str(options, "server.host", uri->server);
-    qdict_put_str(options, "server.type", "inet");
-    qdict_put_str(options, "path", uri->path);
-    for (i = 0; i < qp->n; i++) {
-        unsigned long long val;
-        if (!qp->p[i].value) {
-            error_setg(errp, "Value for NFS parameter expected: %s",
-                       qp->p[i].name);
-            goto out;
-        }
-        if (parse_uint_full(qp->p[i].value, &val, 0)) {
-            error_setg(errp, "Illegal value for NFS parameter: %s",
-                       qp->p[i].name);
-            goto out;
-        }
-        if (!strcmp(qp->p[i].name, "uid")) {
-            qdict_put_str(options, "user", qp->p[i].value);
-        } else if (!strcmp(qp->p[i].name, "gid")) {
-            qdict_put_str(options, "group", qp->p[i].value);
-        } else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
-            qdict_put_str(options, "tcp-syn-count", qp->p[i].value);
-        } else if (!strcmp(qp->p[i].name, "readahead")) {
-            qdict_put_str(options, "readahead-size", qp->p[i].value);
-        } else if (!strcmp(qp->p[i].name, "pagecache")) {
-            qdict_put_str(options, "page-cache-size", qp->p[i].value);
-        } else if (!strcmp(qp->p[i].name, "debug")) {
-            qdict_put_str(options, "debug", qp->p[i].value);
-        } else {
-            error_setg(errp, "Unknown NFS parameter name: %s",
-                       qp->p[i].name);
-            goto out;
-        }
-    }
-    ret = 0;
-out:
-    if (qp) {
-        query_params_free(qp);
-    }
-    if (uri) {
-        uri_free(uri);
-    }
-    return ret;
-}
-static bool nfs_has_filename_options_conflict(QDict *options, Error **errp)
-{
-    const QDictEntry *qe;
-    for (qe = qdict_first(options); qe; qe = qdict_next(options, qe)) {
-        if (!strcmp(qe->key, "host") ||
-            !strcmp(qe->key, "path") ||
-            !strcmp(qe->key, "user") ||
-            !strcmp(qe->key, "group") ||
-            !strcmp(qe->key, "tcp-syn-count") ||
-            !strcmp(qe->key, "readahead-size") ||
-            !strcmp(qe->key, "page-cache-size") ||
-            !strcmp(qe->key, "debug") ||
-            strstart(qe->key, "server.", NULL))
-        {
-            error_setg(errp, "Option %s cannot be used with a filename",
-                       qe->key);
-            return true;
-        }
-    }
-    return false;
-}
-static void nfs_parse_filename(const char *filename, QDict *options,
-                               Error **errp)
-{
-    if (nfs_has_filename_options_conflict(options, errp)) {
-        return;
-    }
-    nfs_parse_uri(filename, options, errp);
-}
 static void nfs_process_read(void *arg);
 static void nfs_process_write(void *arg);
-/* Called with QemuMutex held. */
 static void nfs_set_events(NFSClient *client)
 {
     int ev = nfs_which_events(client->context);
@@ -194,8 +71,7 @@ static void nfs_set_events(NFSClient *client)
         aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
                            false,
                            (ev & POLLIN) ? nfs_process_read : NULL,
-                           (ev & POLLOUT) ? nfs_process_write : NULL,
-                           NULL, client);
+                           (ev & POLLOUT) ? nfs_process_write : NULL, client);
     }
     client->events = ev;
@@ -204,48 +80,39 @@ static void nfs_set_events(NFSClient *client)
 static void nfs_process_read(void *arg)
 {
     NFSClient *client = arg;
-    qemu_mutex_lock(&client->mutex);
     nfs_service(client->context, POLLIN);
     nfs_set_events(client);
-    qemu_mutex_unlock(&client->mutex);
 }
 static void nfs_process_write(void *arg)
 {
     NFSClient *client = arg;
-    qemu_mutex_lock(&client->mutex);
     nfs_service(client->context, POLLOUT);
     nfs_set_events(client);
-    qemu_mutex_unlock(&client->mutex);
 }
-static void nfs_co_init_task(BlockDriverState *bs, NFSRPC *task)
+static void nfs_co_init_task(NFSClient *client, NFSRPC *task)
 {
     *task = (NFSRPC) {
         .co = qemu_coroutine_self(),
-        .bs = bs,
-        .client = bs->opaque,
+        .client = client,
     };
 }
 static void nfs_co_generic_bh_cb(void *opaque)
 {
     NFSRPC *task = opaque;
     task->complete = 1;
-    aio_co_wake(task->co);
+    qemu_bh_delete(task->bh);
+    qemu_coroutine_enter(task->co);
 }
-/* Called (via nfs_service) with QemuMutex held. */
 static void
 nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
                   void *private_data)
 {
     NFSRPC *task = private_data;
     task->ret = ret;
-    assert(!task->st);
     if (task->ret > 0 && task->iov) {
         if (task->ret <= task->iov->size) {
             qemu_iovec_from_buf(task->iov, 0, data, task->ret);
@@ -253,33 +120,40 @@ nfs_co_generic_cb(int ret, struct nfs_context *nfs, void *data,
             task->ret = -EIO;
         }
     }
+    if (task->ret == 0 && task->st) {
+        memcpy(task->st, data, sizeof(struct stat));
+    }
     if (task->ret < 0) {
         error_report("NFS Error: %s", nfs_get_error(nfs));
     }
-    aio_bh_schedule_oneshot(task->client->aio_context,
-                            nfs_co_generic_bh_cb, task);
+    if (task->co) {
+        task->bh = aio_bh_new(task->client->aio_context,
+                              nfs_co_generic_bh_cb, task);
+        qemu_bh_schedule(task->bh);
+    } else {
+        task->complete = 1;
+    }
 }
-static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset,
-                                      uint64_t bytes, QEMUIOVector *iov,
-                                      int flags)
+static int coroutine_fn nfs_co_readv(BlockDriverState *bs,
+                                     int64_t sector_num, int nb_sectors,
+                                     QEMUIOVector *iov)
 {
     NFSClient *client = bs->opaque;
     NFSRPC task;
-    nfs_co_init_task(bs, &task);
+    nfs_co_init_task(client, &task);
     task.iov = iov;
-    qemu_mutex_lock(&client->mutex);
     if (nfs_pread_async(client->context, client->fh,
-                        offset, bytes, nfs_co_generic_cb, &task) != 0) {
-        qemu_mutex_unlock(&client->mutex);
+                        sector_num * BDRV_SECTOR_SIZE,
+                        nb_sectors * BDRV_SECTOR_SIZE,
+                        nfs_co_generic_cb, &task) != 0) {
         return -ENOMEM;
     }
-    nfs_set_events(client);
-    qemu_mutex_unlock(&client->mutex);
     while (!task.complete) {
+        nfs_set_events(client);
         qemu_coroutine_yield();
     }
@@ -295,50 +169,39 @@ static int coroutine_fn nfs_co_preadv(BlockDriverState *bs, uint64_t offset,
     return 0;
 }
-static int coroutine_fn nfs_co_pwritev(BlockDriverState *bs, uint64_t offset,
-                                       uint64_t bytes, QEMUIOVector *iov,
-                                       int flags)
+static int coroutine_fn nfs_co_writev(BlockDriverState *bs,
+                                      int64_t sector_num, int nb_sectors,
+                                      QEMUIOVector *iov)
 {
     NFSClient *client = bs->opaque;
     NFSRPC task;
     char *buf = NULL;
-    bool my_buffer = false;
-    nfs_co_init_task(bs, &task);
-    if (iov->niov != 1) {
-        buf = g_try_malloc(bytes);
-        if (bytes && buf == NULL) {
-            return -ENOMEM;
-        }
-        qemu_iovec_to_buf(iov, 0, buf, bytes);
-        my_buffer = true;
-    } else {
-        buf = iov->iov[0].iov_base;
-    }
-    qemu_mutex_lock(&client->mutex);
+    nfs_co_init_task(client, &task);
+    buf = g_try_malloc(nb_sectors * BDRV_SECTOR_SIZE);
+    if (nb_sectors && buf == NULL) {
+        return -ENOMEM;
+    }
+    qemu_iovec_to_buf(iov, 0, buf, nb_sectors * BDRV_SECTOR_SIZE);
     if (nfs_pwrite_async(client->context, client->fh,
-                         offset, bytes, buf,
-                         nfs_co_generic_cb, &task) != 0) {
-        qemu_mutex_unlock(&client->mutex);
-        if (my_buffer) {
-            g_free(buf);
-        }
+                         sector_num * BDRV_SECTOR_SIZE,
+                         nb_sectors * BDRV_SECTOR_SIZE,
+                         buf, nfs_co_generic_cb, &task) != 0) {
+        g_free(buf);
         return -ENOMEM;
     }
-    nfs_set_events(client);
-    qemu_mutex_unlock(&client->mutex);
     while (!task.complete) {
+        nfs_set_events(client);
         qemu_coroutine_yield();
     }
-    if (my_buffer) {
-        g_free(buf);
-    }
+    g_free(buf);
-    if (task.ret != bytes) {
+    if (task.ret != nb_sectors * BDRV_SECTOR_SIZE) {
         return task.ret < 0 ? task.ret : -EIO;
     }
@@ -350,62 +213,30 @@ static int coroutine_fn nfs_co_flush(BlockDriverState *bs)
     NFSClient *client = bs->opaque;
     NFSRPC task;
-    nfs_co_init_task(bs, &task);
-    qemu_mutex_lock(&client->mutex);
+    nfs_co_init_task(client, &task);
     if (nfs_fsync_async(client->context, client->fh, nfs_co_generic_cb,
                         &task) != 0) {
-        qemu_mutex_unlock(&client->mutex);
         return -ENOMEM;
     }
-    nfs_set_events(client);
-    qemu_mutex_unlock(&client->mutex);
     while (!task.complete) {
+        nfs_set_events(client);
         qemu_coroutine_yield();
     }
     return task.ret;
 }
-/* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
     .name = "nfs",
     .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
     .desc = {
         {
-            .name = "path",
+            .name = "filename",
             .type = QEMU_OPT_STRING,
-            .help = "Path of the image on the host",
-        },
-        {
-            .name = "user",
-            .type = QEMU_OPT_NUMBER,
-            .help = "UID value to use when talking to the server",
-        },
-        {
-            .name = "group",
-            .type = QEMU_OPT_NUMBER,
-            .help = "GID value to use when talking to the server",
-        },
-        {
-            .name = "tcp-syn-count",
-            .type = QEMU_OPT_NUMBER,
-            .help = "Number of SYNs to send during the session establish",
-        },
-        {
-            .name = "readahead-size",
-            .type = QEMU_OPT_NUMBER,
-            .help = "Set the readahead size in bytes",
-        },
-        {
-            .name = "page-cache-size",
-            .type = QEMU_OPT_NUMBER,
-            .help = "Set the pagecache size in bytes",
-        },
-        {
-            .name = "debug",
-            .type = QEMU_OPT_NUMBER,
-            .help = "Set the NFS debug level (max 2)",
+            .help = "URL to the NFS file",
         },
         { /* end of list */ }
     },
@@ -416,7 +247,7 @@ static void nfs_detach_aio_context(BlockDriverState *bs)
     NFSClient *client = bs->opaque;
     aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                       false, NULL, NULL, NULL, NULL);
+                       false, NULL, NULL, NULL);
     client->events = 0;
 }
@@ -436,7 +267,7 @@ static void nfs_client_close(NFSClient *client)
         nfs_close(client->context, client->fh);
     }
     aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
-                       false, NULL, NULL, NULL, NULL);
+                       false, NULL, NULL, NULL);
     nfs_destroy_context(client->context);
 }
 memset(client, 0, sizeof(NFSClient));
@@ -446,75 +277,27 @@ static void nfs_file_close(BlockDriverState *bs)
 {
     NFSClient *client = bs->opaque;
     nfs_client_close(client);
-    qemu_mutex_destroy(&client->mutex);
 }
-static NFSServer *nfs_config(QDict *options, Error **errp)
+static int64_t nfs_client_open(NFSClient *client, const char *filename,
+                               int flags, Error **errp, int open_flags)
 {
-    NFSServer *server = NULL;
-    QDict *addr = NULL;
-    QObject *crumpled_addr = NULL;
-    Visitor *iv = NULL;
-    Error *local_error = NULL;
-    qdict_extract_subqdict(options, &addr, "server.");
-    if (!qdict_size(addr)) {
-        error_setg(errp, "NFS server address missing");
-        goto out;
-    }
-    crumpled_addr = qdict_crumple(addr, errp);
-    if (!crumpled_addr) {
-        goto out;
-    }
-    /*
-     * Caution: this works only because all scalar members of
-     * NFSServer are QString in @crumpled_addr. The visitor expects
-     * @crumpled_addr to be typed according to the QAPI schema. It
-     * is when @options come from -blockdev or blockdev_add. But when
-     * they come from -drive, they're all QString.
-     */
-    iv = qobject_input_visitor_new(crumpled_addr);
-    visit_type_NFSServer(iv, NULL, &server, &local_error);
-    if (local_error) {
-        error_propagate(errp, local_error);
-        goto out;
-    }
-out:
-    QDECREF(addr);
-    qobject_decref(crumpled_addr);
-    visit_free(iv);
-    return server;
-}
-static int64_t nfs_client_open(NFSClient *client, QDict *options,
-                               int flags, int open_flags, Error **errp)
-{
-    int ret = -EINVAL;
-    QemuOpts *opts = NULL;
-    Error *local_err = NULL;
+    int ret = -EINVAL, i;
     struct stat st;
+    URI *uri;
+    QueryParams *qp = NULL;
     char *file = NULL, *strp = NULL;
-    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
-    qemu_opts_absorb_qdict(opts, options, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        ret = -EINVAL;
+    uri = uri_parse(filename);
+    if (!uri) {
+        error_setg(errp, "Invalid URL specified");
         goto fail;
     }
-    client->path = g_strdup(qemu_opt_get(opts, "path"));
-    if (!client->path) {
-        ret = -EINVAL;
-        error_setg(errp, "No path was specified");
+    if (!uri->server) {
+        error_setg(errp, "Invalid URL specified");
         goto fail;
     }
-    strp = strrchr(client->path, '/');
+    strp = strrchr(uri->path, '/');
     if (strp == NULL) {
         error_setg(errp, "Invalid URL specified");
         goto fail;
@@ -522,89 +305,85 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
     file = g_strdup(strp);
     *strp = 0;
-    /* Pop the config into our state object, Exit if invalid */
-    client->server = nfs_config(options, errp);
-    if (!client->server) {
-        ret = -EINVAL;
-        goto fail;
-    }
     client->context = nfs_init_context();
     if (client->context == NULL) {
         error_setg(errp, "Failed to init NFS context");
         goto fail;
     }
-    if (qemu_opt_get(opts, "user")) {
-        client->uid = qemu_opt_get_number(opts, "user", 0);
-        nfs_set_uid(client->context, client->uid);
-    }
-    if (qemu_opt_get(opts, "group")) {
-        client->gid = qemu_opt_get_number(opts, "group", 0);
-        nfs_set_gid(client->context, client->gid);
-    }
-    if (qemu_opt_get(opts, "tcp-syn-count")) {
-        client->tcp_syncnt = qemu_opt_get_number(opts, "tcp-syn-count", 0);
-        nfs_set_tcp_syncnt(client->context, client->tcp_syncnt);
-    }
+    qp = query_params_parse(uri->query);
+    for (i = 0; i < qp->n; i++) {
+        unsigned long long val;
+        if (!qp->p[i].value) {
+            error_setg(errp, "Value for NFS parameter expected: %s",
+                       qp->p[i].name);
+            goto fail;
+        }
+        if (parse_uint_full(qp->p[i].value, &val, 0)) {
+            error_setg(errp, "Illegal value for NFS parameter: %s",
+                       qp->p[i].name);
+            goto fail;
+        }
+        if (!strcmp(qp->p[i].name, "uid")) {
+            nfs_set_uid(client->context, val);
+        } else if (!strcmp(qp->p[i].name, "gid")) {
+            nfs_set_gid(client->context, val);
+        } else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
+            nfs_set_tcp_syncnt(client->context, val);
 #ifdef LIBNFS_FEATURE_READAHEAD
-    if (qemu_opt_get(opts, "readahead-size")) {
-        if (open_flags & BDRV_O_NOCACHE) {
-            error_setg(errp, "Cannot enable NFS readahead "
-                             "if cache.direct = on");
-            goto fail;
-        }
-        client->readahead = qemu_opt_get_number(opts, "readahead-size", 0);
-        if (client->readahead > QEMU_NFS_MAX_READAHEAD_SIZE) {
-            error_report("NFS Warning: Truncating NFS readahead"
-                         " size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
-            client->readahead = QEMU_NFS_MAX_READAHEAD_SIZE;
-        }
-        nfs_set_readahead(client->context, client->readahead);
-#ifdef LIBNFS_FEATURE_PAGECACHE
-        nfs_set_pagecache_ttl(client->context, 0);
-#endif
-        client->cache_used = true;
-    }
-#endif
-#ifdef LIBNFS_FEATURE_PAGECACHE
-    if (qemu_opt_get(opts, "page-cache-size")) {
-        if (open_flags & BDRV_O_NOCACHE) {
-            error_setg(errp, "Cannot enable NFS pagecache "
-                             "if cache.direct = on");
-            goto fail;
-        }
-        client->pagecache = qemu_opt_get_number(opts, "page-cache-size", 0);
-        if (client->pagecache > QEMU_NFS_MAX_PAGECACHE_SIZE) {
-            error_report("NFS Warning: Truncating NFS pagecache"
-                         " size to %d pages", QEMU_NFS_MAX_PAGECACHE_SIZE);
-            client->pagecache = QEMU_NFS_MAX_PAGECACHE_SIZE;
-        }
-        nfs_set_pagecache(client->context, client->pagecache);
-        nfs_set_pagecache_ttl(client->context, 0);
-        client->cache_used = true;
-    }
-#endif
-#ifdef LIBNFS_FEATURE_DEBUG
-    if (qemu_opt_get(opts, "debug")) {
-        client->debug = qemu_opt_get_number(opts, "debug", 0);
-        /* limit the maximum debug level to avoid potential flooding
-         * of our log files. */
-        if (client->debug > QEMU_NFS_MAX_DEBUG_LEVEL) {
-            error_report("NFS Warning: Limiting NFS debug level"
-                         " to %d", QEMU_NFS_MAX_DEBUG_LEVEL);
-            client->debug = QEMU_NFS_MAX_DEBUG_LEVEL;
-        }
-        nfs_set_debug(client->context, client->debug);
-    }
-#endif
+        } else if (!strcmp(qp->p[i].name, "readahead")) {
+            if (open_flags & BDRV_O_NOCACHE) {
+                error_setg(errp, "Cannot enable NFS readahead "
+                                 "if cache.direct = on");
+                goto fail;
+            }
+            if (val > QEMU_NFS_MAX_READAHEAD_SIZE) {
+                error_report("NFS Warning: Truncating NFS readahead"
+                             " size to %d", QEMU_NFS_MAX_READAHEAD_SIZE);
+                val = QEMU_NFS_MAX_READAHEAD_SIZE;
+            }
+            nfs_set_readahead(client->context, val);
+#ifdef LIBNFS_FEATURE_PAGECACHE
+            nfs_set_pagecache_ttl(client->context, 0);
+#endif
+            client->cache_used = true;
+#endif
+#ifdef LIBNFS_FEATURE_PAGECACHE
+        } else if (!strcmp(qp->p[i].name, "pagecache")) {
+            if (open_flags & BDRV_O_NOCACHE) {
+                error_setg(errp, "Cannot enable NFS pagecache "
+                                 "if cache.direct = on");
+                goto fail;
+            }
+            if (val > QEMU_NFS_MAX_PAGECACHE_SIZE) {
+                error_report("NFS Warning: Truncating NFS pagecache"
+                             " size to %d pages", QEMU_NFS_MAX_PAGECACHE_SIZE);
+                val = QEMU_NFS_MAX_PAGECACHE_SIZE;
+            }
+            nfs_set_pagecache(client->context, val);
+            nfs_set_pagecache_ttl(client->context, 0);
+            client->cache_used = true;
+#endif
+#ifdef LIBNFS_FEATURE_DEBUG
+        } else if (!strcmp(qp->p[i].name, "debug")) {
+            /* limit the maximum debug level to avoid potential flooding
+             * of our log files. */
+            if (val > QEMU_NFS_MAX_DEBUG_LEVEL) {
+                error_report("NFS Warning: Limiting NFS debug level"
+                             " to %d", QEMU_NFS_MAX_DEBUG_LEVEL);
+                val = QEMU_NFS_MAX_DEBUG_LEVEL;
+            }
+            nfs_set_debug(client->context, val);
+#endif
+        } else {
+            error_setg(errp, "Unknown NFS parameter name: %s",
+                       qp->p[i].name);
+            goto fail;
+        }
+    }
-    ret = nfs_mount(client->context, client->server->host, client->path);
+    ret = nfs_mount(client->context, uri->server, uri->path);
     if (ret < 0) {
         error_setg(errp, "Failed to mount nfs share: %s",
                    nfs_get_error(client->context));
@@ -637,13 +416,14 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
     ret = DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE);
     client->st_blocks = st.st_blocks;
     client->has_zero_init = S_ISREG(st.st_mode);
+    *strp = '/';
     goto out;
 fail:
     nfs_client_close(client);
 out:
-    qemu_opts_del(opts);
+    if (qp) {
+        query_params_free(qp);
+    }
+    uri_free(uri);
     g_free(file);
     return ret;
 }
@@ -652,18 +432,28 @@ static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp) {
     NFSClient *client = bs->opaque;
     int64_t ret;
+    QemuOpts *opts;
+    Error *local_err = NULL;
     client->aio_context = bdrv_get_aio_context(bs);
-    ret = nfs_client_open(client, options,
-                          (flags & BDRV_O_RDWR) ? O_RDWR : O_RDONLY,
-                          bs->open_flags, errp);
-    if (ret < 0) {
-        return ret;
-    }
-    qemu_mutex_init(&client->mutex);
+    opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto out;
+    }
+    ret = nfs_client_open(client, qemu_opt_get(opts, "filename"),
+                          (flags & BDRV_O_RDWR) ? O_RDWR : O_RDONLY,
+                          errp, bs->open_flags);
+    if (ret < 0) {
+        goto out;
+    }
     bs->total_sectors = ret;
     ret = 0;
+out:
+    qemu_opts_del(opts);
     return ret;
 }
@@ -685,7 +475,6 @@ static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
     int ret = 0;
     int64_t total_size = 0;
     NFSClient *client = g_new0(NFSClient, 1);
-    QDict *options = NULL;
     client->aio_context = qemu_get_aio_context();
@@ -693,20 +482,13 @@ static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
     total_size = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
                           BDRV_SECTOR_SIZE);
-    options = qdict_new();
-    ret = nfs_parse_uri(url, options, errp);
-    if (ret < 0) {
-        goto out;
-    }
-    ret = nfs_client_open(client, options, O_CREAT, 0, errp);
+    ret = nfs_client_open(client, url, O_CREAT, errp, 0);
     if (ret < 0) {
         goto out;
     }
     ret = nfs_ftruncate(client->context, client->fh, total_size);
     nfs_client_close(client);
 out:
-    QDECREF(options);
     g_free(client);
     return ret;
 }
@@ -717,25 +499,6 @@ static int nfs_has_zero_init(BlockDriverState *bs)
     return client->has_zero_init;
 }
-/* Called (via nfs_service) with QemuMutex held. */
-static void
-nfs_get_allocated_file_size_cb(int ret, struct nfs_context *nfs, void *data,
-                               void *private_data)
-{
-    NFSRPC *task = private_data;
-    task->ret = ret;
-    if (task->ret == 0) {
-        memcpy(task->st, data, sizeof(struct stat));
-    }
-    if (task->ret < 0) {
-        error_report("NFS Error: %s", nfs_get_error(nfs));
-    }
-    /* Set task->complete before reading bs->wakeup. */
-    atomic_mb_set(&task->complete, 1);
-    bdrv_wakeup(task->bs);
-}
 static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
 {
     NFSClient *client = bs->opaque;
@@ -747,31 +510,24 @@ static int64_t nfs_get_allocated_file_size(BlockDriverState *bs)
         return client->st_blocks * 512;
     }
-    task.bs = bs;
     task.st = &st;
-    if (nfs_fstat_async(client->context, client->fh,
-                        nfs_get_allocated_file_size_cb, &task) != 0) {
+    if (nfs_fstat_async(client->context, client->fh, nfs_co_generic_cb,
+                        &task) != 0) {
         return -ENOMEM;
     }
-    nfs_set_events(client);
-    BDRV_POLL_WHILE(bs, !task.complete);
+    while (!task.complete) {
+        nfs_set_events(client);
+        aio_poll(client->aio_context, true);
+    }
     return (task.ret < 0 ? task.ret : st.st_blocks * 512);
 }
-static int nfs_file_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int nfs_file_truncate(BlockDriverState *bs, int64_t offset)
 {
     NFSClient *client = bs->opaque;
-    int ret;
-    ret = nfs_ftruncate(client->context, client->fh, offset);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Failed to truncate file");
-        return ret;
-    }
-    return 0;
+    return nfs_ftruncate(client->context, client->fh, offset);
 }
 /* Note that this will not re-establish a connection with the NFS server
@@ -808,62 +564,6 @@ static int nfs_reopen_prepare(BDRVReopenState *state,
     return 0;
 }
-static void nfs_refresh_filename(BlockDriverState *bs, QDict *options)
-{
-    NFSClient *client = bs->opaque;
-    QDict *opts = qdict_new();
-    QObject *server_qdict;
-    Visitor *ov;
-    qdict_put_str(opts, "driver", "nfs");
-    if (client->uid && !client->gid) {
-        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nfs://%s%s?uid=%" PRId64, client->server->host, client->path,
-                 client->uid);
-    } else if (!client->uid && client->gid) {
-        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nfs://%s%s?gid=%" PRId64, client->server->host, client->path,
-                 client->gid);
-    } else if (client->uid && client->gid) {
-        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nfs://%s%s?uid=%" PRId64 "&gid=%" PRId64,
-                 client->server->host, client->path, client->uid, client->gid);
-    } else {
-        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "nfs://%s%s", client->server->host, client->path);
-    }
-    ov = qobject_output_visitor_new(&server_qdict);
-    visit_type_NFSServer(ov, NULL, &client->server, &error_abort);
-    visit_complete(ov, &server_qdict);
-    qdict_put_obj(opts, "server", server_qdict);
-    qdict_put_str(opts, "path", client->path);
-    if (client->uid) {
-        qdict_put_int(opts, "user", client->uid);
-    }
-    if (client->gid) {
-        qdict_put_int(opts, "group", client->gid);
-    }
-    if (client->tcp_syncnt) {
-        qdict_put_int(opts, "tcp-syn-cnt", client->tcp_syncnt);
-    }
-    if (client->readahead) {
-        qdict_put_int(opts, "readahead-size", client->readahead);
-    }
-    if (client->pagecache) {
-        qdict_put_int(opts, "page-cache-size", client->pagecache);
-    }
-    if (client->debug) {
-        qdict_put_int(opts, "debug", client->debug);
-    }
-    visit_free(ov);
-    qdict_flatten(opts);
-    bs->full_open_options = opts;
-}
 #ifdef LIBNFS_FEATURE_PAGECACHE
 static void nfs_invalidate_cache(BlockDriverState *bs,
                                  Error **errp)
@@ -878,7 +578,7 @@ static BlockDriver bdrv_nfs = {
     .protocol_name = "nfs",
     .instance_size = sizeof(NFSClient),
-    .bdrv_parse_filename = nfs_parse_filename,
+    .bdrv_needs_filename = true,
     .create_opts = &nfs_create_opts,
     .bdrv_has_zero_init = nfs_has_zero_init,
@@ -890,13 +590,12 @@ static BlockDriver bdrv_nfs = {
     .bdrv_create = nfs_file_create,
     .bdrv_reopen_prepare = nfs_reopen_prepare,
-    .bdrv_co_preadv = nfs_co_preadv,
-    .bdrv_co_pwritev = nfs_co_pwritev,
+    .bdrv_co_readv = nfs_co_readv,
+    .bdrv_co_writev = nfs_co_writev,
     .bdrv_co_flush_to_disk = nfs_co_flush,
     .bdrv_detach_aio_context = nfs_detach_aio_context,
     .bdrv_attach_aio_context = nfs_attach_aio_context,
-    .bdrv_refresh_filename = nfs_refresh_filename,
 #ifdef LIBNFS_FEATURE_PAGECACHE
     .bdrv_invalidate_cache = nfs_invalidate_cache,


@@ -124,6 +124,7 @@ static coroutine_fn int null_co_flush(BlockDriverState *bs)
 typedef struct {
     BlockAIOCB common;
+    QEMUBH *bh;
     QEMUTimer timer;
 } NullAIOCB;
@@ -135,6 +136,7 @@ static void null_bh_cb(void *opaque)
 {
     NullAIOCB *acb = opaque;
     acb->common.cb(acb->common.opaque, 0);
+    qemu_bh_delete(acb->bh);
     qemu_aio_unref(acb);
 }
@@ -162,7 +164,8 @@ static inline BlockAIOCB *null_aio_common(BlockDriverState *bs,
         timer_mod_ns(&acb->timer,
                      qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + s->latency_ns);
     } else {
-        aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), null_bh_cb, acb);
+        acb->bh = aio_bh_new(bdrv_get_aio_context(bs), null_bh_cb, acb);
+        qemu_bh_schedule(acb->bh);
     }
     return &acb->common;
 }
@@ -232,7 +235,7 @@ static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
                      bs->drv->format_name);
     }
 
-    qdict_put_str(opts, "driver", bs->drv->format_name);
+    qdict_put(opts, "driver", qstring_from_str(bs->drv->format_name));
     bs->full_open_options = opts;
 }


@@ -114,7 +114,7 @@
         .name = PARALLELS_OPT_PREALLOC_SIZE,
         .type = QEMU_OPT_SIZE,
         .help = "Preallocation size on image expansion",
-        .def_value_str = "128M",
+        .def_value_str = "128MiB",
     },
     {
         .name = PARALLELS_OPT_PREALLOC_MODE,
@@ -192,7 +192,8 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
                                  int nb_sectors, int *pnum)
 {
     BDRVParallelsState *s = bs->opaque;
-    int64_t pos, space, idx, to_allocate, i;
+    uint32_t idx, to_allocate, i;
+    int64_t pos, space;
 
     pos = block_status(s, sector_num, nb_sectors, pnum);
     if (pos > 0) {
@@ -200,19 +201,11 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
     }
 
     idx = sector_num / s->tracks;
+    if (idx >= s->bat_size) {
+        return -EINVAL;
+    }
     to_allocate = DIV_ROUND_UP(sector_num + *pnum, s->tracks) - idx;
-
-    /* This function is called only by parallels_co_writev(), which will never
-     * pass a sector_num at or beyond the end of the image (because the block
-     * layer never passes such a sector_num to that function). Therefore, idx
-     * is always below s->bat_size.
-     * block_status() will limit *pnum so that sector_num + *pnum will not
-     * exceed the image end. Therefore, idx + to_allocate cannot exceed
-     * s->bat_size.
-     * Note that s->bat_size is an unsigned int, therefore idx + to_allocate
-     * will always fit into a uint32_t. */
-    assert(idx < s->bat_size && idx + to_allocate <= s->bat_size);
-
     space = to_allocate * s->tracks;
     if (s->data_end + space > bdrv_getlength(bs->file->bs) >> BDRV_SECTOR_BITS) {
         int ret;
@@ -222,9 +215,8 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
                                  s->data_end << BDRV_SECTOR_BITS,
                                  space << BDRV_SECTOR_BITS, 0);
     } else {
-        ret = bdrv_truncate(bs->file,
-                            (s->data_end + space) << BDRV_SECTOR_BITS,
-                            NULL);
+        ret = bdrv_truncate(bs->file->bs,
+                            (s->data_end + space) << BDRV_SECTOR_BITS);
     }
     if (ret < 0) {
         return ret;
@@ -457,10 +449,8 @@ static int parallels_check(BlockDriverState *bs, BdrvCheckResult *res,
                    size - res->image_end_offset);
         res->leaks += count;
         if (fix & BDRV_FIX_LEAKS) {
-            Error *local_err = NULL;
-            ret = bdrv_truncate(bs->file, res->image_end_offset, &local_err);
+            ret = bdrv_truncate(bs->file->bs, res->image_end_offset);
             if (ret < 0) {
-                error_report_err(local_err);
                 res->check_errors++;
                 return ret;
             }
@@ -498,8 +488,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     file = blk_new_open(filename, NULL, NULL,
-                        BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
-                        &local_err);
+                        BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
     if (file == NULL) {
         error_propagate(errp, local_err);
         return -EIO;
@@ -507,7 +496,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
 
     blk_set_allow_write_beyond_eof(file, true);
 
-    ret = blk_truncate(file, 0, errp);
+    ret = blk_truncate(file, 0);
     if (ret < 0) {
         goto exit;
     }
@@ -592,12 +581,6 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     Error *local_err = NULL;
     char *buf;
 
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
-
     ret = bdrv_pread(bs->file, 0, &ph, sizeof(ph));
     if (ret < 0) {
         goto fail;
@@ -697,9 +680,8 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
     if (local_err != NULL) {
         goto fail_options;
     }
 
-    if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
-        bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs), NULL) != 0) {
+    if (!bdrv_has_zero_init(bs->file->bs) ||
+        bdrv_truncate(bs->file->bs, bdrv_getlength(bs->file->bs)) != 0) {
         s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
     }
@@ -742,7 +724,7 @@ static void parallels_close(BlockDriverState *bs)
     }
 
     if (bs->open_flags & BDRV_O_RDWR) {
-        bdrv_truncate(bs->file, s->data_end << BDRV_SECTOR_BITS, NULL);
+        bdrv_truncate(bs->file->bs, s->data_end << BDRV_SECTOR_BITS);
     }
 
     g_free(s->bat_dirty_bmap);
@@ -774,7 +756,6 @@ static BlockDriver bdrv_parallels = {
     .bdrv_probe = parallels_probe,
     .bdrv_open = parallels_open,
     .bdrv_close = parallels_close,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_co_get_block_status = parallels_co_get_block_status,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_flush_to_os = parallels_co_flush_to_os,


@@ -29,7 +29,7 @@
 #include "block/write-threshold.h"
 #include "qmp-commands.h"
 #include "qapi-visit.h"
-#include "qapi/qobject-output-visitor.h"
+#include "qapi/qmp-output-visitor.h"
 #include "qapi/qmp/types.h"
 #include "sysemu/block-backend.h"
 #include "qemu/cutils.h"
@@ -237,8 +237,8 @@ void bdrv_query_image_info(BlockDriverState *bs,
     size = bdrv_getlength(bs);
     if (size < 0) {
-        error_setg_errno(errp, -size, "Can't get image size '%s'",
-                         bs->exact_filename);
+        error_setg_errno(errp, -size, "Can't get size of device '%s'",
+                         bdrv_get_device_name(bs));
         goto out;
     }
@@ -357,6 +357,10 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
     qapi_free_BlockInfo(info);
 }
 
+static BlockStats *bdrv_query_stats(BlockBackend *blk,
+                                    const BlockDriverState *bs,
+                                    bool query_backing);
+
 static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 {
     BlockAcctStats *stats = blk_get_stats(blk);
@@ -424,33 +428,44 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
     }
 }
 
-static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
-                                        bool query_backing)
+static void bdrv_query_bds_stats(BlockStats *s, const BlockDriverState *bs,
+                                 bool query_backing)
 {
-    BlockStats *s = NULL;
-
-    s = g_malloc0(sizeof(*s));
-    s->stats = g_malloc0(sizeof(*s->stats));
-
-    if (!bs) {
-        return s;
-    }
-
     if (bdrv_get_node_name(bs)[0]) {
         s->has_node_name = true;
         s->node_name = g_strdup(bdrv_get_node_name(bs));
     }
 
-    s->stats->wr_highest_offset = stat64_get(&bs->wr_highest_offset);
+    s->stats->wr_highest_offset = bs->wr_highest_offset;
 
     if (bs->file) {
         s->has_parent = true;
-        s->parent = bdrv_query_bds_stats(bs->file->bs, query_backing);
+        s->parent = bdrv_query_stats(NULL, bs->file->bs, query_backing);
     }
 
     if (query_backing && bs->backing) {
         s->has_backing = true;
-        s->backing = bdrv_query_bds_stats(bs->backing->bs, query_backing);
+        s->backing = bdrv_query_stats(NULL, bs->backing->bs, query_backing);
     }
+}
+
+static BlockStats *bdrv_query_stats(BlockBackend *blk,
+                                    const BlockDriverState *bs,
+                                    bool query_backing)
+{
+    BlockStats *s;
+
+    s = g_malloc0(sizeof(*s));
+    s->stats = g_malloc0(sizeof(*s->stats));
+
+    if (blk) {
+        s->has_device = true;
+        s->device = g_strdup(blk_name(blk));
+        bdrv_query_blk_stats(s->stats, blk);
+    }
+
+    if (bs) {
+        bdrv_query_bds_stats(s, bs, query_backing);
     }
 
     return s;
@@ -479,45 +494,43 @@ BlockInfoList *qmp_query_block(Error **errp)
     return head;
 }
 
+static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
+                           bool query_nodes)
+{
+    if (query_nodes) {
+        *bs = bdrv_next_node(*bs);
+        return !!*bs;
+    }
+
+    *blk = blk_next(*blk);
+    *bs = *blk ? blk_bs(*blk) : NULL;
+
+    return !!*blk;
+}
+
 BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
                                      bool query_nodes,
                                      Error **errp)
 {
     BlockStatsList *head = NULL, **p_next = &head;
-    BlockBackend *blk;
-    BlockDriverState *bs;
+    BlockBackend *blk = NULL;
+    BlockDriverState *bs = NULL;
 
     /* Just to be safe if query_nodes is not always initialized */
-    if (has_query_nodes && query_nodes) {
-        for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
-            BlockStatsList *info = g_malloc0(sizeof(*info));
-            AioContext *ctx = bdrv_get_aio_context(bs);
+    query_nodes = has_query_nodes && query_nodes;
+
+    while (next_query_bds(&blk, &bs, query_nodes)) {
+        BlockStatsList *info = g_malloc0(sizeof(*info));
+        AioContext *ctx = blk ? blk_get_aio_context(blk)
+                              : bdrv_get_aio_context(bs);
 
-            aio_context_acquire(ctx);
-            info->value = bdrv_query_bds_stats(bs, false);
-            aio_context_release(ctx);
+        aio_context_acquire(ctx);
+        info->value = bdrv_query_stats(blk, bs, !query_nodes);
+        aio_context_release(ctx);
 
-            *p_next = info;
-            p_next = &info->next;
-        }
-    } else {
-        for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
-            BlockStatsList *info = g_malloc0(sizeof(*info));
-            AioContext *ctx = blk_get_aio_context(blk);
-            BlockStats *s;
-
-            aio_context_acquire(ctx);
-            s = bdrv_query_bds_stats(blk_bs(blk), true);
-            s->has_device = true;
-            s->device = g_strdup(blk_name(blk));
-            bdrv_query_blk_stats(s->stats, blk);
-            aio_context_release(ctx);
-
-            info->value = s;
-            *p_next = info;
-            p_next = &info->next;
-        }
+        *p_next = info;
+        p_next = &info->next;
     }
 
     return head;
 }
@@ -678,13 +691,13 @@ void bdrv_image_info_specific_dump(fprintf_function func_fprintf, void *f,
                                    ImageInfoSpecific *info_spec)
 {
     QObject *obj, *data;
-    Visitor *v = qobject_output_visitor_new(&obj);
+    Visitor *v = qmp_output_visitor_new(&obj);
 
     visit_type_ImageInfoSpecific(v, NULL, &info_spec, &error_abort);
     visit_complete(v, &obj);
+    assert(qobject_type(obj) == QTYPE_QDICT);
     data = qdict_get(qobject_to_qdict(obj), "data");
     dump_qobject(func_fprintf, f, 1, data);
-    qobject_decref(obj);
     visit_free(v);
 }


@@ -32,7 +32,7 @@
 #include <zlib.h>
 #include "qapi/qmp/qerror.h"
 #include "crypto/cipher.h"
-#include "migration/blocker.h"
+#include "migration/migration.h"
 
 /**************************************************************/
 /* QEMU COW block driver with compression and encryption support */
@@ -104,13 +104,6 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
     unsigned int len, i, shift;
     int ret;
     QCowHeader header;
-    Error *local_err = NULL;
-
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
 
     ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
     if (ret < 0) {
@@ -160,8 +153,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto fail;
     }
-    if (!qcrypto_cipher_supports(QCRYPTO_CIPHER_ALG_AES_128,
-                                 QCRYPTO_CIPHER_MODE_CBC)) {
+    if (!qcrypto_cipher_supports(QCRYPTO_CIPHER_ALG_AES_128)) {
         error_setg(errp, "AES cipher not available");
         ret = -EINVAL;
         goto fail;
@@ -259,12 +251,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
     error_setg(&s->migration_blocker, "The qcow format used by node '%s' "
                "does not support live migration",
               bdrv_get_device_or_node_name(bs));
-    ret = migrate_add_blocker(s->migration_blocker, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        error_free(s->migration_blocker);
-        goto fail;
-    }
+    migrate_add_blocker(s->migration_blocker);
 
     qemu_co_mutex_init(&s->lock);
     return 0;
@@ -473,7 +460,7 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
                 /* round to cluster size */
                 cluster_offset = (cluster_offset + s->cluster_size - 1) &
                     ~(s->cluster_size - 1);
-                bdrv_truncate(bs->file, cluster_offset + s->cluster_size, NULL);
+                bdrv_truncate(bs->file->bs, cluster_offset + s->cluster_size);
                 /* if encrypted, we must initialize the cluster
                    content which won't be written */
                 if (bs->encrypted &&
@@ -823,8 +810,7 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     qcow_blk = blk_new_open(filename, NULL, NULL,
-                            BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
-                            &local_err);
+                            BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
     if (qcow_blk == NULL) {
         error_propagate(errp, local_err);
         ret = -EIO;
@@ -833,7 +819,7 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
 
     blk_set_allow_write_beyond_eof(qcow_blk, true);
 
-    ret = blk_truncate(qcow_blk, 0, errp);
+    ret = blk_truncate(qcow_blk, 0);
     if (ret < 0) {
         goto exit;
     }
@@ -852,7 +838,6 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
             header_size += backing_filename_len;
         } else {
             /* special backing file for vvfat */
-            g_free(backing_file);
             backing_file = NULL;
         }
         header.cluster_bits = 9; /* 512 byte cluster to avoid copying
@@ -917,7 +902,7 @@ static int qcow_make_empty(BlockDriverState *bs)
     if (bdrv_pwrite_sync(bs->file, s->l1_table_offset, s->l1_table,
             l1_length) < 0)
         return -1;
-    ret = bdrv_truncate(bs->file, s->l1_table_offset + l1_length, NULL);
+    ret = bdrv_truncate(bs->file->bs, s->l1_table_offset + l1_length);
     if (ret < 0)
         return ret;
@@ -928,32 +913,75 @@ static int qcow_make_empty(BlockDriverState *bs)
     return 0;
 }
 
+typedef struct QcowWriteCo {
+    BlockDriverState *bs;
+    int64_t sector_num;
+    const uint8_t *buf;
+    int nb_sectors;
+    int ret;
+} QcowWriteCo;
+
+static void qcow_write_co_entry(void *opaque)
+{
+    QcowWriteCo *co = opaque;
+    QEMUIOVector qiov;
+
+    struct iovec iov = (struct iovec) {
+        .iov_base = (uint8_t*) co->buf,
+        .iov_len = co->nb_sectors * BDRV_SECTOR_SIZE,
+    };
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    co->ret = qcow_co_writev(co->bs, co->sector_num, co->nb_sectors, &qiov);
+}
+
+/* Wrapper for non-coroutine contexts */
+static int qcow_write(BlockDriverState *bs, int64_t sector_num,
+                      const uint8_t *buf, int nb_sectors)
+{
+    Coroutine *co;
+    AioContext *aio_context = bdrv_get_aio_context(bs);
+    QcowWriteCo data = {
+        .bs = bs,
+        .sector_num = sector_num,
+        .buf = buf,
+        .nb_sectors = nb_sectors,
+        .ret = -EINPROGRESS,
+    };
+
+    co = qemu_coroutine_create(qcow_write_co_entry, &data);
+    qemu_coroutine_enter(co);
+    while (data.ret == -EINPROGRESS) {
+        aio_poll(aio_context, true);
+    }
+
+    return data.ret;
+}
+
 /* XXX: put compressed sectors first, then all the cluster aligned
    tables to avoid losing bytes in alignment */
-static coroutine_fn int
-qcow_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
-                           uint64_t bytes, QEMUIOVector *qiov)
+static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
+                                 const uint8_t *buf, int nb_sectors)
 {
     BDRVQcowState *s = bs->opaque;
-    QEMUIOVector hd_qiov;
-    struct iovec iov;
     z_stream strm;
     int ret, out_len;
-    uint8_t *buf, *out_buf;
+    uint8_t *out_buf;
     uint64_t cluster_offset;
 
-    buf = qemu_blockalign(bs, s->cluster_size);
-    if (bytes != s->cluster_size) {
-        if (bytes > s->cluster_size ||
-            offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
-        {
-            qemu_vfree(buf);
-            return -EINVAL;
-        }
+    if (nb_sectors != s->cluster_sectors) {
+        ret = -EINVAL;
+
         /* Zero-pad last write if image size is not cluster aligned */
-        memset(buf + bytes, 0, s->cluster_size - bytes);
+        if (sector_num + nb_sectors == bs->total_sectors &&
+            nb_sectors < s->cluster_sectors) {
+            uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
+            memset(pad_buf, 0, s->cluster_size);
+            memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
+            ret = qcow_write_compressed(bs, sector_num,
+                                        pad_buf, s->cluster_sectors);
+            qemu_vfree(pad_buf);
+        }
+        return ret;
     }
-    qemu_iovec_to_buf(qiov, 0, buf, qiov->size);
 
     out_buf = g_malloc(s->cluster_size);
@@ -984,35 +1012,27 @@
     if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
         /* could not compress: write normal cluster */
-        ret = qcow_co_writev(bs, offset >> BDRV_SECTOR_BITS,
-                             bytes >> BDRV_SECTOR_BITS, qiov);
+        ret = qcow_write(bs, sector_num, buf, s->cluster_sectors);
         if (ret < 0) {
             goto fail;
         }
-        goto success;
-    }
-    qemu_co_mutex_lock(&s->lock);
-    cluster_offset = get_cluster_offset(bs, offset, 2, out_len, 0, 0);
-    qemu_co_mutex_unlock(&s->lock);
-    if (cluster_offset == 0) {
-        ret = -EIO;
-        goto fail;
-    }
-    cluster_offset &= s->cluster_offset_mask;
+    } else {
+        cluster_offset = get_cluster_offset(bs, sector_num << 9, 2,
+                                            out_len, 0, 0);
+        if (cluster_offset == 0) {
+            ret = -EIO;
+            goto fail;
+        }
 
-    iov = (struct iovec) {
-        .iov_base = out_buf,
-        .iov_len = out_len,
-    };
-    qemu_iovec_init_external(&hd_qiov, &iov, 1);
-    ret = bdrv_co_pwritev(bs->file, cluster_offset, out_len, &hd_qiov, 0);
-    if (ret < 0) {
-        goto fail;
-    }
-success:
+        cluster_offset &= s->cluster_offset_mask;
+        ret = bdrv_pwrite(bs->file, cluster_offset, out_buf, out_len);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
     ret = 0;
 fail:
-    qemu_vfree(buf);
     g_free(out_buf);
     return ret;
 }
@@ -1054,7 +1074,6 @@ static BlockDriver bdrv_qcow = {
     .bdrv_probe = qcow_probe,
     .bdrv_open = qcow_open,
     .bdrv_close = qcow_close,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_reopen_prepare = qcow_reopen_prepare,
     .bdrv_create = qcow_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
@@ -1066,7 +1085,7 @@ static BlockDriver bdrv_qcow = {
     .bdrv_set_key = qcow_set_key,
     .bdrv_make_empty = qcow_make_empty,
-    .bdrv_co_pwritev_compressed = qcow_co_pwritev_compressed,
+    .bdrv_write_compressed = qcow_write_compressed,
     .bdrv_get_info = qcow_get_info,
 
     .create_opts = &qcow_create_opts,


@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 
+/* Needed for CONFIG_MADVISE */
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu-common.h"
@@ -65,8 +66,7 @@ static inline int qcow2_cache_get_table_idx(BlockDriverState *bs,
 static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
                                       int i, int num_tables)
 {
-    /* Using MADV_DONTNEED to discard memory is a Linux-specific feature */
-#ifdef CONFIG_LINUX
+#if QEMU_MADV_DONTNEED != QEMU_MADV_INVALID
     BDRVQcow2State *s = bs->opaque;
     void *t = qcow2_cache_get_table_addr(bs, c, i);
     int align = getpagesize();
@@ -74,7 +74,7 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
     size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
     size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
 
     if (length > 0) {
-        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
     }
 #endif
 }


@@ -83,9 +83,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
     }
     memset(new_l1_table, 0, align_offset(new_l1_size2, 512));
 
-    if (s->l1_size) {
-        memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
-    }
+    memcpy(new_l1_table, s->l1_table, s->l1_size * sizeof(uint64_t));
 
     /* write new table (align to cluster) */
     BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_ALLOC_TABLE);
@@ -309,19 +307,14 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
                                      uint64_t *l2_table, uint64_t stop_flags)
 {
     int i;
-    QCow2ClusterType first_cluster_type;
     uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
     uint64_t first_entry = be64_to_cpu(l2_table[0]);
     uint64_t offset = first_entry & mask;
 
-    if (!offset) {
+    if (!offset)
         return 0;
-    }
 
-    /* must be allocated */
-    first_cluster_type = qcow2_get_cluster_type(first_entry);
-    assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-           first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
+    assert(qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL);
 
     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_table[i]) & mask;
@@ -333,21 +326,14 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
     return i;
 }
 
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * table have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(int nb_clusters,
-                                                 uint64_t *l2_table,
-                                                 QCow2ClusterType wanted_type)
+static int count_contiguous_clusters_by_type(int nb_clusters,
+                                             uint64_t *l2_table,
+                                             int wanted_type)
 {
     int i;
 
-    assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-           wanted_type == QCOW2_CLUSTER_UNALLOCATED);
-
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = be64_to_cpu(l2_table[i]);
-        QCow2ClusterType type = qcow2_get_cluster_type(entry);
+        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
 
         if (type != wanted_type) {
             break;
@@ -441,7 +427,7 @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     if (bs->encrypted) {
         Error *err = NULL;
-        int64_t sector = (src_cluster_offset + offset_in_cluster)
+        int64_t sector = (cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
@@ -499,7 +485,6 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     int l1_bits, c;
     unsigned int offset_in_cluster;
     uint64_t bytes_available, bytes_needed, nb_clusters;
-    QCow2ClusterType type;
     int ret;
 
     offset_in_cluster = offset_into_cluster(s, offset);
@@ -522,13 +507,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     l1_index = offset >> l1_bits;
     if (l1_index >= s->l1_size) {
-        type = QCOW2_CLUSTER_UNALLOCATED;
+        ret = QCOW2_CLUSTER_UNALLOCATED;
         goto out;
     }
 
     l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
     if (!l2_offset) {
-        type = QCOW2_CLUSTER_UNALLOCATED;
+        ret = QCOW2_CLUSTER_UNALLOCATED;
         goto out;
     }
@@ -557,37 +542,38 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
      * true */
     assert(nb_clusters <= INT_MAX);
 
-    type = qcow2_get_cluster_type(*cluster_offset);
-    if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
-                                type == QCOW2_CLUSTER_ZERO_ALLOC)) {
-        qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
-                                " in pre-v3 image (L2 offset: %#" PRIx64
-                                ", L2 index: %#x)", l2_offset, l2_index);
-        ret = -EIO;
-        goto fail;
-    }
-    switch (type) {
+    ret = qcow2_get_cluster_type(*cluster_offset);
+    switch (ret) {
     case QCOW2_CLUSTER_COMPRESSED:
         /* Compressed clusters can only be processed one by one */
         c = 1;
         *cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
         break;
-    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO:
+        if (s->qcow_version < 3) {
+            qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
+                                    " in pre-v3 image (L2 offset: %#" PRIx64
+                                    ", L2 index: %#x)", l2_offset, l2_index);
+            ret = -EIO;
+            goto fail;
+        }
+        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
+                                              QCOW2_CLUSTER_ZERO);
+        *cluster_offset = 0;
+        break;
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
-        c = count_contiguous_clusters_unallocated(nb_clusters,
-                                                  &l2_table[l2_index], type);
+        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
+                                              QCOW2_CLUSTER_UNALLOCATED);
         *cluster_offset = 0;
         break;
-    case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(nb_clusters, s->cluster_size,
                                       &l2_table[l2_index], QCOW_OFLAG_ZERO);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
-            qcow2_signal_corruption(bs, true, -1, -1,
-                                    "Cluster allocation offset %#"
+            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset %#"
                                     PRIx64 " unaligned (L2 offset: %#" PRIx64
                                     ", L2 index: %#x)", *cluster_offset,
                                     l2_offset, l2_index);
@@ -614,7 +600,7 @@ out:
     assert(bytes_available - offset_in_cluster <= UINT_MAX);
     *bytes = bytes_available - offset_in_cluster;
 
-    return type;
+    return ret;
 
 fail:
     qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
@@ -847,7 +833,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * Don't discard clusters that reach a refcount of 0 (e.g. compressed
      * clusters), the next write will reuse them anyway.
      */
-    if (!m->keep_old_clusters && j != 0) {
+    if (j != 0) {
         for (i = 0; i < j; i++) {
             qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
                                     QCOW2_DISCARD_NEVER);
@@ -872,7 +858,7 @@ static int count_cow_clusters(BDRVQcow2State *s, int nb_clusters,
     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_table[l2_index + i]);
-        QCow2ClusterType cluster_type = qcow2_get_cluster_type(l2_entry);
+        int cluster_type = qcow2_get_cluster_type(l2_entry);
 
         switch(cluster_type) {
         case QCOW2_CLUSTER_NORMAL:
@@ -882,8 +868,7 @@ static int count_cow_clusters(BDRVQcow2State *s, int nb_clusters,
             break;
         case QCOW2_CLUSTER_UNALLOCATED:
         case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_ZERO:
             break;
         default:
             abort();
@@ -945,7 +930,9 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
         if (bytes == 0) {
             /* Wait for the dependency to complete. We need to recheck
              * the free/allocated clusters when we continue. */
-            qemu_co_queue_wait(&old_alloc->dependent_requests, &s->lock);
+            qemu_co_mutex_unlock(&s->lock);
+            qemu_co_queue_wait(&old_alloc->dependent_requests);
+            qemu_co_mutex_lock(&s->lock);
             return -EAGAIN;
         }
     }
@@ -1145,9 +1132,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t entry;
     uint64_t nb_clusters;
     int ret;
-    bool keep_old_clusters = false;
 
-    uint64_t alloc_cluster_offset = 0;
+    uint64_t alloc_cluster_offset;
 
     trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
                              *bytes);
@@ -1184,30 +1170,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
* wrong with our code. */ * wrong with our code. */
assert(nb_clusters > 0); assert(nb_clusters > 0);
if (qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
(entry & QCOW_OFLAG_COPIED) &&
(!*host_offset ||
start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
{
/* Try to reuse preallocated zero clusters; contiguous normal clusters
* would be fine, too, but count_cow_clusters() above has limited
* nb_clusters already to a range of COW clusters */
int preallocated_nb_clusters =
count_contiguous_clusters(nb_clusters, s->cluster_size,
&l2_table[l2_index], QCOW_OFLAG_COPIED);
assert(preallocated_nb_clusters > 0);
nb_clusters = preallocated_nb_clusters;
alloc_cluster_offset = entry & L2E_OFFSET_MASK;
/* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
* should not free them. */
keep_old_clusters = true;
}
qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table); qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (!alloc_cluster_offset) {
/* Allocate, if necessary at a given offset in the image file */ /* Allocate, if necessary at a given offset in the image file */
alloc_cluster_offset = start_of_cluster(s, *host_offset); alloc_cluster_offset = start_of_cluster(s, *host_offset);
ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset, ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
@@ -1222,17 +1186,16 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
return 0; return 0;
} }
/* !*host_offset would overwrite the image header and is reserved for /* !*host_offset would overwrite the image header and is reserved for "no
* "no host offset preferred". If 0 was a valid host offset, it'd * host offset preferred". If 0 was a valid host offset, it'd trigger the
* trigger the following overlap check; do that now to avoid having an * following overlap check; do that now to avoid having an invalid value in
* invalid value in *host_offset. */ * *host_offset. */
if (!alloc_cluster_offset) { if (!alloc_cluster_offset) {
ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset, ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
nb_clusters * s->cluster_size); nb_clusters * s->cluster_size);
assert(ret < 0); assert(ret < 0);
goto fail; goto fail;
} }
}
/* /*
* Save info needed for meta data update. * Save info needed for meta data update.
@@ -1262,8 +1225,6 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
.offset = start_of_cluster(s, guest_offset), .offset = start_of_cluster(s, guest_offset),
.nb_clusters = nb_clusters, .nb_clusters = nb_clusters,
.keep_old_clusters = keep_old_clusters,
.cow_start = { .cow_start = {
.offset = 0, .offset = 0,
.nb_bytes = offset_into_cluster(s, guest_offset), .nb_bytes = offset_into_cluster(s, guest_offset),
@@ -1517,13 +1478,12 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
} }
break; break;
case QCOW2_CLUSTER_ZERO_PLAIN: case QCOW2_CLUSTER_ZERO:
if (!full_discard) { if (!full_discard) {
continue; continue;
} }
break; break;
case QCOW2_CLUSTER_ZERO_ALLOC:
case QCOW2_CLUSTER_NORMAL: case QCOW2_CLUSTER_NORMAL:
case QCOW2_CLUSTER_COMPRESSED: case QCOW2_CLUSTER_COMPRESSED:
break; break;
@@ -1549,36 +1509,37 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
return nb_clusters; return nb_clusters;
} }
int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset, int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, enum qcow2_discard_type type, int nb_sectors, enum qcow2_discard_type type, bool full_discard)
bool full_discard)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
uint64_t end_offset = offset + bytes; uint64_t end_offset;
uint64_t nb_clusters; uint64_t nb_clusters;
int64_t cleared;
int ret; int ret;
/* Caller must pass aligned values, except at image end */ end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
nb_clusters = size_to_clusters(s, bytes); /* Round start up and end down */
offset = align_offset(offset, s->cluster_size);
end_offset = start_of_cluster(s, end_offset);
if (offset > end_offset) {
return 0;
}
nb_clusters = size_to_clusters(s, end_offset - offset);
s->cache_discards = true; s->cache_discards = true;
/* Each L2 table is handled by its own loop iteration */ /* Each L2 table is handled by its own loop iteration */
while (nb_clusters > 0) { while (nb_clusters > 0) {
cleared = discard_single_l2(bs, offset, nb_clusters, type, ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
full_discard); if (ret < 0) {
if (cleared < 0) {
ret = cleared;
goto fail; goto fail;
} }
nb_clusters -= cleared; nb_clusters -= ret;
offset += (cleared * s->cluster_size); offset += (ret * s->cluster_size);
} }
ret = 0; ret = 0;
@@ -1595,14 +1556,13 @@ fail:
* clusters. * clusters.
*/ */
static int zero_single_l2(BlockDriverState *bs, uint64_t offset, static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
uint64_t nb_clusters, int flags) uint64_t nb_clusters)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
uint64_t *l2_table; uint64_t *l2_table;
int l2_index; int l2_index;
int ret; int ret;
int i; int i;
bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
ret = get_cluster_table(bs, offset, &l2_table, &l2_index); ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
if (ret < 0) { if (ret < 0) {
@@ -1615,22 +1575,12 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
for (i = 0; i < nb_clusters; i++) { for (i = 0; i < nb_clusters; i++) {
uint64_t old_offset; uint64_t old_offset;
QCow2ClusterType cluster_type;
old_offset = be64_to_cpu(l2_table[l2_index + i]); old_offset = be64_to_cpu(l2_table[l2_index + i]);
/* /* Update L2 entries */
* Minimize L2 changes if the cluster already reads back as
* zeroes with correct allocation.
*/
cluster_type = qcow2_get_cluster_type(old_offset);
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
(cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
continue;
}
qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table); qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) { if (old_offset & QCOW_OFLAG_COMPRESSED) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO); l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST); qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
} else { } else {
@@ -1643,39 +1593,30 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
return nb_clusters; return nb_clusters;
} }
int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset, int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors)
uint64_t bytes, int flags)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
uint64_t end_offset = offset + bytes;
uint64_t nb_clusters; uint64_t nb_clusters;
int64_t cleared;
int ret; int ret;
/* Caller must pass aligned values, except at image end */
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
/* The zero flag is only supported by version 3 and newer */ /* The zero flag is only supported by version 3 and newer */
if (s->qcow_version < 3) { if (s->qcow_version < 3) {
return -ENOTSUP; return -ENOTSUP;
} }
/* Each L2 table is handled by its own loop iteration */ /* Each L2 table is handled by its own loop iteration */
nb_clusters = size_to_clusters(s, bytes); nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
s->cache_discards = true; s->cache_discards = true;
while (nb_clusters > 0) { while (nb_clusters > 0) {
cleared = zero_single_l2(bs, offset, nb_clusters, flags); ret = zero_single_l2(bs, offset, nb_clusters);
if (cleared < 0) { if (ret < 0) {
ret = cleared;
goto fail; goto fail;
} }
nb_clusters -= cleared; nb_clusters -= ret;
offset += (cleared * s->cluster_size); offset += (ret * s->cluster_size);
} }
ret = 0; ret = 0;
@@ -1759,14 +1700,14 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
for (j = 0; j < s->l2_size; j++) { for (j = 0; j < s->l2_size; j++) {
uint64_t l2_entry = be64_to_cpu(l2_table[j]); uint64_t l2_entry = be64_to_cpu(l2_table[j]);
int64_t offset = l2_entry & L2E_OFFSET_MASK; int64_t offset = l2_entry & L2E_OFFSET_MASK;
QCow2ClusterType cluster_type = qcow2_get_cluster_type(l2_entry); int cluster_type = qcow2_get_cluster_type(l2_entry);
bool preallocated = offset != 0;
if (cluster_type != QCOW2_CLUSTER_ZERO_PLAIN && if (cluster_type != QCOW2_CLUSTER_ZERO) {
cluster_type != QCOW2_CLUSTER_ZERO_ALLOC) {
continue; continue;
} }
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) { if (!preallocated) {
if (!bs->backing) { if (!bs->backing) {
/* not backed; therefore we can simply deallocate the /* not backed; therefore we can simply deallocate the
* cluster */ * cluster */
@@ -1797,12 +1738,11 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
} }
if (offset_into_cluster(s, offset)) { if (offset_into_cluster(s, offset)) {
qcow2_signal_corruption(bs, true, -1, -1, qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
"Cluster allocation offset "
"%#" PRIx64 " unaligned (L2 offset: %#" "%#" PRIx64 " unaligned (L2 offset: %#"
PRIx64 ", L2 index: %#x)", offset, PRIx64 ", L2 index: %#x)", offset,
l2_offset, j); l2_offset, j);
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) { if (!preallocated) {
qcow2_free_clusters(bs, offset, s->cluster_size, qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS); QCOW2_DISCARD_ALWAYS);
} }
@@ -1812,7 +1752,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size); ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
if (ret < 0) { if (ret < 0) {
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) { if (!preallocated) {
qcow2_free_clusters(bs, offset, s->cluster_size, qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS); QCOW2_DISCARD_ALWAYS);
} }
@@ -1821,7 +1761,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
ret = bdrv_pwrite_zeroes(bs->file, offset, s->cluster_size, 0); ret = bdrv_pwrite_zeroes(bs->file, offset, s->cluster_size, 0);
if (ret < 0) { if (ret < 0) {
if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) { if (!preallocated) {
qcow2_free_clusters(bs, offset, s->cluster_size, qcow2_free_clusters(bs, offset, s->cluster_size,
QCOW2_DISCARD_ALWAYS); QCOW2_DISCARD_ALWAYS);
} }

View File

@@ -83,16 +83,6 @@ static Qcow2SetRefcountFunc *const set_refcount_funcs[] = {
 /*********************************************************/
 /* refcount handling */
 
-static void update_max_refcount_table_index(BDRVQcow2State *s)
-{
-    unsigned i = s->refcount_table_size - 1;
-
-    while (i > 0 && (s->refcount_table[i] & REFT_OFFSET_MASK) == 0) {
-        i--;
-    }
-
-    /* Set s->max_refcount_table_index to the index of the last used entry */
-    s->max_refcount_table_index = i;
-}
-
 int qcow2_refcount_init(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
@@ -121,7 +111,6 @@ int qcow2_refcount_init(BlockDriverState *bs)
         }
         for(i = 0; i < s->refcount_table_size; i++)
             be64_to_cpus(&s->refcount_table[i]);
-        update_max_refcount_table_index(s);
     }
     return 0;
 fail:
@@ -450,10 +439,6 @@ static int alloc_refcount_block(BlockDriverState *bs,
     }
 
     s->refcount_table[refcount_table_index] = new_block;
-    /* If there's a hole in s->refcount_table then it can happen
-     * that refcount_table_index < s->max_refcount_table_index */
-    s->max_refcount_table_index =
-        MAX(s->max_refcount_table_index, refcount_table_index);
 
     /* The new refcount block may be where the caller intended to put its
      * data, so let it restart the search. */
@@ -595,7 +580,6 @@ static int alloc_refcount_block(BlockDriverState *bs,
     s->refcount_table = new_table;
     s->refcount_table_size = table_size;
     s->refcount_table_offset = table_offset;
-    update_max_refcount_table_index(s);
 
     /* Free old table. */
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t),
@@ -1028,7 +1012,8 @@ void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
         }
         break;
     case QCOW2_CLUSTER_NORMAL:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
+    case QCOW2_CLUSTER_ZERO:
+        if (l2_entry & L2E_OFFSET_MASK) {
         if (offset_into_cluster(s, l2_entry & L2E_OFFSET_MASK)) {
             qcow2_signal_corruption(bs, false, -1, -1,
                                     "Cannot free unaligned cluster %#llx",
@@ -1037,8 +1022,8 @@ void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
             qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
                                 nb_clusters << s->cluster_bits, type);
         }
+        }
         break;
-    case QCOW2_CLUSTER_ZERO_PLAIN:
     case QCOW2_CLUSTER_UNALLOCATED:
         break;
     default:
@@ -1058,9 +1043,9 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     int64_t l1_table_offset, int l1_size, int addend)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t *l1_table, *l2_table, l2_offset, entry, l1_size2, refcount;
+    uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2, refcount;
     bool l1_allocated = false;
-    int64_t old_entry, old_l2_offset;
+    int64_t old_offset, old_l2_offset;
     int i, j, l1_modified = 0, nb_csectors;
     int ret;
@@ -1088,9 +1073,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             goto fail;
         }
 
-        for (i = 0; i < l1_size; i++) {
+        for(i = 0;i < l1_size; i++)
             be64_to_cpus(&l1_table[i]);
-        }
     } else {
         assert(l1_size == s->l1_size);
         l1_table = s->l1_table;
@@ -1119,20 +1103,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             for(j = 0; j < s->l2_size; j++) {
                 uint64_t cluster_index;
-                uint64_t offset;
 
-                entry = be64_to_cpu(l2_table[j]);
-                old_entry = entry;
-                entry &= ~QCOW_OFLAG_COPIED;
-                offset = entry & L2E_OFFSET_MASK;
+                offset = be64_to_cpu(l2_table[j]);
+                old_offset = offset;
+                offset &= ~QCOW_OFLAG_COPIED;
 
-                switch (qcow2_get_cluster_type(entry)) {
+                switch (qcow2_get_cluster_type(offset)) {
                 case QCOW2_CLUSTER_COMPRESSED:
-                    nb_csectors = ((entry >> s->csize_shift) &
+                    nb_csectors = ((offset >> s->csize_shift) &
                                    s->csize_mask) + 1;
                     if (addend != 0) {
                         ret = update_refcount(bs,
-                            (entry & s->cluster_offset_mask) & ~511,
+                            (offset & s->cluster_offset_mask) & ~511,
                             nb_csectors * 512, abs(addend), addend < 0,
                             QCOW2_DISCARD_SNAPSHOT);
                         if (ret < 0) {
@@ -1144,19 +1126,24 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     break;
                 case QCOW2_CLUSTER_NORMAL:
-                case QCOW2_CLUSTER_ZERO_ALLOC:
-                    if (offset_into_cluster(s, offset)) {
-                        qcow2_signal_corruption(bs, true, -1, -1, "Cluster "
-                                                "allocation offset %#" PRIx64
-                                                " unaligned (L2 offset: %#"
-                                                PRIx64 ", L2 index: %#x)",
-                                                offset, l2_offset, j);
+                case QCOW2_CLUSTER_ZERO:
+                    if (offset_into_cluster(s, offset & L2E_OFFSET_MASK)) {
+                        qcow2_signal_corruption(bs, true, -1, -1, "Data "
+                                                "cluster offset %#llx "
+                                                "unaligned (L2 offset: %#"
+                                                PRIx64 ", L2 index: %#x)",
+                                                offset & L2E_OFFSET_MASK,
+                                                l2_offset, j);
                         ret = -EIO;
                         goto fail;
                     }
 
-                    cluster_index = offset >> s->cluster_bits;
-                    assert(cluster_index);
+                    cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
+                    if (!cluster_index) {
+                        /* unallocated */
+                        refcount = 0;
+                        break;
+                    }
+
                     if (addend != 0) {
                         ret = qcow2_update_cluster_refcount(bs,
                             cluster_index, abs(addend), addend < 0,
@@ -1172,7 +1159,6 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     }
                     break;
 
-                case QCOW2_CLUSTER_ZERO_PLAIN:
                 case QCOW2_CLUSTER_UNALLOCATED:
                     refcount = 0;
                     break;
@@ -1182,14 +1168,14 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                 }
 
                 if (refcount == 1) {
-                    entry |= QCOW_OFLAG_COPIED;
+                    offset |= QCOW_OFLAG_COPIED;
                 }
-                if (entry != old_entry) {
+                if (offset != old_offset) {
                     if (addend > 0) {
                         qcow2_cache_set_dependency(bs, s->l2_table_cache,
                                                    s->refcount_block_cache);
                     }
-                    l2_table[j] = cpu_to_be64(entry);
+                    l2_table[j] = cpu_to_be64(offset);
                     qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache,
                                                  l2_table);
                 }
@@ -1439,7 +1425,12 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
             }
             break;
 
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_ZERO:
+            if ((l2_entry & L2E_OFFSET_MASK) == 0) {
+                break;
+            }
+            /* fall through */
+
         case QCOW2_CLUSTER_NORMAL:
         {
             uint64_t offset = l2_entry & L2E_OFFSET_MASK;
@@ -1469,7 +1460,6 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
             break;
         }
 
-        case QCOW2_CLUSTER_ZERO_PLAIN:
         case QCOW2_CLUSTER_UNALLOCATED:
             break;
@@ -1632,10 +1622,10 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         for (j = 0; j < s->l2_size; j++) {
             uint64_t l2_entry = be64_to_cpu(l2_table[j]);
             uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
-            QCow2ClusterType cluster_type = qcow2_get_cluster_type(l2_entry);
+            int cluster_type = qcow2_get_cluster_type(l2_entry);
 
-            if (cluster_type == QCOW2_CLUSTER_NORMAL ||
-                cluster_type == QCOW2_CLUSTER_ZERO_ALLOC) {
+            if ((cluster_type == QCOW2_CLUSTER_NORMAL) ||
+                ((cluster_type == QCOW2_CLUSTER_ZERO) && (data_offset != 0))) {
                 ret = qcow2_get_refcount(bs,
                                          data_offset >> s->cluster_bits,
                                          &refcount);
@@ -1722,17 +1712,14 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
         if (fix & BDRV_FIX_ERRORS) {
             int64_t new_nb_clusters;
-            Error *local_err = NULL;
 
             if (offset > INT64_MAX - s->cluster_size) {
                 ret = -EINVAL;
                 goto resize_fail;
             }
 
-            ret = bdrv_truncate(bs->file, offset + s->cluster_size,
-                                &local_err);
+            ret = bdrv_truncate(bs->file->bs, offset + s->cluster_size);
             if (ret < 0) {
-                error_report_err(local_err);
                 goto resize_fail;
             }
             size = bdrv_getlength(bs->file->bs);
@@ -2184,7 +2171,6 @@ write_refblocks:
     s->refcount_table = on_disk_reftable;
     s->refcount_table_offset = reftable_offset;
     s->refcount_table_size = reftable_size;
-    update_max_refcount_table_index(s);
 
     return 0;
@@ -2397,11 +2383,7 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
     }
 
     if ((chk & QCOW2_OL_REFCOUNT_BLOCK) && s->refcount_table) {
-        unsigned last_entry = s->max_refcount_table_index;
-        assert(last_entry < s->refcount_table_size);
-        assert(last_entry + 1 == s->refcount_table_size ||
-               (s->refcount_table[last_entry + 1] & REFT_OFFSET_MASK) == 0);
-        for (i = 0; i <= last_entry; i++) {
+        for (i = 0; i < s->refcount_table_size; i++) {
             if ((s->refcount_table[i] & REFT_OFFSET_MASK) &&
                 overlaps_with(s->refcount_table[i] & REFT_OFFSET_MASK,
                               s->cluster_size)) {
@@ -2889,7 +2871,6 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
     /* Now update the rest of the in-memory information */
     old_reftable = s->refcount_table;
     s->refcount_table = new_reftable;
-    update_max_refcount_table_index(s);
 
     s->refcount_bits = 1 << refcount_order;
     s->refcount_max = UINT64_C(1) << (s->refcount_bits - 1);

View File

@@ -440,8 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
     /* The VM state isn't needed any more in the active L1 table; in fact, it
      * hurts by causing expensive COW for the next snapshot. */
-    qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
-                          align_offset(sn->vm_state_size, s->cluster_size),
+    qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
+                           align_offset(sn->vm_state_size, s->cluster_size)
+                               >> BDRV_SECTOR_BITS,
                           QCOW2_DISCARD_NEVER, false);
 
 #ifdef DEBUG_ALLOC

View File

@@ -668,14 +668,6 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
     r->cache_clean_interval =
         qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL,
                             s->cache_clean_interval);
-#ifndef CONFIG_LINUX
-    if (r->cache_clean_interval != 0) {
-        error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL
-                   " not supported on this host");
-        ret = -EINVAL;
-        goto fail;
-    }
-#endif
     if (r->cache_clean_interval > UINT_MAX) {
         error_setg(errp, "Cache clean interval too big");
         ret = -EINVAL;
@@ -814,7 +806,7 @@ static int qcow2_update_options(BlockDriverState *bs, QDict *options,
     return ret;
 }
 
-static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
-                         Error **errp)
+static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
+                      Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
@@ -967,8 +959,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto fail;
     }
-    if (!qcrypto_cipher_supports(QCRYPTO_CIPHER_ALG_AES_128,
-                                 QCRYPTO_CIPHER_MODE_CBC)) {
+    if (!qcrypto_cipher_supports(QCRYPTO_CIPHER_ALG_AES_128)) {
         error_setg(errp, "AES cipher not available");
         ret = -EINVAL;
         goto fail;
@@ -1163,7 +1154,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     /* Initialise locks */
     qemu_co_mutex_init(&s->lock);
-    bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP;
 
     /* Repair image if dirty */
     if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -1205,18 +1195,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     return ret;
 }
 
-static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
-                      Error **errp)
-{
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
-
-    return qcow2_do_open(bs, options, flags, errp);
-}
-
 static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVQcow2State *s = bs->opaque;
@@ -1226,7 +1204,6 @@ static void qcow2_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.request_alignment = BDRV_SECTOR_SIZE;
     }
     bs->bl.pwrite_zeroes_alignment = s->cluster_size;
-    bs->bl.pdiscard_alignment = s->cluster_size;
 }
 
 static int qcow2_set_key(BlockDriverState *bs, const char *key)
@@ -1385,7 +1362,7 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
         *file = bs->file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
     }
-    if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+    if (ret == QCOW2_CLUSTER_ZERO) {
         status |= BDRV_BLOCK_ZERO;
     } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
         status |= BDRV_BLOCK_DATA;
@@ -1482,8 +1459,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
             }
             break;
 
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_ZERO:
             qemu_iovec_memset(&hd_qiov, 0, 0, cur_bytes);
             break;
@@ -1798,7 +1774,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
     options = qdict_clone_shallow(bs->options);
 
     flags &= ~BDRV_O_INACTIVE;
-    ret = qcow2_do_open(bs, options, flags, &local_err);
+    ret = qcow2_open(bs, options, flags, &local_err);
     QDECREF(options);
     if (local_err) {
         error_propagate(errp, local_err);
@@ -1828,10 +1804,7 @@ static size_t header_ext_add(char *buf, uint32_t magic, const void *s,
         .magic = cpu_to_be32(magic),
         .len = cpu_to_be32(len),
     };
-
-    if (len) {
-        memcpy(buf + sizeof(QCowExtension), s, len);
-    }
+    memcpy(buf + sizeof(QCowExtension), s, len);
 
     return ext_len;
 }
@@ -2140,7 +2113,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
      * too, as long as the bulk is allocated here). Therefore, using
      * floating point arithmetic is fine. */
     int64_t meta_size = 0;
-    uint64_t nreftablee, nrefblocke, nl1e, nl2e, refblock_count;
+    uint64_t nreftablee, nrefblocke, nl1e, nl2e;
     int64_t aligned_total_size = align_offset(total_size, cluster_size);
     int refblock_bits, refblock_size;
     /* refcount entry size in bytes */
@@ -2183,12 +2156,11 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         nrefblocke = (aligned_total_size + meta_size + cluster_size)
                    / (cluster_size - rces - rces * sizeof(uint64_t)
                                    / cluster_size);
-        refblock_count = DIV_ROUND_UP(nrefblocke, refblock_size);
-        meta_size += refblock_count * cluster_size;
+        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
 
         /* total size of refcount tables */
-        nreftablee = align_offset(refblock_count,
-                                  cluster_size / sizeof(uint64_t));
+        nreftablee = nrefblocke / refblock_size;
+        nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
         meta_size += nreftablee * sizeof(uint64_t);
 
         qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
@@ -2204,8 +2176,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     }
 
     blk = blk_new_open(filename, NULL, NULL,
-                       BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
-                       &local_err);
+                       BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
     if (blk == NULL) {
         error_propagate(errp, local_err);
         return -EIO;
@@ -2267,10 +2238,9 @@ static int qcow2_create2(const char *filename, int64_t total_size,
      * table)
      */
     options = qdict_new();
-    qdict_put_str(options, "driver", "qcow2");
+    qdict_put(options, "driver", qstring_from_str("qcow2"));
     blk = blk_new_open(filename, NULL, options,
-                       BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH,
-                       &local_err);
+                       BDRV_O_RDWR | BDRV_O_NO_FLUSH, &local_err);
     if (blk == NULL) {
         error_propagate(errp, local_err);
         ret = -EIO;
@@ -2296,9 +2266,9 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     }
 
     /* Okay, now that we have a valid image, let's give it the right size */
-    ret = blk_truncate(blk, total_size, errp);
+    ret = blk_truncate(blk, total_size);
     if (ret < 0) {
-        error_prepend(errp, "Could not resize image: ");
+        error_setg_errno(errp, -ret, "Could not resize image");
         goto out;
     }
@@ -2329,7 +2299,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
     /* Reopen the image without BDRV_O_NO_FLUSH to flush it before returning */
     options = qdict_new();
-    qdict_put_str(options, "driver", "qcow2");
+    qdict_put(options, "driver", qstring_from_str("qcow2"));
     blk = blk_new_open(filename, NULL, options,
                        BDRV_O_RDWR | BDRV_O_NO_BACKING, &local_err);
     if (blk == NULL) {
@@ -2451,10 +2421,6 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
     BlockDriverState *file;
     int64_t res;
 
-    if (start + count > bs->total_sectors) {
-        count = bs->total_sectors - start;
-    }
-
     if (!count) {
         return true;
     }
@@ -2473,9 +2439,6 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     uint32_t tail = (offset + count) % s->cluster_size;
 
     trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count);
-    if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
-        tail = 0;
-    }
 
     if (head || tail) {
         int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
@@ -2499,9 +2462,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         count = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
-        if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+        if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_ZERO) {
ret != QCOW2_CLUSTER_ZERO_PLAIN &&
ret != QCOW2_CLUSTER_ZERO_ALLOC) {
qemu_co_mutex_unlock(&s->lock); qemu_co_mutex_unlock(&s->lock);
return -ENOTSUP; return -ENOTSUP;
} }
@@ -2512,7 +2473,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count); trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);
/* Whatever is left can use real zero clusters */ /* Whatever is left can use real zero clusters */
ret = qcow2_cluster_zeroize(bs, offset, count, flags); ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS);
qemu_co_mutex_unlock(&s->lock); qemu_co_mutex_unlock(&s->lock);
return ret; return ret;
@@ -2524,50 +2485,39 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
int ret; int ret;
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
if (!QEMU_IS_ALIGNED(offset | count, s->cluster_size)) {
assert(count < s->cluster_size);
/* Ignore partial clusters, except for the special case of the
* complete partial cluster at the end of an unaligned file */
if (!QEMU_IS_ALIGNED(offset, s->cluster_size) ||
offset + count != bs->total_sectors * BDRV_SECTOR_SIZE) {
return -ENOTSUP;
}
}
qemu_co_mutex_lock(&s->lock); qemu_co_mutex_lock(&s->lock);
ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST, ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
false); QCOW2_DISCARD_REQUEST, false);
qemu_co_mutex_unlock(&s->lock); qemu_co_mutex_unlock(&s->lock);
return ret; return ret;
} }
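The alignment rule added to `qcow2_co_pdiscard` above reduces to a small predicate: a discard is accepted if it is fully cluster-aligned, or if it is the single partial cluster at the very end of an unaligned image. A sketch; the helper name and signature are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors QEMU's QEMU_IS_ALIGNED macro. */
#define QEMU_IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Accept a discard that is either fully cluster-aligned, or covers
 * exactly the trailing partial cluster of an unaligned image.
 * Illustrative helper; the real check lives inline in the driver. */
static bool discard_ok(uint64_t offset, uint64_t count,
                       uint64_t cluster_size, uint64_t image_size)
{
    if (QEMU_IS_ALIGNED(offset | count, cluster_size)) {
        return true;
    }
    return QEMU_IS_ALIGNED(offset, cluster_size) &&
           offset + count == image_size;
}
```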
static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp) static int qcow2_truncate(BlockDriverState *bs, int64_t offset)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
int64_t new_l1_size; int64_t new_l1_size;
int ret; int ret;
if (offset & 511) { if (offset & 511) {
error_setg(errp, "The new size must be a multiple of 512"); error_report("The new size must be a multiple of 512");
return -EINVAL; return -EINVAL;
} }
/* cannot proceed if image has snapshots */ /* cannot proceed if image has snapshots */
if (s->nb_snapshots) { if (s->nb_snapshots) {
error_setg(errp, "Can't resize an image which has snapshots"); error_report("Can't resize an image which has snapshots");
return -ENOTSUP; return -ENOTSUP;
} }
/* shrinking is currently not supported */ /* shrinking is currently not supported */
if (offset < bs->total_sectors * 512) { if (offset < bs->total_sectors * 512) {
error_setg(errp, "qcow2 doesn't support shrinking images yet"); error_report("qcow2 doesn't support shrinking images yet");
return -ENOTSUP; return -ENOTSUP;
} }
new_l1_size = size_to_l1(s, offset); new_l1_size = size_to_l1(s, offset);
ret = qcow2_grow_l1_table(bs, new_l1_size, true); ret = qcow2_grow_l1_table(bs, new_l1_size, true);
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to grow the L1 table");
return ret; return ret;
} }
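The three rejection cases in `qcow2_truncate` above (unaligned size, snapshots present, shrinking) reduce to a simple validator. A sketch with the `Error **errp` reporting omitted; the helper itself is hypothetical:

```c
#include <errno.h>
#include <stdint.h>

/* Validation mirroring qcow2_truncate(): the new size must be a
 * multiple of 512, images with snapshots cannot be resized, and
 * shrinking is not supported. Error message plumbing omitted. */
static int check_truncate(int64_t new_size, int64_t old_size,
                          unsigned nb_snapshots)
{
    if (new_size & 511) {
        return -EINVAL;   /* not a multiple of 512 */
    }
    if (nb_snapshots) {
        return -ENOTSUP;  /* image has snapshots */
    }
    if (new_size < old_size) {
        return -ENOTSUP;  /* shrinking not supported yet */
    }
    return 0;
}
```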
@@ -2576,7 +2526,6 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size), ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
&offset, sizeof(uint64_t)); &offset, sizeof(uint64_t));
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to update the image size");
return ret; return ret;
} }
@@ -2584,39 +2533,84 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
return 0; return 0;
} }
typedef struct Qcow2WriteCo {
BlockDriverState *bs;
int64_t sector_num;
const uint8_t *buf;
int nb_sectors;
int ret;
} Qcow2WriteCo;
static void qcow2_write_co_entry(void *opaque)
{
Qcow2WriteCo *co = opaque;
QEMUIOVector qiov;
uint64_t offset = co->sector_num * BDRV_SECTOR_SIZE;
uint64_t bytes = co->nb_sectors * BDRV_SECTOR_SIZE;
struct iovec iov = (struct iovec) {
.iov_base = (uint8_t*) co->buf,
.iov_len = bytes,
};
qemu_iovec_init_external(&qiov, &iov, 1);
co->ret = qcow2_co_pwritev(co->bs, offset, bytes, &qiov, 0);
}
/* Wrapper for non-coroutine contexts */
static int qcow2_write(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors)
{
Coroutine *co;
AioContext *aio_context = bdrv_get_aio_context(bs);
Qcow2WriteCo data = {
.bs = bs,
.sector_num = sector_num,
.buf = buf,
.nb_sectors = nb_sectors,
.ret = -EINPROGRESS,
};
co = qemu_coroutine_create(qcow2_write_co_entry, &data);
qemu_coroutine_enter(co);
while (data.ret == -EINPROGRESS) {
aio_poll(aio_context, true);
}
return data.ret;
}
/* XXX: put compressed sectors first, then all the cluster aligned /* XXX: put compressed sectors first, then all the cluster aligned
tables to avoid losing bytes in alignment */ tables to avoid losing bytes in alignment */
static coroutine_fn int static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset, const uint8_t *buf, int nb_sectors)
uint64_t bytes, QEMUIOVector *qiov)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
QEMUIOVector hd_qiov;
struct iovec iov;
z_stream strm; z_stream strm;
int ret, out_len; int ret, out_len;
uint8_t *buf, *out_buf; uint8_t *out_buf;
uint64_t cluster_offset; uint64_t cluster_offset;
if (bytes == 0) { if (nb_sectors == 0) {
/* align end of file to a sector boundary to ease reading with /* align end of file to a sector boundary to ease reading with
sector based I/Os */ sector based I/Os */
cluster_offset = bdrv_getlength(bs->file->bs); cluster_offset = bdrv_getlength(bs->file->bs);
return bdrv_truncate(bs->file, cluster_offset, NULL); return bdrv_truncate(bs->file->bs, cluster_offset);
} }
buf = qemu_blockalign(bs, s->cluster_size); if (nb_sectors != s->cluster_sectors) {
if (bytes != s->cluster_size) { ret = -EINVAL;
if (bytes > s->cluster_size ||
offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
{
qemu_vfree(buf);
return -EINVAL;
}
/* Zero-pad last write if image size is not cluster aligned */ /* Zero-pad last write if image size is not cluster aligned */
memset(buf + bytes, 0, s->cluster_size - bytes); if (sector_num + nb_sectors == bs->total_sectors &&
nb_sectors < s->cluster_sectors) {
uint8_t *pad_buf = qemu_blockalign(bs, s->cluster_size);
memset(pad_buf, 0, s->cluster_size);
memcpy(pad_buf, buf, nb_sectors * BDRV_SECTOR_SIZE);
ret = qcow2_write_compressed(bs, sector_num,
pad_buf, s->cluster_sectors);
qemu_vfree(pad_buf);
}
return ret;
} }
qemu_iovec_to_buf(qiov, 0, buf, bytes);
out_buf = g_malloc(s->cluster_size); out_buf = g_malloc(s->cluster_size);
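The zero-padding of a trailing partial cluster before compression, shown in both variants of this hunk, follows a common pattern: copy the short tail into a cluster-sized buffer and clear the remainder. A minimal sketch with illustrative buffer handling (the real code uses `qemu_blockalign`/`qemu_vfree`):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy a short final write into a cluster-sized buffer and zero-fill
 * the rest, so the compressor always sees a whole cluster. Caller
 * frees the result; NULL on allocation failure. Illustrative only. */
static uint8_t *pad_final_cluster(const uint8_t *buf, size_t len,
                                  size_t cluster_size)
{
    uint8_t *pad = malloc(cluster_size);
    if (!pad) {
        return NULL;
    }
    memcpy(pad, buf, len);
    memset(pad + len, 0, cluster_size - len);
    return pad;
}
```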
@@ -2647,44 +2641,33 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
if (ret != Z_STREAM_END || out_len >= s->cluster_size) { if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
/* could not compress: write normal cluster */ /* could not compress: write normal cluster */
ret = qcow2_co_pwritev(bs, offset, bytes, qiov, 0); ret = qcow2_write(bs, sector_num, buf, s->cluster_sectors);
if (ret < 0) { if (ret < 0) {
goto fail; goto fail;
} }
goto success; } else {
} cluster_offset = qcow2_alloc_compressed_cluster_offset(bs,
sector_num << 9, out_len);
qemu_co_mutex_lock(&s->lock);
cluster_offset =
qcow2_alloc_compressed_cluster_offset(bs, offset, out_len);
if (!cluster_offset) { if (!cluster_offset) {
qemu_co_mutex_unlock(&s->lock);
ret = -EIO; ret = -EIO;
goto fail; goto fail;
} }
cluster_offset &= s->cluster_offset_mask; cluster_offset &= s->cluster_offset_mask;
ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len); ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len);
qemu_co_mutex_unlock(&s->lock);
if (ret < 0) { if (ret < 0) {
goto fail; goto fail;
} }
iov = (struct iovec) {
.iov_base = out_buf,
.iov_len = out_len,
};
qemu_iovec_init_external(&hd_qiov, &iov, 1);
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED); BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
ret = bdrv_co_pwritev(bs->file, cluster_offset, out_len, &hd_qiov, 0); ret = bdrv_pwrite(bs->file, cluster_offset, out_buf, out_len);
if (ret < 0) { if (ret < 0) {
goto fail; goto fail;
} }
success: }
ret = 0; ret = 0;
fail: fail:
qemu_vfree(buf);
g_free(out_buf); g_free(out_buf);
return ret; return ret;
} }
@@ -2692,7 +2675,6 @@ fail:
static int make_completely_empty(BlockDriverState *bs) static int make_completely_empty(BlockDriverState *bs)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
Error *local_err = NULL;
int ret, l1_clusters; int ret, l1_clusters;
int64_t offset; int64_t offset;
uint64_t *new_reftable = NULL; uint64_t *new_reftable = NULL;
@@ -2776,7 +2758,6 @@ static int make_completely_empty(BlockDriverState *bs)
s->refcount_table_offset = s->cluster_size; s->refcount_table_offset = s->cluster_size;
s->refcount_table_size = s->cluster_size / sizeof(uint64_t); s->refcount_table_size = s->cluster_size / sizeof(uint64_t);
s->max_refcount_table_index = 0;
g_free(s->refcount_table); g_free(s->refcount_table);
s->refcount_table = new_reftable; s->refcount_table = new_reftable;
@@ -2817,10 +2798,8 @@ static int make_completely_empty(BlockDriverState *bs)
goto fail; goto fail;
} }
ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size, ret = bdrv_truncate(bs->file->bs, (3 + l1_clusters) * s->cluster_size);
&local_err);
if (ret < 0) { if (ret < 0) {
error_report_err(local_err);
goto fail; goto fail;
} }
@@ -2843,8 +2822,8 @@ fail:
static int qcow2_make_empty(BlockDriverState *bs) static int qcow2_make_empty(BlockDriverState *bs)
{ {
BDRVQcow2State *s = bs->opaque; BDRVQcow2State *s = bs->opaque;
uint64_t offset, end_offset; uint64_t start_sector;
int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size); int sector_step = INT_MAX / BDRV_SECTOR_SIZE;
int l1_clusters, ret = 0; int l1_clusters, ret = 0;
l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t)); l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2861,14 +2840,17 @@ static int qcow2_make_empty(BlockDriverState *bs)
/* This fallback code simply discards every active cluster; this is slow, /* This fallback code simply discards every active cluster; this is slow,
* but works in all cases */ * but works in all cases */
end_offset = bs->total_sectors * BDRV_SECTOR_SIZE; for (start_sector = 0; start_sector < bs->total_sectors;
for (offset = 0; offset < end_offset; offset += step) { start_sector += sector_step)
{
/* As this function is generally used after committing an external /* As this function is generally used after committing an external
* snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
* default action for this kind of discard is to pass the discard, * default action for this kind of discard is to pass the discard,
* which will ideally result in an actually smaller image file, as * which will ideally result in an actually smaller image file, as
* is probably desired. */ * is probably desired. */
ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset), ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
MIN(sector_step,
bs->total_sectors - start_sector),
QCOW2_DISCARD_SNAPSHOT, true); QCOW2_DISCARD_SNAPSHOT, true);
if (ret < 0) { if (ret < 0) {
break; break;
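The byte-based make-empty loop above walks the image in the largest cluster-aligned step that still fits in an `int` (`QEMU_ALIGN_DOWN(INT_MAX, cluster_size)`), clamping the final chunk with `MIN`. A sketch of that stepping, with a hypothetical counting helper in place of the actual discard call:

```c
#include <limits.h>
#include <stdint.h>

/* Mirror QEMU's QEMU_ALIGN_DOWN and MIN macros. */
#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Walk an image in the largest cluster-aligned steps that fit in an
 * int and return how many iterations that takes. The real loop calls
 * the cluster-discard routine for each [offset, offset + len) range. */
static int count_discard_steps(uint64_t image_size, uint64_t cluster_size)
{
    uint64_t step = QEMU_ALIGN_DOWN((uint64_t)INT_MAX, cluster_size);
    int iterations = 0;
    for (uint64_t offset = 0; offset < image_size; offset += step) {
        uint64_t len = MIN(step, image_size - offset);
        (void)len; /* a real loop would discard this range */
        iterations++;
    }
    return iterations;
}
```

With 64 KiB clusters the step is 2147418112 bytes, so an image one byte larger needs two passes.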
@@ -3132,7 +3114,6 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
uint64_t cluster_size = s->cluster_size; uint64_t cluster_size = s->cluster_size;
bool encrypt; bool encrypt;
int refcount_bits = s->refcount_bits; int refcount_bits = s->refcount_bits;
Error *local_err = NULL;
int ret; int ret;
QemuOptDesc *desc = opts->list->desc; QemuOptDesc *desc = opts->list->desc;
Qcow2AmendHelperCBInfo helper_cb_info; Qcow2AmendHelperCBInfo helper_cb_info;
@@ -3222,6 +3203,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
if (s->refcount_bits != refcount_bits) { if (s->refcount_bits != refcount_bits) {
int refcount_order = ctz32(refcount_bits); int refcount_order = ctz32(refcount_bits);
Error *local_error = NULL;
if (new_version < 3 && refcount_bits != 16) { if (new_version < 3 && refcount_bits != 16) {
error_report("Different refcount widths than 16 bits require " error_report("Different refcount widths than 16 bits require "
@@ -3233,9 +3215,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER; helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER;
ret = qcow2_change_refcount_order(bs, refcount_order, ret = qcow2_change_refcount_order(bs, refcount_order,
&qcow2_amend_helper_cb, &qcow2_amend_helper_cb,
&helper_cb_info, &local_err); &helper_cb_info, &local_error);
if (ret < 0) { if (ret < 0) {
error_report_err(local_err); error_report_err(local_error);
return ret; return ret;
} }
} }
@@ -3281,18 +3263,8 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
} }
if (new_size) { if (new_size) {
BlockBackend *blk = blk_new(BLK_PERM_RESIZE, BLK_PERM_ALL); ret = bdrv_truncate(bs, new_size);
ret = blk_insert_bs(blk, bs, &local_err);
if (ret < 0) { if (ret < 0) {
error_report_err(local_err);
blk_unref(blk);
return ret;
}
ret = blk_truncate(blk, new_size, &local_err);
blk_unref(blk);
if (ret < 0) {
error_report_err(local_err);
return ret; return ret;
} }
} }
@@ -3428,7 +3400,6 @@ BlockDriver bdrv_qcow2 = {
.bdrv_reopen_commit = qcow2_reopen_commit, .bdrv_reopen_commit = qcow2_reopen_commit,
.bdrv_reopen_abort = qcow2_reopen_abort, .bdrv_reopen_abort = qcow2_reopen_abort,
.bdrv_join_options = qcow2_join_options, .bdrv_join_options = qcow2_join_options,
.bdrv_child_perm = bdrv_format_default_perms,
.bdrv_create = qcow2_create, .bdrv_create = qcow2_create,
.bdrv_has_zero_init = bdrv_has_zero_init_1, .bdrv_has_zero_init = bdrv_has_zero_init_1,
.bdrv_co_get_block_status = qcow2_co_get_block_status, .bdrv_co_get_block_status = qcow2_co_get_block_status,
@@ -3441,7 +3412,7 @@ BlockDriver bdrv_qcow2 = {
.bdrv_co_pwrite_zeroes = qcow2_co_pwrite_zeroes, .bdrv_co_pwrite_zeroes = qcow2_co_pwrite_zeroes,
.bdrv_co_pdiscard = qcow2_co_pdiscard, .bdrv_co_pdiscard = qcow2_co_pdiscard,
.bdrv_truncate = qcow2_truncate, .bdrv_truncate = qcow2_truncate,
.bdrv_co_pwritev_compressed = qcow2_co_pwritev_compressed, .bdrv_write_compressed = qcow2_write_compressed,
.bdrv_make_empty = qcow2_make_empty, .bdrv_make_empty = qcow2_make_empty,
.bdrv_snapshot_create = qcow2_snapshot_create, .bdrv_snapshot_create = qcow2_snapshot_create,


@@ -251,7 +251,6 @@ typedef struct BDRVQcow2State {
uint64_t *refcount_table; uint64_t *refcount_table;
uint64_t refcount_table_offset; uint64_t refcount_table_offset;
uint32_t refcount_table_size; uint32_t refcount_table_size;
uint32_t max_refcount_table_index; /* Last used entry in refcount_table */
uint64_t free_cluster_index; uint64_t free_cluster_index;
uint64_t free_byte_offset; uint64_t free_byte_offset;
@@ -322,9 +321,6 @@ typedef struct QCowL2Meta
/** Number of newly allocated clusters */ /** Number of newly allocated clusters */
int nb_clusters; int nb_clusters;
/** Do not free the old clusters */
bool keep_old_clusters;
/** /**
* Requests that overlap with this allocation and wait to be restarted * Requests that overlap with this allocation and wait to be restarted
* when the allocating request has completed. * when the allocating request has completed.
@@ -349,13 +345,12 @@ typedef struct QCowL2Meta
QLIST_ENTRY(QCowL2Meta) next_in_flight; QLIST_ENTRY(QCowL2Meta) next_in_flight;
} QCowL2Meta; } QCowL2Meta;
typedef enum QCow2ClusterType { enum {
QCOW2_CLUSTER_UNALLOCATED, QCOW2_CLUSTER_UNALLOCATED,
QCOW2_CLUSTER_ZERO_PLAIN,
QCOW2_CLUSTER_ZERO_ALLOC,
QCOW2_CLUSTER_NORMAL, QCOW2_CLUSTER_NORMAL,
QCOW2_CLUSTER_COMPRESSED, QCOW2_CLUSTER_COMPRESSED,
} QCow2ClusterType; QCOW2_CLUSTER_ZERO
};
typedef enum QCow2MetadataOverlap { typedef enum QCow2MetadataOverlap {
QCOW2_OL_MAIN_HEADER_BITNR = 0, QCOW2_OL_MAIN_HEADER_BITNR = 0,
@@ -444,15 +439,12 @@ static inline uint64_t qcow2_max_refcount_clusters(BDRVQcow2State *s)
return QCOW_MAX_REFTABLE_SIZE >> s->cluster_bits; return QCOW_MAX_REFTABLE_SIZE >> s->cluster_bits;
} }
static inline QCow2ClusterType qcow2_get_cluster_type(uint64_t l2_entry) static inline int qcow2_get_cluster_type(uint64_t l2_entry)
{ {
if (l2_entry & QCOW_OFLAG_COMPRESSED) { if (l2_entry & QCOW_OFLAG_COMPRESSED) {
return QCOW2_CLUSTER_COMPRESSED; return QCOW2_CLUSTER_COMPRESSED;
} else if (l2_entry & QCOW_OFLAG_ZERO) { } else if (l2_entry & QCOW_OFLAG_ZERO) {
if (l2_entry & L2E_OFFSET_MASK) { return QCOW2_CLUSTER_ZERO;
return QCOW2_CLUSTER_ZERO_ALLOC;
}
return QCOW2_CLUSTER_ZERO_PLAIN;
} else if (!(l2_entry & L2E_OFFSET_MASK)) { } else if (!(l2_entry & L2E_OFFSET_MASK)) {
return QCOW2_CLUSTER_UNALLOCATED; return QCOW2_CLUSTER_UNALLOCATED;
} else { } else {
@@ -481,6 +473,8 @@ static inline uint64_t refcount_diff(uint64_t r1, uint64_t r2)
return r1 > r2 ? r1 - r2 : r2 - r1; return r1 > r2 ? r1 - r2 : r2 - r1;
} }
// FIXME Need qcow2_ prefix to global functions
/* qcow2.c functions */ /* qcow2.c functions */
int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov, int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors); int64_t sector_num, int nb_sectors);
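The enum and `qcow2_get_cluster_type` changes above split the single `QCOW2_CLUSTER_ZERO` type into plain and allocated variants, keyed on whether the L2 entry still carries a host offset. A standalone sketch of that classifier; the flag and mask constants follow the qcow2 format specification, but the function here is a simplified copy, not the driver's inline:

```c
#include <stdint.h>

/* L2 entry flag/mask values from the qcow2 format specification. */
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)
#define L2E_OFFSET_MASK       0x00fffffffffffe00ULL

typedef enum {
    QCOW2_CLUSTER_UNALLOCATED,
    QCOW2_CLUSTER_ZERO_PLAIN,
    QCOW2_CLUSTER_ZERO_ALLOC,
    QCOW2_CLUSTER_NORMAL,
    QCOW2_CLUSTER_COMPRESSED,
} QCow2ClusterType;

/* Classify an L2 entry; a zero cluster that still has a host offset
 * (preallocated backing storage) is ZERO_ALLOC, otherwise ZERO_PLAIN. */
static QCow2ClusterType cluster_type(uint64_t l2_entry)
{
    if (l2_entry & QCOW_OFLAG_COMPRESSED) {
        return QCOW2_CLUSTER_COMPRESSED;
    } else if (l2_entry & QCOW_OFLAG_ZERO) {
        return (l2_entry & L2E_OFFSET_MASK) ? QCOW2_CLUSTER_ZERO_ALLOC
                                            : QCOW2_CLUSTER_ZERO_PLAIN;
    } else if (!(l2_entry & L2E_OFFSET_MASK)) {
        return QCOW2_CLUSTER_UNALLOCATED;
    }
    return QCOW2_CLUSTER_NORMAL;
}
```

The distinction lets `qcow2_co_pwrite_zeroes` and discard paths keep preallocated clusters instead of leaking them.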
@@ -536,6 +530,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size, int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
bool exact_size); bool exact_size);
int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index); int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
void qcow2_l2_cache_reset(BlockDriverState *bs);
int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset); int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num, int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
uint8_t *out_buf, const uint8_t *in_buf, uint8_t *out_buf, const uint8_t *in_buf,
@@ -551,11 +546,9 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
int compressed_size); int compressed_size);
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m); int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset, int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, enum qcow2_discard_type type, int nb_sectors, enum qcow2_discard_type type, bool full_discard);
bool full_discard); int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, int flags);
int qcow2_expand_zero_clusters(BlockDriverState *bs, int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb, BlockDriverAmendStatusCB *status_cb,


@@ -83,7 +83,6 @@ static void qed_find_cluster_cb(void *opaque, int ret)
unsigned int index; unsigned int index;
unsigned int n; unsigned int n;
qed_acquire(s);
if (ret) { if (ret) {
goto out; goto out;
} }
@@ -110,7 +109,6 @@ static void qed_find_cluster_cb(void *opaque, int ret)
out: out:
find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len); find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len);
qed_release(s);
g_free(find_cluster_cb); g_free(find_cluster_cb);
} }


@@ -31,7 +31,6 @@ static void qed_read_table_cb(void *opaque, int ret)
{ {
QEDReadTableCB *read_table_cb = opaque; QEDReadTableCB *read_table_cb = opaque;
QEDTable *table = read_table_cb->table; QEDTable *table = read_table_cb->table;
BDRVQEDState *s = read_table_cb->s;
int noffsets = read_table_cb->qiov.size / sizeof(uint64_t); int noffsets = read_table_cb->qiov.size / sizeof(uint64_t);
int i; int i;
@@ -41,15 +40,13 @@ static void qed_read_table_cb(void *opaque, int ret)
} }
/* Byteswap offsets */ /* Byteswap offsets */
qed_acquire(s);
for (i = 0; i < noffsets; i++) { for (i = 0; i < noffsets; i++) {
table->offsets[i] = le64_to_cpu(table->offsets[i]); table->offsets[i] = le64_to_cpu(table->offsets[i]);
} }
qed_release(s);
out: out:
/* Completion */ /* Completion */
trace_qed_read_table_cb(s, read_table_cb->table, ret); trace_qed_read_table_cb(read_table_cb->s, read_table_cb->table, ret);
gencb_complete(&read_table_cb->gencb, ret); gencb_complete(&read_table_cb->gencb, ret);
} }
@@ -87,9 +84,8 @@ typedef struct {
static void qed_write_table_cb(void *opaque, int ret) static void qed_write_table_cb(void *opaque, int ret)
{ {
QEDWriteTableCB *write_table_cb = opaque; QEDWriteTableCB *write_table_cb = opaque;
BDRVQEDState *s = write_table_cb->s;
trace_qed_write_table_cb(s, trace_qed_write_table_cb(write_table_cb->s,
write_table_cb->orig_table, write_table_cb->orig_table,
write_table_cb->flush, write_table_cb->flush,
ret); ret);
@@ -101,10 +97,8 @@ static void qed_write_table_cb(void *opaque, int ret)
if (write_table_cb->flush) { if (write_table_cb->flush) {
/* We still need to flush first */ /* We still need to flush first */
write_table_cb->flush = false; write_table_cb->flush = false;
qed_acquire(s);
bdrv_aio_flush(write_table_cb->s->bs, qed_write_table_cb, bdrv_aio_flush(write_table_cb->s->bs, qed_write_table_cb,
write_table_cb); write_table_cb);
qed_release(s);
return; return;
} }
@@ -180,7 +174,9 @@ int qed_read_l1_table_sync(BDRVQEDState *s)
qed_read_table(s, s->header.l1_table_offset, qed_read_table(s, s->header.l1_table_offset,
s->l1_table, qed_sync_cb, &ret); s->l1_table, qed_sync_cb, &ret);
BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS); while (ret == -EINPROGRESS) {
aio_poll(bdrv_get_aio_context(s->bs), true);
}
return ret; return ret;
} }
@@ -199,7 +195,9 @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
int ret = -EINPROGRESS; int ret = -EINPROGRESS;
qed_write_l1_table(s, index, n, qed_sync_cb, &ret); qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS); while (ret == -EINPROGRESS) {
aio_poll(bdrv_get_aio_context(s->bs), true);
}
return ret; return ret;
} }
@@ -219,7 +217,6 @@ static void qed_read_l2_table_cb(void *opaque, int ret)
CachedL2Table *l2_table = request->l2_table; CachedL2Table *l2_table = request->l2_table;
uint64_t l2_offset = read_l2_table_cb->l2_offset; uint64_t l2_offset = read_l2_table_cb->l2_offset;
qed_acquire(s);
if (ret) { if (ret) {
/* can't trust loaded L2 table anymore */ /* can't trust loaded L2 table anymore */
qed_unref_l2_cache_entry(l2_table); qed_unref_l2_cache_entry(l2_table);
@@ -235,7 +232,6 @@ static void qed_read_l2_table_cb(void *opaque, int ret)
request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset); request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset);
assert(request->l2_table != NULL); assert(request->l2_table != NULL);
} }
qed_release(s);
gencb_complete(&read_l2_table_cb->gencb, ret); gencb_complete(&read_l2_table_cb->gencb, ret);
} }
@@ -272,7 +268,9 @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
int ret = -EINPROGRESS; int ret = -EINPROGRESS;
qed_read_l2_table(s, request, offset, qed_sync_cb, &ret); qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS); while (ret == -EINPROGRESS) {
aio_poll(bdrv_get_aio_context(s->bs), true);
}
return ret; return ret;
} }
@@ -292,7 +290,9 @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
int ret = -EINPROGRESS; int ret = -EINPROGRESS;
qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret); qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS); while (ret == -EINPROGRESS) {
aio_poll(bdrv_get_aio_context(s->bs), true);
}
return ret; return ret;
} }


@@ -19,6 +19,7 @@
#include "trace.h" #include "trace.h"
#include "qed.h" #include "qed.h"
#include "qapi/qmp/qerror.h" #include "qapi/qmp/qerror.h"
#include "migration/migration.h"
#include "sysemu/block-backend.h" #include "sysemu/block-backend.h"
static const AIOCBInfo qed_aiocb_info = { static const AIOCBInfo qed_aiocb_info = {
@@ -272,19 +273,7 @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
return l2_table; return l2_table;
} }
static void qed_aio_next_io(QEDAIOCB *acb, int ret); static void qed_aio_next_io(void *opaque, int ret);
static void qed_aio_start_io(QEDAIOCB *acb)
{
qed_aio_next_io(acb, 0);
}
static void qed_aio_next_io_cb(void *opaque, int ret)
{
QEDAIOCB *acb = opaque;
qed_aio_next_io(acb, ret);
}
static void qed_plug_allocating_write_reqs(BDRVQEDState *s) static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
{ {
@@ -303,7 +292,7 @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs); acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
if (acb) { if (acb) {
qed_aio_start_io(acb); qed_aio_next_io(acb, 0);
} }
} }
@@ -344,22 +333,10 @@ static void qed_need_check_timer_cb(void *opaque)
trace_qed_need_check_timer_cb(s); trace_qed_need_check_timer_cb(s);
qed_acquire(s);
qed_plug_allocating_write_reqs(s); qed_plug_allocating_write_reqs(s);
/* Ensure writes are on disk before clearing flag */ /* Ensure writes are on disk before clearing flag */
bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s); bdrv_aio_flush(s->bs, qed_clear_need_check, s);
qed_release(s);
}
void qed_acquire(BDRVQEDState *s)
{
aio_context_acquire(bdrv_get_aio_context(s->bs));
}
void qed_release(BDRVQEDState *s)
{
aio_context_release(bdrv_get_aio_context(s->bs));
} }
static void qed_start_need_check_timer(BDRVQEDState *s) static void qed_start_need_check_timer(BDRVQEDState *s)
@@ -401,20 +378,7 @@ static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
} }
} }
static void bdrv_qed_drain(BlockDriverState *bs) static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
{
BDRVQEDState *s = bs->opaque;
/* Fire the timer immediately in order to start doing I/O as soon as the
* header is flushed.
*/
if (s->need_check_timer && timer_pending(s->need_check_timer)) {
qed_cancel_need_check_timer(s);
qed_need_check_timer_cb(s);
}
}
static int bdrv_qed_do_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp) Error **errp)
{ {
BDRVQEDState *s = bs->opaque; BDRVQEDState *s = bs->opaque;
@@ -549,18 +513,6 @@ out:
return ret; return ret;
} }
static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
Error **errp)
{
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
false, errp);
if (!bs->file) {
return -EINVAL;
}
return bdrv_qed_do_open(bs, options, flags, errp);
}
static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp) static void bdrv_qed_refresh_limits(BlockDriverState *bs, Error **errp)
{ {
BDRVQEDState *s = bs->opaque; BDRVQEDState *s = bs->opaque;
@@ -624,8 +576,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
} }
blk = blk_new_open(filename, NULL, NULL, blk = blk_new_open(filename, NULL, NULL,
BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL, BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
&local_err);
if (blk == NULL) { if (blk == NULL) {
error_propagate(errp, local_err); error_propagate(errp, local_err);
return -EIO; return -EIO;
@@ -634,7 +585,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
blk_set_allow_write_beyond_eof(blk, true); blk_set_allow_write_beyond_eof(blk, true);
/* File must start empty and grow, check truncate is supported */ /* File must start empty and grow, check truncate is supported */
ret = blk_truncate(blk, 0, errp); ret = blk_truncate(blk, 0);
if (ret < 0) { if (ret < 0) {
goto out; goto out;
} }
@@ -757,7 +708,7 @@ static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t l
} }
if (cb->co) { if (cb->co) {
aio_co_wake(cb->co); qemu_coroutine_enter(cb->co);
} }
} }
@@ -954,17 +905,15 @@ static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
static void qed_aio_complete_bh(void *opaque) static void qed_aio_complete_bh(void *opaque)
{ {
QEDAIOCB *acb = opaque; QEDAIOCB *acb = opaque;
BDRVQEDState *s = acb_to_s(acb);
BlockCompletionFunc *cb = acb->common.cb; BlockCompletionFunc *cb = acb->common.cb;
void *user_opaque = acb->common.opaque; void *user_opaque = acb->common.opaque;
int ret = acb->bh_ret; int ret = acb->bh_ret;
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb); qemu_aio_unref(acb);
/* Invoke callback */ /* Invoke callback */
qed_acquire(s);
cb(user_opaque, ret); cb(user_opaque, ret);
qed_release(s);
} }
static void qed_aio_complete(QEDAIOCB *acb, int ret) static void qed_aio_complete(QEDAIOCB *acb, int ret)
@@ -985,8 +934,9 @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
/* Arrange for a bh to invoke the completion function */ /* Arrange for a bh to invoke the completion function */
acb->bh_ret = ret; acb->bh_ret = ret;
aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs), acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
qed_aio_complete_bh, acb); qed_aio_complete_bh, acb);
qemu_bh_schedule(acb->bh);
/* Start next allocating write request waiting behind this one. Note that /* Start next allocating write request waiting behind this one. Note that
* requests enqueue themselves when they first hit an unallocated cluster * requests enqueue themselves when they first hit an unallocated cluster
@@ -998,7 +948,7 @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next); QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next);
acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs); acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
if (acb) { if (acb) {
qed_aio_start_io(acb); qed_aio_next_io(acb, 0);
} else if (s->header.features & QED_F_NEED_CHECK) { } else if (s->header.features & QED_F_NEED_CHECK) {
qed_start_need_check_timer(s); qed_start_need_check_timer(s);
} }
@@ -1023,7 +973,7 @@ static void qed_commit_l2_update(void *opaque, int ret)
     acb->request.l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset);
     assert(acb->request.l2_table != NULL);

-    qed_aio_next_io(acb, ret);
+    qed_aio_next_io(opaque, ret);
 }

 /**
@@ -1075,7 +1025,7 @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
     } else {
         /* Write out only the updated part of the L2 table */
         qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters, false,
-                           qed_aio_next_io_cb, acb);
+                           qed_aio_next_io, acb);
     }
     return;
@@ -1127,7 +1077,7 @@ static void qed_aio_write_main(void *opaque, int ret)
     }

     if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
-        next_fn = qed_aio_next_io_cb;
+        next_fn = qed_aio_next_io;
     } else {
         if (s->bs->backing) {
             next_fn = qed_aio_write_flush_before_l2_update;
@@ -1240,7 +1190,7 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
     if (acb->flags & QED_AIOCB_ZERO) {
         /* Skip ahead if the clusters are already zero */
         if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
-            qed_aio_start_io(acb);
+            qed_aio_next_io(acb, 0);
             return;
         }
@@ -1360,18 +1310,18 @@ static void qed_aio_read_data(void *opaque, int ret,
     /* Handle zero cluster and backing file reads */
     if (ret == QED_CLUSTER_ZERO) {
         qemu_iovec_memset(&acb->cur_qiov, 0, 0, acb->cur_qiov.size);
-        qed_aio_start_io(acb);
+        qed_aio_next_io(acb, 0);
         return;
     } else if (ret != QED_CLUSTER_FOUND) {
         qed_read_backing_file(s, acb->cur_pos, &acb->cur_qiov,
-                              &acb->backing_qiov, qed_aio_next_io_cb, acb);
+                              &acb->backing_qiov, qed_aio_next_io, acb);
         return;
     }

     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
     bdrv_aio_readv(bs->file, offset / BDRV_SECTOR_SIZE,
                    &acb->cur_qiov, acb->cur_qiov.size / BDRV_SECTOR_SIZE,
-                   qed_aio_next_io_cb, acb);
+                   qed_aio_next_io, acb);
     return;

 err:
@@ -1381,8 +1331,9 @@ err:
 /**
  * Begin next I/O or complete the request
  */
-static void qed_aio_next_io(QEDAIOCB *acb, int ret)
+static void qed_aio_next_io(void *opaque, int ret)
 {
+    QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ?
                                 qed_aio_write_data : qed_aio_read_data;
@@ -1438,7 +1389,7 @@ static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
     qemu_iovec_init(&acb->cur_qiov, qiov->niov);

     /* Start request */
-    qed_aio_start_io(acb);
+    qed_aio_next_io(acb, 0);
     return &acb->common;
 }
@@ -1474,7 +1425,7 @@ static void coroutine_fn qed_co_pwrite_zeroes_cb(void *opaque, int ret)
     cb->done = true;
     cb->ret = ret;
     if (cb->co) {
-        aio_co_wake(cb->co);
+        qemu_coroutine_enter(cb->co);
     }
 }
@@ -1517,7 +1468,7 @@ static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
     return cb.ret;
 }
-static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset)
 {
     BDRVQEDState *s = bs->opaque;
     uint64_t old_image_size;
@@ -1525,12 +1476,11 @@ static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
     if (!qed_is_image_size_valid(offset, s->header.cluster_size,
                                  s->header.table_size)) {
-        error_setg(errp, "Invalid image size specified");
         return -EINVAL;
     }

+    /* Shrinking is currently not supported */
     if ((uint64_t)offset < s->header.image_size) {
-        error_setg(errp, "Shrinking images is currently not supported");
         return -ENOTSUP;
     }
@@ -1539,7 +1489,6 @@ static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
     ret = qed_write_header_sync(s);
     if (ret < 0) {
         s->header.image_size = old_image_size;
-        error_setg_errno(errp, -ret, "Failed to update the image size");
     }
     return ret;
 }
@@ -1643,7 +1592,7 @@ static void bdrv_qed_invalidate_cache(BlockDriverState *bs, Error **errp)
     bdrv_qed_close(bs);

     memset(s, 0, sizeof(BDRVQEDState));
-    ret = bdrv_qed_do_open(bs, NULL, bs->open_flags, &local_err);
+    ret = bdrv_qed_open(bs, NULL, bs->open_flags, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         error_prepend(errp, "Could not reopen qed layer: ");
@@ -1706,7 +1655,6 @@ static BlockDriver bdrv_qed = {
     .bdrv_open                = bdrv_qed_open,
     .bdrv_close               = bdrv_qed_close,
     .bdrv_reopen_prepare      = bdrv_qed_reopen_prepare,
-    .bdrv_child_perm          = bdrv_format_default_perms,
     .bdrv_create              = bdrv_qed_create,
     .bdrv_has_zero_init       = bdrv_has_zero_init_1,
     .bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
@@ -1722,7 +1670,6 @@ static BlockDriver bdrv_qed = {
     .bdrv_check               = bdrv_qed_check,
     .bdrv_detach_aio_context  = bdrv_qed_detach_aio_context,
     .bdrv_attach_aio_context  = bdrv_qed_attach_aio_context,
-    .bdrv_drain               = bdrv_qed_drain,
 };

 static void bdrv_qed_init(void)

View File

@@ -130,6 +130,7 @@ enum {
 typedef struct QEDAIOCB {
     BlockAIOCB common;
+    QEMUBH *bh;
     int bh_ret;                     /* final return status for completion bh */
     QSIMPLEQ_ENTRY(QEDAIOCB) next;  /* next request */
     int flags;                      /* QED_AIOCB_* bits ORed together */
@@ -198,9 +199,6 @@
  */
 typedef void QEDFindClusterFunc(void *opaque, int ret, uint64_t offset, size_t len);

-void qed_acquire(BDRVQEDState *s);
-void qed_release(BDRVQEDState *s);
-
 /**
  * Generic callback for chaining async callbacks
  */

View File

@@ -97,7 +97,7 @@ typedef struct QuorumAIOCB QuorumAIOCB;
  * $children_count QuorumChildRequest.
  */
 typedef struct QuorumChildRequest {
-    BlockDriverState *bs;
+    BlockAIOCB *aiocb;
     QEMUIOVector qiov;
     uint8_t *buf;
     int ret;
@@ -110,12 +110,11 @@ typedef struct QuorumChildRequest {
  * used to do operations on each children and track overall progress.
  */
 struct QuorumAIOCB {
-    BlockDriverState *bs;
-    Coroutine *co;
+    BlockAIOCB common;

     /* Request metadata */
-    uint64_t offset;
-    uint64_t bytes;
+    uint64_t sector_num;
+    int nb_sectors;

     QEMUIOVector *qiov;         /* calling IOV */
@@ -131,18 +130,50 @@ struct QuorumAIOCB {
     bool is_read;
     int vote_ret;
-    int children_read;          /* how many children have been read from */
+    int child_iter;             /* which child to read in fifo pattern */
 };

-typedef struct QuorumCo {
-    QuorumAIOCB *acb;
-    int idx;
-} QuorumCo;
+static bool quorum_vote(QuorumAIOCB *acb);
+
+static void quorum_aio_cancel(BlockAIOCB *blockacb)
+{
+    QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+    BDRVQuorumState *s = acb->common.bs->opaque;
+    int i;
+
+    /* cancel all callbacks */
+    for (i = 0; i < s->num_children; i++) {
+        if (acb->qcrs[i].aiocb) {
+            bdrv_aio_cancel_async(acb->qcrs[i].aiocb);
+        }
+    }
+}
+
+static AIOCBInfo quorum_aiocb_info = {
+    .aiocb_size         = sizeof(QuorumAIOCB),
+    .cancel_async       = quorum_aio_cancel,
+};

 static void quorum_aio_finalize(QuorumAIOCB *acb)
 {
+    int i, ret = 0;
+
+    if (acb->vote_ret) {
+        ret = acb->vote_ret;
+    }
+
+    acb->common.cb(acb->common.opaque, ret);
+
+    if (acb->is_read) {
+        /* on the quorum case acb->child_iter == s->num_children - 1 */
+        for (i = 0; i <= acb->child_iter; i++) {
+            qemu_vfree(acb->qcrs[i].buf);
+            qemu_iovec_destroy(&acb->qcrs[i].qiov);
+        }
+    }
+
     g_free(acb->qcrs);
-    g_free(acb);
+    qemu_aio_unref(acb);
 }

 static bool quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
@@ -155,26 +186,30 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
     return a->l == b->l;
 }

-static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+                                   BlockDriverState *bs,
                                    QEMUIOVector *qiov,
-                                   uint64_t offset,
-                                   uint64_t bytes)
+                                   uint64_t sector_num,
+                                   int nb_sectors,
+                                   BlockCompletionFunc *cb,
+                                   void *opaque)
 {
-    BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
+    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
     int i;

-    *acb = (QuorumAIOCB) {
-        .co                 = qemu_coroutine_self(),
-        .bs                 = bs,
-        .offset             = offset,
-        .bytes              = bytes,
-        .qiov               = qiov,
-        .votes.compare      = quorum_sha256_compare,
-        .votes.vote_list    = QLIST_HEAD_INITIALIZER(acb.votes.vote_list),
-    };
+    acb->common.bs->opaque = s;
+    acb->sector_num = sector_num;
+    acb->nb_sectors = nb_sectors;
+    acb->qiov = qiov;
     acb->qcrs = g_new0(QuorumChildRequest, s->num_children);
+    acb->count = 0;
+    acb->success_count = 0;
+    acb->rewrite_count = 0;
+    acb->votes.compare = quorum_sha256_compare;
+    QLIST_INIT(&acb->votes.vote_list);
+    acb->is_read = false;
+    acb->vote_ret = 0;

     for (i = 0; i < s->num_children; i++) {
         acb->qcrs[i].buf = NULL;
         acb->qcrs[i].ret = 0;
@@ -184,37 +219,30 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
     return acb;
 }

-static void quorum_report_bad(QuorumOpType type, uint64_t offset,
-                              uint64_t bytes, char *node_name, int ret)
+static void quorum_report_bad(QuorumOpType type, uint64_t sector_num,
+                              int nb_sectors, char *node_name, int ret)
 {
     const char *msg = NULL;
-    int64_t start_sector = offset / BDRV_SECTOR_SIZE;
-    int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);

     if (ret < 0) {
         msg = strerror(-ret);
     }

-    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name, start_sector,
-                                      end_sector - start_sector, &error_abort);
+    qapi_event_send_quorum_report_bad(type, !!msg, msg, node_name,
+                                      sector_num, nb_sectors, &error_abort);
 }

 static void quorum_report_failure(QuorumAIOCB *acb)
 {
-    const char *reference = bdrv_get_device_or_node_name(acb->bs);
-    int64_t start_sector = acb->offset / BDRV_SECTOR_SIZE;
-    int64_t end_sector = DIV_ROUND_UP(acb->offset + acb->bytes,
-                                      BDRV_SECTOR_SIZE);
-
-    qapi_event_send_quorum_failure(reference, start_sector,
-                                   end_sector - start_sector, &error_abort);
+    const char *reference = bdrv_get_device_or_node_name(acb->common.bs);
+
+    qapi_event_send_quorum_failure(reference, acb->sector_num,
+                                   acb->nb_sectors, &error_abort);
 }

 static int quorum_vote_error(QuorumAIOCB *acb);

 static bool quorum_has_too_much_io_failed(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->bs->opaque;
+    BDRVQuorumState *s = acb->common.bs->opaque;

     if (acb->success_count < s->threshold) {
         acb->vote_ret = quorum_vote_error(acb);
@@ -225,7 +253,22 @@ static bool quorum_has_too_much_io_failed(QuorumAIOCB *acb)
     return false;
 }

-static int read_fifo_child(QuorumAIOCB *acb);
+static void quorum_rewrite_aio_cb(void *opaque, int ret)
+{
+    QuorumAIOCB *acb = opaque;
+
+    /* one less rewrite to do */
+    acb->rewrite_count--;
+
+    /* wait until all rewrite callbacks have completed */
+    if (acb->rewrite_count) {
+        return;
+    }
+
+    quorum_aio_finalize(acb);
+}
+
+static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb);

 static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
 {
@@ -240,11 +283,57 @@ static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
     }
 }

-static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
+static void quorum_aio_cb(void *opaque, int ret)
 {
+    QuorumChildRequest *sacb = opaque;
     QuorumAIOCB *acb = sacb->parent;
-    QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
-    quorum_report_bad(type, acb->offset, acb->bytes, sacb->bs->node_name, ret);
+    BDRVQuorumState *s = acb->common.bs->opaque;
+    bool rewrite = false;
+
+    if (ret == 0) {
+        acb->success_count++;
+    } else {
+        QuorumOpType type;
+        type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
+        quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
+                          sacb->aiocb->bs->node_name, ret);
+    }
+
+    if (acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO) {
+        /* We try to read next child in FIFO order if we fail to read */
+        if (ret < 0 && (acb->child_iter + 1) < s->num_children) {
+            acb->child_iter++;
+            read_fifo_child(acb);
+            return;
+        }
+
+        if (ret == 0) {
+            quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
+        }
+        acb->vote_ret = ret;
+        quorum_aio_finalize(acb);
+        return;
+    }
+
+    sacb->ret = ret;
+    acb->count++;
+    assert(acb->count <= s->num_children);
+    assert(acb->success_count <= s->num_children);
+    if (acb->count < s->num_children) {
+        return;
+    }
+
+    /* Do the vote on read */
+    if (acb->is_read) {
+        rewrite = quorum_vote(acb);
+    } else {
+        quorum_has_too_much_io_failed(acb);
+    }
+
+    /* if no rewrite is done the code will finish right away */
+    if (!rewrite) {
+        quorum_aio_finalize(acb);
+    }
 }
 static void quorum_report_bad_versions(BDRVQuorumState *s,
@@ -259,31 +348,14 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
             continue;
         }
         QLIST_FOREACH(item, &version->items, next) {
-            quorum_report_bad(QUORUM_OP_TYPE_READ, acb->offset, acb->bytes,
+            quorum_report_bad(QUORUM_OP_TYPE_READ, acb->sector_num,
+                              acb->nb_sectors,
                               s->children[item->index]->bs->node_name, 0);
         }
     }
 }

-static void quorum_rewrite_entry(void *opaque)
-{
-    QuorumCo *co = opaque;
-    QuorumAIOCB *acb = co->acb;
-    BDRVQuorumState *s = acb->bs->opaque;
-
-    /* Ignore any errors, it's just a correction attempt for already
-     * corrupted data. */
-    bdrv_co_pwritev(s->children[co->idx], acb->offset, acb->bytes,
-                    acb->qiov, 0);
-
-    /* Wake up the caller after the last rewrite */
-    acb->rewrite_count--;
-    if (!acb->rewrite_count) {
-        qemu_coroutine_enter_if_inactive(acb->co);
-    }
-}
-
-static bool quorum_rewrite_bad_versions(QuorumAIOCB *acb,
+static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
                                         QuorumVoteValue *value)
 {
     QuorumVoteVersion *version;
@@ -302,7 +374,7 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
         }
     }

-    /* quorum_rewrite_entry will count down this to zero */
+    /* quorum_rewrite_aio_cb will count down this to zero */
     acb->rewrite_count = count;

     /* now fire the correcting rewrites */
@@ -311,14 +383,9 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, QuorumAIOCB *acb,
             continue;
         }
         QLIST_FOREACH(item, &version->items, next) {
-            Coroutine *co;
-            QuorumCo data = {
-                .acb = acb,
-                .idx = item->index,
-            };
-
-            co = qemu_coroutine_create(quorum_rewrite_entry, &data);
-            qemu_coroutine_enter(co);
+            bdrv_aio_writev(s->children[item->index], acb->sector_num,
+                            acb->qiov, acb->nb_sectors, quorum_rewrite_aio_cb,
+                            acb);
         }
     }
@@ -438,8 +505,8 @@ static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
     va_list ap;

     va_start(ap, fmt);
-    fprintf(stderr, "quorum: offset=%" PRIu64 " bytes=%" PRIu64 " ",
-            acb->offset, acb->bytes);
+    fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
+            acb->sector_num, acb->nb_sectors);
     vfprintf(stderr, fmt, ap);
     fprintf(stderr, "\n");
     va_end(ap);
@@ -450,15 +517,16 @@ static bool quorum_compare(QuorumAIOCB *acb,
                            QEMUIOVector *a,
                            QEMUIOVector *b)
 {
-    BDRVQuorumState *s = acb->bs->opaque;
+    BDRVQuorumState *s = acb->common.bs->opaque;
     ssize_t offset;

     /* This driver will replace blkverify in this particular case */
     if (s->is_blkverify) {
         offset = qemu_iovec_compare(a, b);
         if (offset != -1) {
-            quorum_err(acb, "contents mismatch at offset %" PRIu64,
-                       acb->offset + offset);
+            quorum_err(acb, "contents mismatch in sector %" PRId64,
+                       acb->sector_num +
+                       (uint64_t)(offset / BDRV_SECTOR_SIZE));
         }
         return true;
     }
@@ -469,7 +537,7 @@ static bool quorum_compare(QuorumAIOCB *acb,
 /* Do a vote to get the error code */
 static int quorum_vote_error(QuorumAIOCB *acb)
 {
-    BDRVQuorumState *s = acb->bs->opaque;
+    BDRVQuorumState *s = acb->common.bs->opaque;
     QuorumVoteVersion *winner = NULL;
     QuorumVotes error_votes;
     QuorumVoteValue result_value;
@@ -498,16 +566,17 @@ static int quorum_vote_error(QuorumAIOCB *acb)
     return ret;
 }

-static void quorum_vote(QuorumAIOCB *acb)
+static bool quorum_vote(QuorumAIOCB *acb)
 {
     bool quorum = true;
+    bool rewrite = false;
     int i, j, ret;
     QuorumVoteValue hash;
-    BDRVQuorumState *s = acb->bs->opaque;
+    BDRVQuorumState *s = acb->common.bs->opaque;
     QuorumVoteVersion *winner;

     if (quorum_has_too_much_io_failed(acb)) {
-        return;
+        return false;
     }

     /* get the index of the first successful read */
@@ -535,7 +604,7 @@ static bool quorum_vote(QuorumAIOCB *acb)
     /* Every successful read agrees */
     if (quorum) {
         quorum_copy_qiov(acb->qiov, &acb->qcrs[i].qiov);
-        return;
+        return false;
     }

     /* compute hashes for each successful read, also store indexes */
@@ -570,48 +639,20 @@ static bool quorum_vote(QuorumAIOCB *acb)
     /* corruption correction is enabled */
     if (s->rewrite_corrupted) {
-        quorum_rewrite_bad_versions(acb, &winner->value);
+        rewrite = quorum_rewrite_bad_versions(s, acb, &winner->value);
     }

 free_exit:
     /* free lists */
     quorum_free_vote_list(&acb->votes);
+    return rewrite;
 }
-static void read_quorum_children_entry(void *opaque)
-{
-    QuorumCo *co = opaque;
-    QuorumAIOCB *acb = co->acb;
-    BDRVQuorumState *s = acb->bs->opaque;
-    int i = co->idx;
-    QuorumChildRequest *sacb = &acb->qcrs[i];
-
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_preadv(s->children[i], acb->offset, acb->bytes,
-                               &acb->qcrs[i].qiov, 0);
-
-    if (sacb->ret == 0) {
-        acb->success_count++;
-    } else {
-        quorum_report_bad_acb(sacb, sacb->ret);
-    }
-
-    acb->count++;
-    assert(acb->count <= s->num_children);
-    assert(acb->success_count <= s->num_children);
-
-    /* Wake up the caller after the last read */
-    if (acb->count == s->num_children) {
-        qemu_coroutine_enter_if_inactive(acb->co);
-    }
-}
-
-static int read_quorum_children(QuorumAIOCB *acb)
-{
-    BDRVQuorumState *s = acb->bs->opaque;
-    int i, ret;
-
-    acb->children_read = s->num_children;
+static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
+{
+    BDRVQuorumState *s = acb->common.bs->opaque;
+    int i;
+
     for (i = 0; i < s->num_children; i++) {
         acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, acb->qiov->size);
         qemu_iovec_init(&acb->qcrs[i].qiov, acb->qiov->niov);
@@ -619,131 +660,71 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
     }

     for (i = 0; i < s->num_children; i++) {
-        Coroutine *co;
-        QuorumCo data = {
-            .acb = acb,
-            .idx = i,
-        };
-
-        co = qemu_coroutine_create(read_quorum_children_entry, &data);
-        qemu_coroutine_enter(co);
+        acb->qcrs[i].aiocb = bdrv_aio_readv(s->children[i], acb->sector_num,
+                                            &acb->qcrs[i].qiov, acb->nb_sectors,
+                                            quorum_aio_cb, &acb->qcrs[i]);
     }

-    while (acb->count < s->num_children) {
-        qemu_coroutine_yield();
-    }
-
-    /* Do the vote on read */
-    quorum_vote(acb);
-    for (i = 0; i < s->num_children; i++) {
-        qemu_vfree(acb->qcrs[i].buf);
-        qemu_iovec_destroy(&acb->qcrs[i].qiov);
-    }
-
-    while (acb->rewrite_count) {
-        qemu_coroutine_yield();
-    }
-
-    ret = acb->vote_ret;
-
-    return ret;
-}
-
-static int read_fifo_child(QuorumAIOCB *acb)
-{
-    BDRVQuorumState *s = acb->bs->opaque;
-    int n, ret;
-
-    /* We try to read the next child in FIFO order if we failed to read */
-    do {
-        n = acb->children_read++;
-        acb->qcrs[n].bs = s->children[n]->bs;
-        ret = bdrv_co_preadv(s->children[n], acb->offset, acb->bytes,
-                             acb->qiov, 0);
-        if (ret < 0) {
-            quorum_report_bad_acb(&acb->qcrs[n], ret);
-        }
-    } while (ret < 0 && acb->children_read < s->num_children);
-
-    /* FIXME: rewrite failed children if acb->children_read > 1? */
-
-    return ret;
+    return &acb->common;
+}
+
+static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
+{
+    BDRVQuorumState *s = acb->common.bs->opaque;
+
+    acb->qcrs[acb->child_iter].buf =
+        qemu_blockalign(s->children[acb->child_iter]->bs, acb->qiov->size);
+    qemu_iovec_init(&acb->qcrs[acb->child_iter].qiov, acb->qiov->niov);
+    qemu_iovec_clone(&acb->qcrs[acb->child_iter].qiov, acb->qiov,
+                     acb->qcrs[acb->child_iter].buf);
+    acb->qcrs[acb->child_iter].aiocb =
+        bdrv_aio_readv(s->children[acb->child_iter], acb->sector_num,
+                       &acb->qcrs[acb->child_iter].qiov, acb->nb_sectors,
+                       quorum_aio_cb, &acb->qcrs[acb->child_iter]);
+
+    return &acb->common;
 }

-static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
-                            uint64_t bytes, QEMUIOVector *qiov, int flags)
+static BlockAIOCB *quorum_aio_readv(BlockDriverState *bs,
+                                    int64_t sector_num,
+                                    QEMUIOVector *qiov,
+                                    int nb_sectors,
+                                    BlockCompletionFunc *cb,
+                                    void *opaque)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
-    int ret;
-
-    acb->is_read = true;
-    acb->children_read = 0;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+                                      nb_sectors, cb, opaque);
+    acb->is_read = true;

     if (s->read_pattern == QUORUM_READ_PATTERN_QUORUM) {
-        ret = read_quorum_children(acb);
-    } else {
-        ret = read_fifo_child(acb);
-    }
-    quorum_aio_finalize(acb);
-
-    return ret;
-}
-
-static void write_quorum_entry(void *opaque)
-{
-    QuorumCo *co = opaque;
-    QuorumAIOCB *acb = co->acb;
-    BDRVQuorumState *s = acb->bs->opaque;
-    int i = co->idx;
-    QuorumChildRequest *sacb = &acb->qcrs[i];
-
-    sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
-                                acb->qiov, 0);
-    if (sacb->ret == 0) {
-        acb->success_count++;
-    } else {
-        quorum_report_bad_acb(sacb, sacb->ret);
-    }
-    acb->count++;
-    assert(acb->count <= s->num_children);
-    assert(acb->success_count <= s->num_children);
-
-    /* Wake up the caller after the last write */
-    if (acb->count == s->num_children) {
-        qemu_coroutine_enter_if_inactive(acb->co);
-    }
-}
-
-static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
-                             uint64_t bytes, QEMUIOVector *qiov, int flags)
+        acb->child_iter = s->num_children - 1;
+        return read_quorum_children(acb);
+    }
+
+    acb->child_iter = 0;
+    return read_fifo_child(acb);
+}
+
+static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
+                                     int64_t sector_num,
+                                     QEMUIOVector *qiov,
+                                     int nb_sectors,
+                                     BlockCompletionFunc *cb,
+                                     void *opaque)
 {
     BDRVQuorumState *s = bs->opaque;
-    QuorumAIOCB *acb = quorum_aio_get(bs, qiov, offset, bytes);
-    int i, ret;
-
-    for (i = 0; i < s->num_children; i++) {
-        Coroutine *co;
-        QuorumCo data = {
-            .acb = acb,
-            .idx = i,
-        };
-
-        co = qemu_coroutine_create(write_quorum_entry, &data);
-        qemu_coroutine_enter(co);
-    }
-
-    while (acb->count < s->num_children) {
-        qemu_coroutine_yield();
-    }
-
-    quorum_has_too_much_io_failed(acb);
-
-    ret = acb->vote_ret;
-    quorum_aio_finalize(acb);
-
-    return ret;
+    QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+                                      cb, opaque);
+    int i;
+
+    for (i = 0; i < s->num_children; i++) {
+        acb->qcrs[i].aiocb = bdrv_aio_writev(s->children[i], sector_num,
+                                             qiov, nb_sectors, &quorum_aio_cb,
+                                             &acb->qcrs[i]);
+    }
+
+    return &acb->common;
 }
 static int64_t quorum_getlength(BlockDriverState *bs)
@@ -787,7 +768,7 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
         result = bdrv_co_flush(s->children[i]->bs);
         if (result) {
             quorum_report_bad(QUORUM_OP_TYPE_FLUSH, 0,
-                              bdrv_getlength(s->children[i]->bs),
+                              bdrv_nb_sectors(s->children[i]->bs),
                               s->children[i]->bs->node_name, result);
             result_value.l = result;
             quorum_count_vote(&error_votes, &result_value, i);
@@ -1032,17 +1013,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
     /* We can safely add the child now */
     bdrv_ref(child_bs);
-
-    child = bdrv_attach_child(bs, child_bs, indexstr, &child_format, errp);
-    if (child == NULL) {
-        s->next_child_index--;
-        bdrv_unref(child_bs);
-        goto out;
-    }
+    child = bdrv_attach_child(bs, child_bs, indexstr, &child_format);
     s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
     s->children[s->num_children++] = child;
-
-out:
     bdrv_drained_end(bs);
 }
@@ -1096,15 +1070,19 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
     children = qlist_new();
     for (i = 0; i < s->num_children; i++) {
         QINCREF(s->children[i]->bs->full_open_options);
-        qlist_append(children, s->children[i]->bs->full_open_options);
+        qlist_append_obj(children,
+                         QOBJECT(s->children[i]->bs->full_open_options));
     }

     opts = qdict_new();
-    qdict_put_str(opts, "driver", "quorum");
-    qdict_put_int(opts, QUORUM_OPT_VOTE_THRESHOLD, s->threshold);
-    qdict_put_bool(opts, QUORUM_OPT_BLKVERIFY, s->is_blkverify);
-    qdict_put_bool(opts, QUORUM_OPT_REWRITE, s->rewrite_corrupted);
-    qdict_put(opts, "children", children);
+    qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("quorum")));
+    qdict_put_obj(opts, QUORUM_OPT_VOTE_THRESHOLD,
+                  QOBJECT(qint_from_int(s->threshold)));
+    qdict_put_obj(opts, QUORUM_OPT_BLKVERIFY,
+                  QOBJECT(qbool_from_bool(s->is_blkverify)));
+    qdict_put_obj(opts, QUORUM_OPT_REWRITE,
+                  QOBJECT(qbool_from_bool(s->rewrite_corrupted)));
+    qdict_put_obj(opts, "children", QOBJECT(children));

     bs->full_open_options = opts;
 }
@@ -1123,14 +1101,12 @@ static BlockDriver bdrv_quorum = {
     .bdrv_getlength     = quorum_getlength,

-    .bdrv_co_preadv     = quorum_co_preadv,
-    .bdrv_co_pwritev    = quorum_co_pwritev,
+    .bdrv_aio_readv     = quorum_aio_readv,
+    .bdrv_aio_writev    = quorum_aio_writev,

     .bdrv_add_child     = quorum_add_child,
     .bdrv_del_child     = quorum_del_child,
-    .bdrv_child_perm    = bdrv_filter_default_perms,

     .is_filter          = true,
     .bdrv_recurse_is_first_non_filter = quorum_recurse_is_first_non_filter,
 };

View File

@@ -25,6 +25,8 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
+#include "qemu/timer.h"
+#include "qemu/log.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "trace.h"
@@ -129,31 +131,18 @@ do { \
 #define MAX_BLOCKSIZE 4096

-/* Posix file locking bytes. Libvirt takes byte 0, we start from higher bytes,
- * leaving a few more bytes for its future use. */
-#define RAW_LOCK_PERM_BASE             100
-#define RAW_LOCK_SHARED_BASE           200
-
 typedef struct BDRVRawState {
     int fd;
-    int lock_fd;
-    bool use_lock;
     int type;
     int open_flags;
     size_t buf_align;

-    /* The current permissions. */
-    uint64_t perm;
-    uint64_t shared_perm;
-
 #ifdef CONFIG_XFS
     bool is_xfs:1;
 #endif
     bool has_discard:1;
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
-    bool use_linux_aio:1;
-    bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
 } BDRVRawState;
@@ -229,28 +218,28 @@ static int probe_logical_blocksize(int fd, unsigned int *sector_size_p)
 {
     unsigned int sector_size;
     bool success = false;
-    int i;

     errno = ENOTSUP;
-    static const unsigned long ioctl_list[] = {
-#ifdef BLKSSZGET
-        BLKSSZGET,
-#endif
-#ifdef DKIOCGETBLOCKSIZE
-        DKIOCGETBLOCKSIZE,
-#endif
-#ifdef DIOCGSECTORSIZE
-        DIOCGSECTORSIZE,
-#endif
-    };

     /* Try a few ioctls to get the right size */
-    for (i = 0; i < (int)ARRAY_SIZE(ioctl_list); i++) {
-        if (ioctl(fd, ioctl_list[i], &sector_size) >= 0) {
-            *sector_size_p = sector_size;
-            success = true;
-        }
+#ifdef BLKSSZGET
+    if (ioctl(fd, BLKSSZGET, &sector_size) >= 0) {
+        *sector_size_p = sector_size;
+        success = true;
     }
+#endif
+#ifdef DKIOCGETBLOCKSIZE
+    if (ioctl(fd, DKIOCGETBLOCKSIZE, &sector_size) >= 0) {
+        *sector_size_p = sector_size;
+        success = true;
+    }
+#endif
+#ifdef DIOCGSECTORSIZE
+    if (ioctl(fd, DIOCGSECTORSIZE, &sector_size) >= 0) {
+        *sector_size_p = sector_size;
+        success = true;
+    }
+#endif

     return success ? 0 : -errno;
 }
@@ -378,10 +367,27 @@ static void raw_parse_flags(int bdrv_flags, int *open_flags)
     }
 }

+#ifdef CONFIG_LINUX_AIO
+static bool raw_use_aio(int bdrv_flags)
+{
+    /*
+     * Currently Linux do AIO only for files opened with O_DIRECT
+     * specified so check NOCACHE flag too
+     */
+    return (bdrv_flags & (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO)) ==
+           (BDRV_O_NOCACHE|BDRV_O_NATIVE_AIO);
+}
+#endif
+
 static void raw_parse_filename(const char *filename, QDict *options,
                                Error **errp)
 {
-    bdrv_parse_filename_strip_prefix(filename, "file:", options);
+    /* The filename does not have to be prefixed by the protocol name, since
+     * "file" is the default protocol; therefore, the return value of this
+     * function call can be ignored. */
+    strstart(filename, "file:", &filename);
+
+    qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
 }
 static QemuOptsList raw_runtime_opts = {
@@ -393,16 +399,6 @@ static QemuOptsList raw_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "File name of the image",
         },
-        {
-            .name = "aio",
-            .type = QEMU_OPT_STRING,
-            .help = "host AIO implementation (threads, native)",
-        },
-        {
-            .name = "locking",
-            .type = QEMU_OPT_STRING,
-            .help = "file locking mode (on/off/auto, default: auto)",
-        },
         { /* end of list */ }
     },
 };
@@ -414,10 +410,8 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     QemuOpts *opts;
     Error *local_err = NULL;
     const char *filename = NULL;
-    BlockdevAioOptions aio, aio_default;
     int fd, ret;
     struct stat st;
-    OnOffAuto locking;

     opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -435,49 +429,6 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         goto fail;
     }

-    aio_default = (bdrv_flags & BDRV_O_NATIVE_AIO)
-                  ? BLOCKDEV_AIO_OPTIONS_NATIVE
-                  : BLOCKDEV_AIO_OPTIONS_THREADS;
-    aio = qapi_enum_parse(BlockdevAioOptions_lookup, qemu_opt_get(opts, "aio"),
-                          BLOCKDEV_AIO_OPTIONS__MAX, aio_default, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        ret = -EINVAL;
-        goto fail;
-    }
-    s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
-
-    locking = qapi_enum_parse(OnOffAuto_lookup, qemu_opt_get(opts, "locking"),
-                              ON_OFF_AUTO__MAX, ON_OFF_AUTO_AUTO, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        ret = -EINVAL;
-        goto fail;
-    }
-    switch (locking) {
-    case ON_OFF_AUTO_ON:
-        s->use_lock = true;
-#ifndef F_OFD_SETLK
-        fprintf(stderr,
-                "File lock requested but OFD locking syscall is unavailable, "
-                "falling back to POSIX file locks.\n"
-                "Due to the implementation, locks can be lost unexpectedly.\n");
-#endif
-        break;
-    case ON_OFF_AUTO_OFF:
-        s->use_lock = false;
-        break;
-    case ON_OFF_AUTO_AUTO:
-#ifdef F_OFD_SETLK
-        s->use_lock = true;
-#else
-        s->use_lock = false;
-#endif
-        break;
-    default:
-        abort();
-    }

     s->open_flags = open_flags;
     raw_parse_flags(bdrv_flags, &s->open_flags);
@@ -485,7 +436,6 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     fd = qemu_open(filename, s->open_flags, 0644);
     if (fd < 0) {
         ret = -errno;
-        error_setg_errno(errp, errno, "Could not open '%s'", filename);
         if (ret == -EROFS) {
             ret = -EACCES;
         }
@@ -493,31 +443,15 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
     s->fd = fd;
-    s->lock_fd = -1;
-    if (s->use_lock) {
-        fd = qemu_open(filename, s->open_flags);
-        if (fd < 0) {
-            ret = -errno;
-            error_setg_errno(errp, errno, "Could not open '%s' for locking",
-                             filename);
-            qemu_close(s->fd);
-            goto fail;
-        }
-        s->lock_fd = fd;
-    }
-    s->perm = 0;
-    s->shared_perm = BLK_PERM_ALL;

 #ifdef CONFIG_LINUX_AIO
-    /* Currently Linux does AIO only for files opened with O_DIRECT */
-    if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
+    if (!raw_use_aio(bdrv_flags) && (bdrv_flags & BDRV_O_NATIVE_AIO)) {
         error_setg(errp, "aio=native was specified, but it requires "
                    "cache.direct=on, which was not specified.");
         ret = -EINVAL;
         goto fail;
     }
 #else
-    if (s->use_linux_aio) {
+    if (bdrv_flags & BDRV_O_NATIVE_AIO) {
         error_setg(errp, "aio=native was specified, but is not supported "
                    "in this build.");
         ret = -EINVAL;
@@ -595,166 +529,11 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     return raw_open_common(bs, options, flags, 0, errp);
 }
-typedef enum {
-    RAW_PL_PREPARE,
-    RAW_PL_COMMIT,
-    RAW_PL_ABORT,
-} RawPermLockOp;
-
-#define PERM_FOREACH(i) \
-    for ((i) = 0; (1ULL << (i)) <= BLK_PERM_ALL; i++)
-
-/* Lock bytes indicated by @perm_lock_bits and @shared_perm_lock_bits in the
- * file; if @unlock == true, also unlock the unneeded bytes.
- * @shared_perm_lock_bits is the mask of all permissions that are NOT shared.
- */
-static int raw_apply_lock_bytes(BDRVRawState *s,
-                                uint64_t perm_lock_bits,
-                                uint64_t shared_perm_lock_bits,
-                                bool unlock, Error **errp)
-{
-    int ret;
-    int i;
-
-    PERM_FOREACH(i) {
-        int off = RAW_LOCK_PERM_BASE + i;
-        if (perm_lock_bits & (1ULL << i)) {
-            ret = qemu_lock_fd(s->lock_fd, off, 1, false);
-            if (ret) {
-                error_setg(errp, "Failed to lock byte %d", off);
-                return ret;
-            }
-        } else if (unlock) {
-            ret = qemu_unlock_fd(s->lock_fd, off, 1);
-            if (ret) {
-                error_setg(errp, "Failed to unlock byte %d", off);
-                return ret;
-            }
-        }
-    }
-    PERM_FOREACH(i) {
-        int off = RAW_LOCK_SHARED_BASE + i;
-        if (shared_perm_lock_bits & (1ULL << i)) {
-            ret = qemu_lock_fd(s->lock_fd, off, 1, false);
-            if (ret) {
-                error_setg(errp, "Failed to lock byte %d", off);
-                return ret;
-            }
-        } else if (unlock) {
-            ret = qemu_unlock_fd(s->lock_fd, off, 1);
-            if (ret) {
-                error_setg(errp, "Failed to unlock byte %d", off);
-                return ret;
-            }
-        }
-    }
-    return 0;
-}
-
-/* Check "unshared" bytes implied by @perm and ~@shared_perm in the file. */
-static int raw_check_lock_bytes(BDRVRawState *s,
-                                uint64_t perm, uint64_t shared_perm,
-                                Error **errp)
-{
-    int ret;
-    int i;
-
-    PERM_FOREACH(i) {
-        int off = RAW_LOCK_SHARED_BASE + i;
-        uint64_t p = 1ULL << i;
-        if (perm & p) {
-            ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
-            if (ret) {
-                char *perm_name = bdrv_perm_names(p);
-                error_setg(errp,
-                           "Failed to get \"%s\" lock",
-                           perm_name);
-                g_free(perm_name);
-                error_append_hint(errp,
-                                  "Is another process using the image?\n");
-                return ret;
-            }
-        }
-    }
-    PERM_FOREACH(i) {
-        int off = RAW_LOCK_PERM_BASE + i;
-        uint64_t p = 1ULL << i;
-        if (!(shared_perm & p)) {
-            ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
-            if (ret) {
-                char *perm_name = bdrv_perm_names(p);
-                error_setg(errp,
-                           "Failed to get shared \"%s\" lock",
-                           perm_name);
-                g_free(perm_name);
-                error_append_hint(errp,
-                                  "Is another process using the image?\n");
-                return ret;
-            }
-        }
-    }
-    return 0;
-}
-
-static int raw_handle_perm_lock(BlockDriverState *bs,
-                                RawPermLockOp op,
-                                uint64_t new_perm, uint64_t new_shared,
-                                Error **errp)
-{
-    BDRVRawState *s = bs->opaque;
-    int ret = 0;
-    Error *local_err = NULL;
-
-    if (!s->use_lock) {
-        return 0;
-    }
-
-    if (bdrv_get_flags(bs) & BDRV_O_INACTIVE) {
-        return 0;
-    }
-
-    assert(s->lock_fd > 0);
-
-    switch (op) {
-    case RAW_PL_PREPARE:
-        ret = raw_apply_lock_bytes(s, s->perm | new_perm,
-                                   ~s->shared_perm | ~new_shared,
-                                   false, errp);
-        if (!ret) {
-            ret = raw_check_lock_bytes(s, new_perm, new_shared, errp);
-            if (!ret) {
-                return 0;
-            }
-        }
-        op = RAW_PL_ABORT;
-        /* fall through to unlock bytes. */
-    case RAW_PL_ABORT:
-        raw_apply_lock_bytes(s, s->perm, ~s->shared_perm, true, &local_err);
-        if (local_err) {
-            /* Theoretically the above call only unlocks bytes and it cannot
-             * fail. Something weird happened, report it.
-             */
-            error_report_err(local_err);
-        }
-        break;
-    case RAW_PL_COMMIT:
-        raw_apply_lock_bytes(s, new_perm, ~new_shared, true, &local_err);
-        if (local_err) {
-            /* Theoretically the above call only unlocks bytes and it cannot
-             * fail. Something weird happened, report it.
-             */
-            error_report_err(local_err);
-        }
-        break;
-    }
-    return ret;
-}
 static int raw_reopen_prepare(BDRVReopenState *state,
                               BlockReopenQueue *queue, Error **errp)
 {
     BDRVRawState *s;
-    BDRVRawReopenState *rs;
+    BDRVRawReopenState *raw_s;
     int ret = 0;
     Error *local_err = NULL;
@@ -764,15 +543,15 @@ static int raw_reopen_prepare(BDRVReopenState *state,
     s = state->bs->opaque;

     state->opaque = g_new0(BDRVRawReopenState, 1);
-    rs = state->opaque;
+    raw_s = state->opaque;

     if (s->type == FTYPE_CD) {
-        rs->open_flags |= O_NONBLOCK;
+        raw_s->open_flags |= O_NONBLOCK;
     }

-    raw_parse_flags(state->flags, &rs->open_flags);
+    raw_parse_flags(state->flags, &raw_s->open_flags);

-    rs->fd = -1;
+    raw_s->fd = -1;

     int fcntl_flags = O_APPEND | O_NONBLOCK;
 #ifdef O_NOATIME
@@ -781,35 +560,35 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 #ifdef O_ASYNC
     /* Not all operating systems have O_ASYNC, and those that don't
-     * will not let us track the state into rs->open_flags (typically
+     * will not let us track the state into raw_s->open_flags (typically
      * you achieve the same effect with an ioctl, for example I_SETSIG
      * on Solaris). But we do not use O_ASYNC, so that's fine.
      */
     assert((s->open_flags & O_ASYNC) == 0);
 #endif

-    if ((rs->open_flags & ~fcntl_flags) == (s->open_flags & ~fcntl_flags)) {
+    if ((raw_s->open_flags & ~fcntl_flags) == (s->open_flags & ~fcntl_flags)) {
         /* dup the original fd */
-        rs->fd = qemu_dup(s->fd);
-        if (rs->fd >= 0) {
-            ret = fcntl_setfl(rs->fd, rs->open_flags);
+        raw_s->fd = qemu_dup(s->fd);
+        if (raw_s->fd >= 0) {
+            ret = fcntl_setfl(raw_s->fd, raw_s->open_flags);
             if (ret) {
-                qemu_close(rs->fd);
-                rs->fd = -1;
+                qemu_close(raw_s->fd);
+                raw_s->fd = -1;
             }
         }
     }

     /* If we cannot use fcntl, or fcntl failed, fall back to qemu_open() */
-    if (rs->fd == -1) {
+    if (raw_s->fd == -1) {
         const char *normalized_filename = state->bs->filename;
         ret = raw_normalize_devicepath(&normalized_filename);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not normalize device path");
         } else {
-            assert(!(rs->open_flags & O_CREAT));
-            rs->fd = qemu_open(normalized_filename, rs->open_flags);
-            if (rs->fd == -1) {
+            assert(!(raw_s->open_flags & O_CREAT));
+            raw_s->fd = qemu_open(normalized_filename, raw_s->open_flags);
+            if (raw_s->fd == -1) {
                 error_setg_errno(errp, errno, "Could not reopen file");
                 ret = -1;
             }
@@ -818,11 +597,11 @@ static int raw_reopen_prepare(BDRVReopenState *state,
     /* Fail already reopen_prepare() if we can't get a working O_DIRECT
      * alignment with the new fd. */
-    if (rs->fd != -1) {
-        raw_probe_alignment(state->bs, rs->fd, &local_err);
+    if (raw_s->fd != -1) {
+        raw_probe_alignment(state->bs, raw_s->fd, &local_err);
         if (local_err) {
-            qemu_close(rs->fd);
-            rs->fd = -1;
+            qemu_close(raw_s->fd);
+            raw_s->fd = -1;
             error_propagate(errp, local_err);
             ret = -EINVAL;
         }
@@ -833,13 +612,13 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 static void raw_reopen_commit(BDRVReopenState *state)
 {
-    BDRVRawReopenState *rs = state->opaque;
+    BDRVRawReopenState *raw_s = state->opaque;
     BDRVRawState *s = state->bs->opaque;

-    s->open_flags = rs->open_flags;
+    s->open_flags = raw_s->open_flags;

     qemu_close(s->fd);
-    s->fd = rs->fd;
+    s->fd = raw_s->fd;

     g_free(state->opaque);
     state->opaque = NULL;
@@ -848,30 +627,27 @@ static void raw_reopen_commit(BDRVReopenState *state)
 static void raw_reopen_abort(BDRVReopenState *state)
 {
-    BDRVRawReopenState *rs = state->opaque;
+    BDRVRawReopenState *raw_s = state->opaque;

     /* nothing to do if NULL, we didn't get far enough */
-    if (rs == NULL) {
+    if (raw_s == NULL) {
         return;
     }

-    if (rs->fd >= 0) {
-        qemu_close(rs->fd);
-        rs->fd = -1;
+    if (raw_s->fd >= 0) {
+        qemu_close(raw_s->fd);
+        raw_s->fd = -1;
     }
     g_free(state->opaque);
     state->opaque = NULL;
 }
-static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
+static int hdev_get_max_transfer_length(int fd)
 {
 #ifdef BLKSECTGET
-    int max_bytes = 0;
-    short max_sectors = 0;
-    if (bs->sg && ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
-        return max_bytes;
-    } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
-        return max_sectors << BDRV_SECTOR_BITS;
+    int max_sectors = 0;
+    if (ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
+        return max_sectors;
     } else {
         return -errno;
     }
@@ -880,66 +656,16 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
 #endif
 }
-static int hdev_get_max_segments(const struct stat *st)
-{
-#ifdef CONFIG_LINUX
-    char buf[32];
-    const char *end;
-    char *sysfspath;
-    int ret;
-    int fd = -1;
-    long max_segments;
-
-    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-                                major(st->st_rdev), minor(st->st_rdev));
-    fd = open(sysfspath, O_RDONLY);
-    if (fd == -1) {
-        ret = -errno;
-        goto out;
-    }
-    do {
-        ret = read(fd, buf, sizeof(buf) - 1);
-    } while (ret == -1 && errno == EINTR);
-    if (ret < 0) {
-        ret = -errno;
-        goto out;
-    } else if (ret == 0) {
-        ret = -EIO;
-        goto out;
-    }
-    buf[ret] = 0;
-    /* The file is ended with '\n', pass 'end' to accept that. */
-    ret = qemu_strtol(buf, &end, 10, &max_segments);
-    if (ret == 0 && end && *end == '\n') {
-        ret = max_segments;
-    }
-
-out:
-    if (fd != -1) {
-        close(fd);
-    }
-    g_free(sysfspath);
-    return ret;
-#else
-    return -ENOTSUP;
-#endif
-}
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
     struct stat st;

     if (!fstat(s->fd, &st)) {
-        if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
-            int ret = hdev_get_max_transfer_length(bs, s->fd);
-            if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-                bs->bl.max_transfer = pow2floor(ret);
-            }
-            ret = hdev_get_max_segments(&st);
-            if (ret > 0) {
-                bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-                                          ret * getpagesize());
+        if (S_ISBLK(st.st_mode)) {
+            int ret = hdev_get_max_transfer_length(s->fd);
+            if (ret > 0 && ret <= BDRV_REQUEST_MAX_SECTORS) {
+                bs->bl.max_transfer = pow2floor(ret << BDRV_SECTOR_BITS);
             }
         }
     }
@@ -1036,31 +762,10 @@ static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 static ssize_t handle_aiocb_flush(RawPosixAIOData *aiocb)
 {
-    BDRVRawState *s = aiocb->bs->opaque;
     int ret;

-    if (s->page_cache_inconsistent) {
-        return -EIO;
-    }
-
     ret = qemu_fdatasync(aiocb->aio_fildes);
     if (ret == -1) {
-        /* There is no clear definition of the semantics of a failing fsync(),
-         * so we may have to assume the worst. The sad truth is that this
-         * assumption is correct for Linux. Some pages are now probably marked
-         * clean in the page cache even though they are inconsistent with the
-         * on-disk contents. The next fdatasync() call would succeed, but no
-         * further writeback attempt will be made. We can't get back to a state
-         * in which we know what is on disk (we would have to rewrite
-         * everything that was touched since the last fdatasync() at least), so
-         * make bdrv_flush() fail permanently. Given that the behaviour isn't
-         * really defined, I have little hope that other OSes are doing better.
-         *
-         * Obviously, this doesn't affect O_DIRECT, which bypasses the page
-         * cache. */
-        if ((s->open_flags & O_DIRECT) == 0) {
-            s->page_cache_inconsistent = true;
-        }
         return -errno;
     }
     return 0;
@@ -1551,7 +1256,7 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
     if (!bdrv_qiov_is_aligned(bs, qiov)) {
         type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
-    } else if (s->use_linux_aio) {
+    } else if (bs->open_flags & BDRV_O_NATIVE_AIO) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         assert(qiov->size == bytes);
         return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
@@ -1580,8 +1285,7 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
 static void raw_aio_plug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-    if (s->use_linux_aio) {
+    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_plug(bs, aio);
     }
@@ -1591,8 +1295,7 @@ static void raw_aio_plug(BlockDriverState *bs)
 static void raw_aio_unplug(BlockDriverState *bs)
 {
 #ifdef CONFIG_LINUX_AIO
-    BDRVRawState *s = bs->opaque;
-    if (s->use_linux_aio) {
+    if (bs->open_flags & BDRV_O_NATIVE_AIO) {
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_unplug(bs, aio);
     }
@@ -1618,37 +1321,26 @@ static void raw_close(BlockDriverState *bs)
         qemu_close(s->fd);
         s->fd = -1;
     }
-    if (s->lock_fd >= 0) {
-        qemu_close(s->lock_fd);
-        s->lock_fd = -1;
-    }
 }

-static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int raw_truncate(BlockDriverState *bs, int64_t offset)
 {
     BDRVRawState *s = bs->opaque;
     struct stat st;
-    int ret;

     if (fstat(s->fd, &st)) {
-        ret = -errno;
-        error_setg_errno(errp, -ret, "Failed to fstat() the file");
-        return ret;
+        return -errno;
     }

     if (S_ISREG(st.st_mode)) {
         if (ftruncate(s->fd, offset) < 0) {
-            ret = -errno;
-            error_setg_errno(errp, -ret, "Failed to resize the file");
-            return ret;
+            return -errno;
         }
     } else if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
         if (offset > raw_getlength(bs)) {
-            error_setg(errp, "Cannot grow device files");
             return -EINVAL;
         }
     } else {
-        error_setg(errp, "Resizing this file is not supported");
         return -ENOTSUP;
     }
@@ -1885,17 +1577,18 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
     }

-    if (ftruncate(fd, total_size) != 0) {
-        result = -errno;
-        error_setg_errno(errp, -result, "Could not resize file");
-        goto out_close;
-    }
-
     switch (prealloc) {
 #ifdef CONFIG_POSIX_FALLOCATE
     case PREALLOC_MODE_FALLOC:
-        /*
-         * Truncating before posix_fallocate() makes it about twice slower on
-         * file systems that do not support fallocate(), trying to check if a
-         * block is allocated before allocating it, so don't do that here.
-         */
+        /* posix_fallocate() doesn't set errno. */
         result = -posix_fallocate(fd, 0, total_size);
         if (result != 0) {
-            /* posix_fallocate() doesn't set errno. */
             error_setg_errno(errp, -result,
                              "Could not preallocate data for the new file");
         }
@@ -1903,17 +1596,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
     case PREALLOC_MODE_FULL:
     {
-        /*
-         * Knowing the final size from the beginning could allow the file
-         * system driver to do less allocations and possibly avoid
-         * fragmentation of the file.
-         */
-        if (ftruncate(fd, total_size) != 0) {
-            result = -errno;
-            error_setg_errno(errp, -result, "Could not resize file");
-            goto out_close;
-        }
-
         int64_t num = 0, left = total_size;
         buf = g_malloc0(65536);
@@ -1940,10 +1622,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
         break;
     }
     case PREALLOC_MODE_OFF:
-        if (ftruncate(fd, total_size) != 0) {
-            result = -errno;
-            error_setg_errno(errp, -result, "Could not resize file");
-        }
         break;
     default:
         result = -EINVAL;
@@ -2166,25 +1844,6 @@ static QemuOptsList raw_create_opts = {
     }
 };

-static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared,
-                          Error **errp)
-{
-    return raw_handle_perm_lock(bs, RAW_PL_PREPARE, perm, shared, errp);
-}
-
-static void raw_set_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared)
-{
-    BDRVRawState *s = bs->opaque;
-
-    raw_handle_perm_lock(bs, RAW_PL_COMMIT, perm, shared, NULL);
-    s->perm = perm;
-    s->shared_perm = shared;
-}
-
-static void raw_abort_perm_update(BlockDriverState *bs)
-{
-    raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL);
-}
-
 BlockDriver bdrv_file = {
     .format_name = "file",
     .protocol_name = "file",
@@ -2215,9 +1874,7 @@ BlockDriver bdrv_file = {
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
-    .bdrv_check_perm = raw_check_perm,
-    .bdrv_set_perm = raw_set_perm,
-    .bdrv_abort_perm_update = raw_abort_perm_update,

     .create_opts = &raw_create_opts,
 };
@@ -2390,7 +2047,10 @@ static int check_hdev_writable(BDRVRawState *s)
 static void hdev_parse_filename(const char *filename, QDict *options,
                                 Error **errp)
 {
-    bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
+    /* The prefix is optional, just as for "file". */
+    strstart(filename, "host_device:", &filename);
+    qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
 }
 static bool hdev_is_sg(BlockDriverState *bs)
@@ -2398,23 +2058,13 @@ static bool hdev_is_sg(BlockDriverState *bs)
 #if defined(__linux__)
-    BDRVRawState *s = bs->opaque;
     struct stat st;
     struct sg_scsi_id scsiid;
     int sg_version;
-    int ret;

-    if (stat(bs->filename, &st) < 0 || !S_ISCHR(st.st_mode)) {
-        return false;
-    }
-
-    ret = ioctl(s->fd, SG_GET_VERSION_NUM, &sg_version);
-    if (ret < 0) {
-        return false;
-    }
-
-    ret = ioctl(s->fd, SG_GET_SCSI_ID, &scsiid);
-    if (ret >= 0) {
+    if (stat(bs->filename, &st) >= 0 && S_ISCHR(st.st_mode) &&
+        !bdrv_ioctl(bs, SG_GET_VERSION_NUM, &sg_version) &&
+        !bdrv_ioctl(bs, SG_GET_SCSI_ID, &scsiid)) {
         DPRINTF("SG device found: type=%d, version=%d\n",
                 scsiid.scsi_type, sg_version);
         return true;
@@ -2433,12 +2083,6 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;

 #if defined(__APPLE__) && defined(__MACH__)
-    /*
-     * Caution: while qdict_get_str() is fine, getting non-string types
-     * would require more care. When @options come from -blockdev or
-     * blockdev_add, its members are typed according to the QAPI
-     * schema, but when they come from -drive, they're all QString.
-     */
     const char *filename = qdict_get_str(options, "filename");
     char bsd_path[MAXPATHLEN] = "";
     bool error_occurred = false;
@@ -2479,7 +2123,7 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
         goto hdev_open_Mac_error;
     }

-    qdict_put_str(options, "filename", bsd_path);
+    qdict_put(options, "filename", qstring_from_str(bsd_path));

 hdev_open_Mac_error:
     g_free(mediaType);
@@ -2673,9 +2317,6 @@ static BlockDriver bdrv_host_device = {
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
-    .bdrv_check_perm = raw_check_perm,
-    .bdrv_set_perm = raw_set_perm,
-    .bdrv_abort_perm_update = raw_abort_perm_update,

     .bdrv_probe_blocksizes = hdev_probe_blocksizes,
     .bdrv_probe_geometry = hdev_probe_geometry,
@@ -2689,7 +2330,10 @@ static BlockDriver bdrv_host_device = {
 static void cdrom_parse_filename(const char *filename, QDict *options,
                                  Error **errp)
 {
-    bdrv_parse_filename_strip_prefix(filename, "host_cdrom:", options);
+    /* The prefix is optional, just as for "file". */
+    strstart(filename, "host_cdrom:", &filename);
+    qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
 }
 #endif


@@ -24,6 +24,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
+#include "qemu/timer.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "block/raw-aio.h"
@@ -31,7 +32,6 @@
 #include "block/thread-pool.h"
 #include "qemu/iov.h"
 #include "qapi/qmp/qstring.h"
-#include "qapi/util.h"

 #include <windows.h>
 #include <winioctl.h>
@@ -252,8 +252,7 @@ static void raw_probe_alignment(BlockDriverState *bs, Error **errp)
     }
 }

-static void raw_parse_flags(int flags, bool use_aio, int *access_flags,
-                            DWORD *overlapped)
+static void raw_parse_flags(int flags, int *access_flags, DWORD *overlapped)
 {
     assert(access_flags != NULL);
     assert(overlapped != NULL);
@@ -265,7 +264,7 @@ static void raw_parse_flags(int flags, bool use_aio, int *access_flags,
     }

     *overlapped = FILE_ATTRIBUTE_NORMAL;
-    if (use_aio) {
+    if (flags & BDRV_O_NATIVE_AIO) {
         *overlapped |= FILE_FLAG_OVERLAPPED;
     }
     if (flags & BDRV_O_NOCACHE) {
@@ -276,7 +275,12 @@ static void raw_parse_flags(int flags, bool use_aio, int *access_flags,
 static void raw_parse_filename(const char *filename, QDict *options,
                                Error **errp)
 {
-    bdrv_parse_filename_strip_prefix(filename, "file:", options);
+    /* The filename does not have to be prefixed by the protocol name, since
+     * "file" is the default protocol; therefore, the return value of this
+     * function call can be ignored. */
+    strstart(filename, "file:", &filename);
+    qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
 }
 static QemuOptsList raw_runtime_opts = {
@@ -288,35 +292,10 @@ static QemuOptsList raw_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "File name of the image",
         },
-        {
-            .name = "aio",
-            .type = QEMU_OPT_STRING,
-            .help = "host AIO implementation (threads, native)",
-        },
         { /* end of list */ }
     },
 };

-static bool get_aio_option(QemuOpts *opts, int flags, Error **errp)
-{
-    BlockdevAioOptions aio, aio_default;
-
-    aio_default = (flags & BDRV_O_NATIVE_AIO) ? BLOCKDEV_AIO_OPTIONS_NATIVE
-                                              : BLOCKDEV_AIO_OPTIONS_THREADS;
-    aio = qapi_enum_parse(BlockdevAioOptions_lookup, qemu_opt_get(opts, "aio"),
-                          BLOCKDEV_AIO_OPTIONS__MAX, aio_default, errp);
-
-    switch (aio) {
-    case BLOCKDEV_AIO_OPTIONS_NATIVE:
-        return true;
-    case BLOCKDEV_AIO_OPTIONS_THREADS:
-        return false;
-    default:
-        error_setg(errp, "Invalid AIO option");
-    }
-
-    return false;
-}
 static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
@@ -326,7 +305,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     QemuOpts *opts;
     Error *local_err = NULL;
     const char *filename;
-    bool use_aio;
     int ret;

     s->type = FTYPE_FILE;
@@ -339,22 +317,9 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }

-    if (qdict_get_try_bool(options, "locking", false)) {
-        error_setg(errp, "locking=on is not supported on Windows");
-        ret = -EINVAL;
-        goto fail;
-    }
-
     filename = qemu_opt_get(opts, "filename");

-    use_aio = get_aio_option(opts, flags, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        ret = -EINVAL;
-        goto fail;
-    }
-
-    raw_parse_flags(flags, use_aio, &access_flags, &overlapped);
+    raw_parse_flags(flags, &access_flags, &overlapped);

     if (filename[0] && filename[1] == ':') {
         snprintf(s->drive_path, sizeof(s->drive_path), "%c:\\", filename[0]);
@@ -373,7 +338,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     if (s->hfile == INVALID_HANDLE_VALUE) {
         int err = GetLastError();

-        error_setg_win32(errp, err, "Could not open '%s'", filename);
         if (err == ERROR_ACCESS_DENIED) {
             ret = -EACCES;
         } else {
@@ -382,7 +346,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }

-    if (use_aio) {
+    if (flags & BDRV_O_NATIVE_AIO) {
         s->aio = win32_aio_init();
         if (s->aio == NULL) {
             CloseHandle(s->hfile);
@@ -461,7 +425,7 @@ static void raw_close(BlockDriverState *bs)
     }
 }

-static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int raw_truncate(BlockDriverState *bs, int64_t offset)
 {
     BDRVRawState *s = bs->opaque;
     LONG low, high;
@@ -476,11 +440,11 @@ static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
      */
     dwPtrLow = SetFilePointer(s->hfile, low, &high, FILE_BEGIN);
     if (dwPtrLow == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR) {
-        error_setg_win32(errp, GetLastError(), "SetFilePointer error");
+        fprintf(stderr, "SetFilePointer error: %lu\n", GetLastError());
         return -EIO;
     }
     if (SetEndOfFile(s->hfile) == 0) {
-        error_setg_win32(errp, GetLastError(), "SetEndOfFile error");
+        fprintf(stderr, "SetEndOfFile error: %lu\n", GetLastError());
         return -EIO;
     }
     return 0;
@@ -666,7 +630,10 @@ static int hdev_probe_device(const char *filename)
 static void hdev_parse_filename(const char *filename, QDict *options,
                                 Error **errp)
 {
-    bdrv_parse_filename_strip_prefix(filename, "host_device:", options);
+    /* The prefix is optional, just as for "file". */
+    strstart(filename, "host_device:", &filename);
+    qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
 }
static int hdev_open(BlockDriverState *bs, QDict *options, int flags, static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
@@ -680,7 +647,6 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
Error *local_err = NULL; Error *local_err = NULL;
const char *filename; const char *filename;
bool use_aio;
QemuOpts *opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, QemuOpts *opts = qemu_opts_create(&raw_runtime_opts, NULL, 0,
&error_abort); &error_abort);
@@ -693,16 +659,6 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
filename = qemu_opt_get(opts, "filename"); filename = qemu_opt_get(opts, "filename");
use_aio = get_aio_option(opts, flags, &local_err);
if (!local_err && use_aio) {
error_setg(&local_err, "AIO is not supported on Windows host devices");
}
if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto done;
}
if (strstart(filename, "/dev/cdrom", NULL)) { if (strstart(filename, "/dev/cdrom", NULL)) {
if (find_cdrom(device_name, sizeof(device_name)) < 0) { if (find_cdrom(device_name, sizeof(device_name)) < 0) {
error_setg(errp, "Could not open CD-ROM drive"); error_setg(errp, "Could not open CD-ROM drive");
@@ -721,7 +677,7 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
} }
s->type = find_device_type(bs, filename); s->type = find_device_type(bs, filename);
raw_parse_flags(flags, use_aio, &access_flags, &overlapped); raw_parse_flags(flags, &access_flags, &overlapped);
create_flags = OPEN_EXISTING; create_flags = OPEN_EXISTING;


@@ -1,4 +1,4 @@
-/* BlockDriver implementation for "raw" format driver
+/* BlockDriver implementation for "raw"
  *
  * Copyright (C) 2010-2016 Red Hat, Inc.
  * Copyright (C) 2010, Blue Swirl <blauwirbel@gmail.com>
@@ -31,30 +31,6 @@
 #include "qapi/error.h"
 #include "qemu/option.h"
 
-typedef struct BDRVRawState {
-    uint64_t offset;
-    uint64_t size;
-    bool has_size;
-} BDRVRawState;
-
-static QemuOptsList raw_runtime_opts = {
-    .name = "raw",
-    .head = QTAILQ_HEAD_INITIALIZER(raw_runtime_opts.head),
-    .desc = {
-        {
-            .name = "offset",
-            .type = QEMU_OPT_SIZE,
-            .help = "offset in the disk where the image starts",
-        },
-        {
-            .name = "size",
-            .type = QEMU_OPT_SIZE,
-            .help = "virtual disk size",
-        },
-        { /* end of list */ }
-    },
-};
-
 static QemuOptsList raw_create_opts = {
     .name = "raw-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -68,116 +44,16 @@ static QemuOptsList raw_create_opts = {
     }
 };
 
-static int raw_read_options(QDict *options, BlockDriverState *bs,
-                            BDRVRawState *s, Error **errp)
-{
-    Error *local_err = NULL;
-    QemuOpts *opts = NULL;
-    int64_t real_size = 0;
-    int ret;
-
-    real_size = bdrv_getlength(bs->file->bs);
-    if (real_size < 0) {
-        error_setg_errno(errp, -real_size, "Could not get image size");
-        return real_size;
-    }
-
-    opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
-    qemu_opts_absorb_qdict(opts, options, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        ret = -EINVAL;
-        goto end;
-    }
-
-    s->offset = qemu_opt_get_size(opts, "offset", 0);
-    if (s->offset > real_size) {
-        error_setg(errp, "Offset (%" PRIu64 ") cannot be greater than "
-                   "size of the containing file (%" PRId64 ")",
-                   s->offset, real_size);
-        ret = -EINVAL;
-        goto end;
-    }
-
-    if (qemu_opt_find(opts, "size") != NULL) {
-        s->size = qemu_opt_get_size(opts, "size", 0);
-        s->has_size = true;
-    } else {
-        s->has_size = false;
-        s->size = real_size - s->offset;
-    }
-
-    /* Check size and offset */
-    if ((real_size - s->offset) < s->size) {
-        error_setg(errp, "The sum of offset (%" PRIu64 ") and size "
-                   "(%" PRIu64 ") has to be smaller or equal to the "
-                   " actual size of the containing file (%" PRId64 ")",
-                   s->offset, s->size, real_size);
-        ret = -EINVAL;
-        goto end;
-    }
-
-    /* Make sure size is multiple of BDRV_SECTOR_SIZE to prevent rounding
-     * up and leaking out of the specified area. */
-    if (s->has_size && !QEMU_IS_ALIGNED(s->size, BDRV_SECTOR_SIZE)) {
-        error_setg(errp, "Specified size is not multiple of %llu",
-                   BDRV_SECTOR_SIZE);
-        ret = -EINVAL;
-        goto end;
-    }
-
-    ret = 0;
-
-end:
-    qemu_opts_del(opts);
-    return ret;
-}
-
 static int raw_reopen_prepare(BDRVReopenState *reopen_state,
                               BlockReopenQueue *queue, Error **errp)
 {
-    assert(reopen_state != NULL);
-    assert(reopen_state->bs != NULL);
-
-    reopen_state->opaque = g_new0(BDRVRawState, 1);
-
-    return raw_read_options(
-        reopen_state->options,
-        reopen_state->bs,
-        reopen_state->opaque,
-        errp);
-}
-
-static void raw_reopen_commit(BDRVReopenState *state)
-{
-    BDRVRawState *new_s = state->opaque;
-    BDRVRawState *s = state->bs->opaque;
-
-    memcpy(s, new_s, sizeof(BDRVRawState));
-
-    g_free(state->opaque);
-    state->opaque = NULL;
-}
-
-static void raw_reopen_abort(BDRVReopenState *state)
-{
-    g_free(state->opaque);
-    state->opaque = NULL;
+    return 0;
 }
 
 static int coroutine_fn raw_co_preadv(BlockDriverState *bs, uint64_t offset,
                                       uint64_t bytes, QEMUIOVector *qiov,
                                       int flags)
 {
-    BDRVRawState *s = bs->opaque;
-
-    if (offset > UINT64_MAX - s->offset) {
-        return -EINVAL;
-    }
-    offset += s->offset;
-
     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
 }
@@ -186,23 +62,11 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                        uint64_t bytes, QEMUIOVector *qiov,
                                        int flags)
 {
-    BDRVRawState *s = bs->opaque;
     void *buf = NULL;
     BlockDriver *drv;
     QEMUIOVector local_qiov;
     int ret;
 
-    if (s->has_size && (offset > s->size || bytes > (s->size - offset))) {
-        /* There's not enough space for the data. Don't write anything and just
-         * fail to prevent leaking out of the size specified in options. */
-        return -ENOSPC;
-    }
-
-    if (offset > UINT64_MAX - s->offset) {
-        ret = -EINVAL;
-        goto fail;
-    }
-
     if (bs->probed && offset < BLOCK_PROBE_BUF_SIZE && bytes) {
         /* Handling partial writes would be a pain - so we just
          * require that guests have 512-byte request alignment if
@@ -237,8 +101,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, uint64_t offset,
         qiov = &local_qiov;
     }
 
-    offset += s->offset;
-
     BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
     ret = bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -255,10 +117,8 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
                                             int nb_sectors, int *pnum,
                                             BlockDriverState **file)
 {
-    BDRVRawState *s = bs->opaque;
     *pnum = nb_sectors;
     *file = bs->file->bs;
-    sector_num += s->offset / BDRV_SECTOR_SIZE;
     return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_DATA |
            (sector_num << BDRV_SECTOR_BITS);
 }
@@ -267,49 +127,18 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
                                              int64_t offset, int count,
                                              BdrvRequestFlags flags)
 {
-    BDRVRawState *s = bs->opaque;
-
-    if (offset > UINT64_MAX - s->offset) {
-        return -EINVAL;
-    }
-    offset += s->offset;
     return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
 }
 
 static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
                                         int64_t offset, int count)
 {
-    BDRVRawState *s = bs->opaque;
-
-    if (offset > UINT64_MAX - s->offset) {
-        return -EINVAL;
-    }
-    offset += s->offset;
     return bdrv_co_pdiscard(bs->file->bs, offset, count);
 }
 
 static int64_t raw_getlength(BlockDriverState *bs)
 {
-    int64_t len;
-    BDRVRawState *s = bs->opaque;
-
-    /* Update size. It should not change unless the file was externally
-     * modified. */
-    len = bdrv_getlength(bs->file->bs);
-    if (len < 0) {
-        return len;
-    }
-
-    if (len < s->offset) {
-        s->size = 0;
-    } else {
-        if (s->has_size) {
-            /* Try to honour the size */
-            s->size = MIN(s->size, len - s->offset);
-        } else {
-            s->size = len - s->offset;
-        }
-    }
-
-    return s->size;
+    return bdrv_getlength(bs->file->bs);
 }
 
 static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
@@ -327,23 +156,9 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
     }
 }
 
-static int raw_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int raw_truncate(BlockDriverState *bs, int64_t offset)
 {
-    BDRVRawState *s = bs->opaque;
-
-    if (s->has_size) {
-        error_setg(errp, "Cannot resize fixed-size raw disks");
-        return -ENOTSUP;
-    }
-
-    if (INT64_MAX - offset < s->offset) {
-        error_setg(errp, "Disk size too large for the chosen offset");
-        return -EINVAL;
-    }
-
-    s->size = offset;
-    offset += s->offset;
-    return bdrv_truncate(bs->file, offset, errp);
+    return bdrv_truncate(bs->file->bs, offset);
 }
 
 static int raw_media_changed(BlockDriverState *bs)
@@ -361,13 +176,12 @@ static void raw_lock_medium(BlockDriverState *bs, bool locked)
     bdrv_lock_medium(bs->file->bs, locked);
 }
 
-static int raw_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
+static BlockAIOCB *raw_aio_ioctl(BlockDriverState *bs,
+                                 unsigned long int req, void *buf,
+                                 BlockCompletionFunc *cb,
+                                 void *opaque)
 {
-    BDRVRawState *s = bs->opaque;
-    if (s->offset || s->has_size) {
-        return -ENOTSUP;
-    }
-    return bdrv_co_ioctl(bs->file->bs, req, buf);
+    return bdrv_aio_ioctl(bs->file->bs, req, buf, cb, opaque);
 }
 
 static int raw_has_zero_init(BlockDriverState *bs)
@@ -383,15 +197,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
-    BDRVRawState *s = bs->opaque;
-    int ret;
-
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
-
     bs->sg = bs->file->bs->sg;
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
@@ -409,16 +214,6 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
                     bs->file->bs->filename);
     }
 
-    ret = raw_read_options(options, bs, s, errp);
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (bs->sg && (s->offset || s->has_size)) {
-        error_setg(errp, "Cannot use offset/size with SCSI generic devices");
-        return -EINVAL;
-    }
-
     return 0;
 }
 
@@ -436,40 +231,20 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
 static int raw_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
 {
-    BDRVRawState *s = bs->opaque;
-    int ret;
-
-    ret = bdrv_probe_blocksizes(bs->file->bs, bsz);
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (!QEMU_IS_ALIGNED(s->offset, MAX(bsz->log, bsz->phys))) {
-        return -ENOTSUP;
-    }
-
-    return 0;
+    return bdrv_probe_blocksizes(bs->file->bs, bsz);
 }
 
 static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo)
 {
-    BDRVRawState *s = bs->opaque;
-    if (s->offset || s->has_size) {
-        return -ENOTSUP;
-    }
     return bdrv_probe_geometry(bs->file->bs, geo);
 }
 
 BlockDriver bdrv_raw = {
     .format_name          = "raw",
-    .instance_size        = sizeof(BDRVRawState),
     .bdrv_probe           = &raw_probe,
     .bdrv_reopen_prepare  = &raw_reopen_prepare,
-    .bdrv_reopen_commit   = &raw_reopen_commit,
-    .bdrv_reopen_abort    = &raw_reopen_abort,
     .bdrv_open            = &raw_open,
     .bdrv_close           = &raw_close,
-    .bdrv_child_perm      = bdrv_filter_default_perms,
     .bdrv_create          = &raw_create,
     .bdrv_co_preadv       = &raw_co_preadv,
     .bdrv_co_pwritev      = &raw_co_pwritev,
@@ -486,7 +261,7 @@ BlockDriver bdrv_raw = {
     .bdrv_media_changed   = &raw_media_changed,
     .bdrv_eject           = &raw_eject,
     .bdrv_lock_medium     = &raw_lock_medium,
-    .bdrv_co_ioctl        = &raw_co_ioctl,
+    .bdrv_aio_ioctl       = &raw_aio_ioctl,
     .create_opts          = &raw_create_opts,
     .bdrv_has_zero_init   = &raw_has_zero_init
 };


@@ -13,14 +13,13 @@
#include "qemu/osdep.h" #include "qemu/osdep.h"
#include <rbd/librbd.h>
#include "qapi/error.h" #include "qapi/error.h"
#include "qemu/error-report.h" #include "qemu/error-report.h"
#include "block/block_int.h" #include "block/block_int.h"
#include "crypto/secret.h" #include "crypto/secret.h"
#include "qemu/cutils.h" #include "qemu/cutils.h"
#include "qapi/qmp/qstring.h"
#include "qapi/qmp/qjson.h" #include <rbd/librbd.h>
/* /*
* When specifying the image filename use: * When specifying the image filename use:
@@ -56,15 +55,13 @@
#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER) #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
#define RBD_MAX_CONF_NAME_SIZE 128
#define RBD_MAX_CONF_VAL_SIZE 512
#define RBD_MAX_CONF_SIZE 1024
#define RBD_MAX_POOL_NAME_SIZE 128
#define RBD_MAX_SNAP_NAME_SIZE 128
#define RBD_MAX_SNAPS 100 #define RBD_MAX_SNAPS 100
/* The LIBRBD_SUPPORTS_IOVEC is defined in librbd.h */
#ifdef LIBRBD_SUPPORTS_IOVEC
#define LIBRBD_USE_IOVEC 1
#else
#define LIBRBD_USE_IOVEC 0
#endif
typedef enum { typedef enum {
RBD_AIO_READ, RBD_AIO_READ,
RBD_AIO_WRITE, RBD_AIO_WRITE,
@@ -74,6 +71,7 @@ typedef enum {
typedef struct RBDAIOCB { typedef struct RBDAIOCB {
BlockAIOCB common; BlockAIOCB common;
QEMUBH *bh;
int64_t ret; int64_t ret;
QEMUIOVector *qiov; QEMUIOVector *qiov;
char *bounce; char *bounce;
@@ -94,16 +92,21 @@ typedef struct BDRVRBDState {
rados_t cluster; rados_t cluster;
rados_ioctx_t io_ctx; rados_ioctx_t io_ctx;
rbd_image_t image; rbd_image_t image;
char *image_name; char name[RBD_MAX_IMAGE_NAME_SIZE];
char *snap; char *snap;
} BDRVRBDState; } BDRVRBDState;
static char *qemu_rbd_next_tok(char *src, char delim, char **p) static int qemu_rbd_next_tok(char *dst, int dst_len,
char *src, char delim,
const char *name,
char **p, Error **errp)
{ {
int l;
char *end; char *end;
*p = NULL; *p = NULL;
if (delim != '\0') {
for (end = src; *end; ++end) { for (end = src; *end; ++end) {
if (*end == delim) { if (*end == delim) {
break; break;
@@ -116,7 +119,19 @@ static char *qemu_rbd_next_tok(char *src, char delim, char **p)
*p = end + 1; *p = end + 1;
*end = '\0'; *end = '\0';
} }
return src; }
l = strlen(src);
if (l >= dst_len) {
error_setg(errp, "%s too long", name);
return -EINVAL;
} else if (l == 0) {
error_setg(errp, "%s too short", name);
return -EINVAL;
}
pstrcpy(dst, dst_len, src);
return 0;
} }
static void qemu_rbd_unescape(char *src) static void qemu_rbd_unescape(char *src)
@@ -132,92 +147,87 @@ static void qemu_rbd_unescape(char *src)
*p = '\0'; *p = '\0';
} }
static void qemu_rbd_parse_filename(const char *filename, QDict *options, static int qemu_rbd_parsename(const char *filename,
char *pool, int pool_len,
char *snap, int snap_len,
char *name, int name_len,
char *conf, int conf_len,
Error **errp) Error **errp)
{ {
const char *start; const char *start;
char *p, *buf; char *p, *buf;
QList *keypairs = NULL; int ret;
char *found_str;
if (!strstart(filename, "rbd:", &start)) { if (!strstart(filename, "rbd:", &start)) {
error_setg(errp, "File name must start with 'rbd:'"); error_setg(errp, "File name must start with 'rbd:'");
return; return -EINVAL;
} }
buf = g_strdup(start); buf = g_strdup(start);
p = buf; p = buf;
*snap = '\0';
*conf = '\0';
found_str = qemu_rbd_next_tok(p, '/', &p); ret = qemu_rbd_next_tok(pool, pool_len, p,
if (!p) { '/', "pool name", &p, errp);
error_setg(errp, "Pool name is required"); if (ret < 0 || !p) {
ret = -EINVAL;
goto done; goto done;
} }
qemu_rbd_unescape(found_str); qemu_rbd_unescape(pool);
qdict_put_str(options, "pool", found_str);
if (strchr(p, '@')) { if (strchr(p, '@')) {
found_str = qemu_rbd_next_tok(p, '@', &p); ret = qemu_rbd_next_tok(name, name_len, p,
qemu_rbd_unescape(found_str); '@', "object name", &p, errp);
qdict_put_str(options, "image", found_str); if (ret < 0) {
goto done;
found_str = qemu_rbd_next_tok(p, ':', &p);
qemu_rbd_unescape(found_str);
qdict_put_str(options, "snapshot", found_str);
} else {
found_str = qemu_rbd_next_tok(p, ':', &p);
qemu_rbd_unescape(found_str);
qdict_put_str(options, "image", found_str);
} }
if (!p) { ret = qemu_rbd_next_tok(snap, snap_len, p,
':', "snap name", &p, errp);
qemu_rbd_unescape(snap);
} else {
ret = qemu_rbd_next_tok(name, name_len, p,
':', "object name", &p, errp);
}
qemu_rbd_unescape(name);
if (ret < 0 || !p) {
goto done; goto done;
} }
/* The following are essentially all key/value pairs, and we treat ret = qemu_rbd_next_tok(conf, conf_len, p,
* 'id' and 'conf' a bit special. Key/value pairs may be in any order. */ '\0', "configuration", &p, errp);
while (p) {
char *name, *value;
name = qemu_rbd_next_tok(p, '=', &p);
if (!p) {
error_setg(errp, "conf option %s has no value", name);
break;
}
qemu_rbd_unescape(name);
value = qemu_rbd_next_tok(p, ':', &p);
qemu_rbd_unescape(value);
if (!strcmp(name, "conf")) {
qdict_put_str(options, "conf", value);
} else if (!strcmp(name, "id")) {
qdict_put_str(options, "user", value);
} else {
/*
* We pass these internally to qemu_rbd_set_keypairs(), so
* we can get away with the simpler list of [ "key1",
* "value1", "key2", "value2" ] rather than a raw dict
* { "key1": "value1", "key2": "value2" } where we can't
* guarantee order, or even a more correct but complex
* [ { "key1": "value1" }, { "key2": "value2" } ]
*/
if (!keypairs) {
keypairs = qlist_new();
}
qlist_append_str(keypairs, name);
qlist_append_str(keypairs, value);
}
}
if (keypairs) {
qdict_put(options, "=keyvalue-pairs",
qobject_to_json(QOBJECT(keypairs)));
}
done: done:
g_free(buf); g_free(buf);
QDECREF(keypairs); return ret;
return; }
static char *qemu_rbd_parse_clientname(const char *conf, char *clientname)
{
const char *p = conf;
while (*p) {
int len;
const char *end = strchr(p, ':');
if (end) {
len = end - p;
} else {
len = strlen(p);
}
if (strncmp(p, "id=", 3) == 0) {
len -= 3;
strncpy(clientname, p + 3, len);
clientname[len] = '\0';
return clientname;
}
if (end == NULL) {
break;
}
p = end + 1;
}
return NULL;
} }
@@ -240,129 +250,94 @@ static int qemu_rbd_set_auth(rados_t cluster, const char *secretid,
return 0; return 0;
} }
static int qemu_rbd_set_keypairs(rados_t cluster, const char *keypairs_json,
static int qemu_rbd_set_conf(rados_t cluster, const char *conf,
bool only_read_conf_file,
Error **errp) Error **errp)
{ {
QList *keypairs; char *p, *buf;
QString *name; char name[RBD_MAX_CONF_NAME_SIZE];
QString *value; char value[RBD_MAX_CONF_VAL_SIZE];
const char *key;
size_t remaining;
int ret = 0; int ret = 0;
if (!keypairs_json) { buf = g_strdup(conf);
return ret; p = buf;
}
keypairs = qobject_to_qlist(qobject_from_json(keypairs_json,
&error_abort));
remaining = qlist_size(keypairs) / 2;
assert(remaining);
while (remaining--) { while (p) {
name = qobject_to_qstring(qlist_pop(keypairs)); ret = qemu_rbd_next_tok(name, sizeof(name), p,
value = qobject_to_qstring(qlist_pop(keypairs)); '=', "conf option name", &p, errp);
assert(name && value);
key = qstring_get_str(name);
ret = rados_conf_set(cluster, key, qstring_get_str(value));
QDECREF(name);
QDECREF(value);
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "invalid conf option %s", key); break;
}
qemu_rbd_unescape(name);
if (!p) {
error_setg(errp, "conf option %s has no value", name);
ret = -EINVAL;
break;
}
ret = qemu_rbd_next_tok(value, sizeof(value), p,
':', "conf option value", &p, errp);
if (ret < 0) {
break;
}
qemu_rbd_unescape(value);
if (strcmp(name, "conf") == 0) {
/* read the conf file alone, so it doesn't override more
specific settings for a particular device */
if (only_read_conf_file) {
ret = rados_conf_read_file(cluster, value);
if (ret < 0) {
error_setg_errno(errp, -ret, "error reading conf file %s",
value);
break;
}
}
} else if (strcmp(name, "id") == 0) {
/* ignore, this is parsed by qemu_rbd_parse_clientname() */
} else if (!only_read_conf_file) {
ret = rados_conf_set(cluster, name, value);
if (ret < 0) {
error_setg_errno(errp, -ret, "invalid conf option %s", name);
ret = -EINVAL; ret = -EINVAL;
break; break;
} }
} }
}
QDECREF(keypairs); g_free(buf);
return ret; return ret;
} }
static void qemu_rbd_memset(RADOSCB *rcb, int64_t offs)
{
if (LIBRBD_USE_IOVEC) {
RBDAIOCB *acb = rcb->acb;
iov_memset(acb->qiov->iov, acb->qiov->niov, offs, 0,
acb->qiov->size - offs);
} else {
memset(rcb->buf + offs, 0, rcb->size - offs);
}
}
static QemuOptsList runtime_opts = {
.name = "rbd",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "pool",
.type = QEMU_OPT_STRING,
.help = "Rados pool name",
},
{
.name = "image",
.type = QEMU_OPT_STRING,
.help = "Image name in the pool",
},
{
.name = "conf",
.type = QEMU_OPT_STRING,
.help = "Rados config file location",
},
{
.name = "snapshot",
.type = QEMU_OPT_STRING,
.help = "Ceph snapshot name",
},
{
/* maps to 'id' in rados_create() */
.name = "user",
.type = QEMU_OPT_STRING,
.help = "Rados id name",
},
/*
* server.* extracted manually, see qemu_rbd_mon_host()
*/
{
.name = "password-secret",
.type = QEMU_OPT_STRING,
.help = "ID of secret providing the password",
},
/*
* Keys for qemu_rbd_parse_filename(), not in the QAPI schema
*/
{
/*
* HACK: name starts with '=' so that qemu_opts_parse()
* can't set it
*/
.name = "=keyvalue-pairs",
.type = QEMU_OPT_STRING,
.help = "Legacy rados key/value option parameters",
},
{
.name = "filename",
.type = QEMU_OPT_STRING,
},
{ /* end of list */ }
},
};
static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp) static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
{ {
Error *local_err = NULL; Error *local_err = NULL;
int64_t bytes = 0; int64_t bytes = 0;
int64_t objsize; int64_t objsize;
int obj_order = 0; int obj_order = 0;
const char *pool, *image_name, *conf, *user, *keypairs; char pool[RBD_MAX_POOL_NAME_SIZE];
char name[RBD_MAX_IMAGE_NAME_SIZE];
char snap_buf[RBD_MAX_SNAP_NAME_SIZE];
char conf[RBD_MAX_CONF_SIZE];
char clientname_buf[RBD_MAX_CONF_SIZE];
char *clientname;
const char *secretid; const char *secretid;
rados_t cluster; rados_t cluster;
rados_ioctx_t io_ctx; rados_ioctx_t io_ctx;
QDict *options = NULL; int ret;
int ret = 0;
secretid = qemu_opt_get(opts, "password-secret"); secretid = qemu_opt_get(opts, "password-secret");
if (qemu_rbd_parsename(filename, pool, sizeof(pool),
snap_buf, sizeof(snap_buf),
name, sizeof(name),
conf, sizeof(conf), &local_err) < 0) {
error_propagate(errp, local_err);
return -EINVAL;
}
/* Read out options */ /* Read out options */
bytes = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0), bytes = ROUND_UP(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
BDRV_SECTOR_SIZE); BDRV_SECTOR_SIZE);
@@ -370,86 +345,66 @@ static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
if (objsize) { if (objsize) {
if ((objsize - 1) & objsize) { /* not a power of 2? */ if ((objsize - 1) & objsize) { /* not a power of 2? */
error_setg(errp, "obj size needs to be power of 2"); error_setg(errp, "obj size needs to be power of 2");
ret = -EINVAL; return -EINVAL;
goto exit;
} }
if (objsize < 4096) { if (objsize < 4096) {
error_setg(errp, "obj size too small"); error_setg(errp, "obj size too small");
ret = -EINVAL; return -EINVAL;
goto exit;
} }
obj_order = ctz32(objsize); obj_order = ctz32(objsize);
} }
options = qdict_new(); clientname = qemu_rbd_parse_clientname(conf, clientname_buf);
qemu_rbd_parse_filename(filename, options, &local_err); ret = rados_create(&cluster, clientname);
if (local_err) {
ret = -EINVAL;
error_propagate(errp, local_err);
goto exit;
}
/*
* Caution: while qdict_get_try_str() is fine, getting non-string
* types would require more care. When @options come from -blockdev
* or blockdev_add, its members are typed according to the QAPI
* schema, but when they come from -drive, they're all QString.
*/
pool = qdict_get_try_str(options, "pool");
conf = qdict_get_try_str(options, "conf");
user = qdict_get_try_str(options, "user");
image_name = qdict_get_try_str(options, "image");
keypairs = qdict_get_try_str(options, "=keyvalue-pairs");
ret = rados_create(&cluster, user);
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "error initializing"); error_setg_errno(errp, -ret, "error initializing");
goto exit; return ret;
} }
/* try default location when conf=NULL, but ignore failure */ if (strstr(conf, "conf=") == NULL) {
ret = rados_conf_read_file(cluster, conf); /* try default location, but ignore failure */
if (conf && ret < 0) { rados_conf_read_file(cluster, NULL);
error_setg_errno(errp, -ret, "error reading conf file %s", conf); } else if (conf[0] != '\0' &&
ret = -EIO; qemu_rbd_set_conf(cluster, conf, true, &local_err) < 0) {
goto shutdown; rados_shutdown(cluster);
error_propagate(errp, local_err);
return -EIO;
} }
ret = qemu_rbd_set_keypairs(cluster, keypairs, errp); if (conf[0] != '\0' &&
if (ret < 0) { qemu_rbd_set_conf(cluster, conf, false, &local_err) < 0) {
ret = -EIO; rados_shutdown(cluster);
goto shutdown; error_propagate(errp, local_err);
return -EIO;
} }
if (qemu_rbd_set_auth(cluster, secretid, errp) < 0) { if (qemu_rbd_set_auth(cluster, secretid, errp) < 0) {
ret = -EIO; rados_shutdown(cluster);
goto shutdown; return -EIO;
} }
ret = rados_connect(cluster); ret = rados_connect(cluster);
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "error connecting"); error_setg_errno(errp, -ret, "error connecting");
goto shutdown; rados_shutdown(cluster);
return ret;
} }
ret = rados_ioctx_create(cluster, pool, &io_ctx); ret = rados_ioctx_create(cluster, pool, &io_ctx);
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "error opening pool %s", pool); error_setg_errno(errp, -ret, "error opening pool %s", pool);
goto shutdown; rados_shutdown(cluster);
return ret;
} }
ret = rbd_create(io_ctx, image_name, bytes, &obj_order); ret = rbd_create(io_ctx, name, bytes, &obj_order);
rados_ioctx_destroy(io_ctx);
rados_shutdown(cluster);
if (ret < 0) { if (ret < 0) {
error_setg_errno(errp, -ret, "error rbd create"); error_setg_errno(errp, -ret, "error rbd create");
return ret;
} }
rados_ioctx_destroy(io_ctx);
shutdown:
rados_shutdown(cluster);
exit:
QDECREF(options);
return ret; return ret;
} }
@@ -473,11 +428,11 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
} }
} else { } else {
if (r < 0) { if (r < 0) {
qemu_rbd_memset(rcb, 0); memset(rcb->buf, 0, rcb->size);
acb->ret = r; acb->ret = r;
acb->error = 1; acb->error = 1;
} else if (r < rcb->size) { } else if (r < rcb->size) {
qemu_rbd_memset(rcb, r); memset(rcb->buf + r, 0, rcb->size - r);
if (!acb->error) { if (!acb->error) {
acb->ret = rcb->size; acb->ret = rcb->size;
} }
@@ -488,137 +443,92 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb)
g_free(rcb); g_free(rcb);
if (!LIBRBD_USE_IOVEC) {
if (acb->cmd == RBD_AIO_READ) { if (acb->cmd == RBD_AIO_READ) {
qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size); qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size);
} }
qemu_vfree(acb->bounce); qemu_vfree(acb->bounce);
}
acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret)); acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 0 : acb->ret));
qemu_aio_unref(acb); qemu_aio_unref(acb);
} }
static char *qemu_rbd_mon_host(QDict *options, Error **errp) /* TODO Convert to fine grained options */
static QemuOptsList runtime_opts = {
.name = "rbd",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{ {
const char **vals = g_new(const char *, qdict_size(options) + 1); .name = "filename",
char keybuf[32]; .type = QEMU_OPT_STRING,
const char *host, *port; .help = "Specification of the rbd image",
char *rados_str; },
int i; {
.name = "password-secret",
for (i = 0;; i++) { .type = QEMU_OPT_STRING,
sprintf(keybuf, "server.%d.host", i); .help = "ID of secret providing the password",
host = qdict_get_try_str(options, keybuf); },
qdict_del(options, keybuf); { /* end of list */ }
sprintf(keybuf, "server.%d.port", i); },
port = qdict_get_try_str(options, keybuf); };
qdict_del(options, keybuf);
if (!host && !port) {
break;
}
if (!host) {
error_setg(errp, "Parameter server.%d.host is missing", i);
rados_str = NULL;
goto out;
}
if (strchr(host, ':')) {
vals[i] = port ? g_strdup_printf("[%s]:%s", host, port)
: g_strdup_printf("[%s]", host);
} else {
vals[i] = port ? g_strdup_printf("%s:%s", host, port)
: g_strdup(host);
}
}
vals[i] = NULL;
rados_str = i ? g_strjoinv(";", (char **)vals) : NULL;
out:
g_strfreev((char **)vals);
return rados_str;
}
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
                          Error **errp)
 {
     BDRVRBDState *s = bs->opaque;
-    const char *pool, *snap, *conf, *user, *image_name, *keypairs;
-    const char *secretid, *filename;
+    char pool[RBD_MAX_POOL_NAME_SIZE];
+    char snap_buf[RBD_MAX_SNAP_NAME_SIZE];
+    char conf[RBD_MAX_CONF_SIZE];
+    char clientname_buf[RBD_MAX_CONF_SIZE];
+    char *clientname;
+    const char *secretid;
     QemuOpts *opts;
     Error *local_err = NULL;
-    char *mon_host = NULL;
+    const char *filename;
     int r;
 
-    /* If we are given a filename, parse the filename, with precedence given to
-     * filename encoded options */
-    filename = qdict_get_try_str(options, "filename");
-    if (filename) {
-        error_report("Warning: 'filename' option specified. "
-                     "This is an unsupported option, and may be deprecated "
-                     "in the future");
-        qemu_rbd_parse_filename(filename, options, &local_err);
-        if (local_err) {
-            r = -EINVAL;
-            error_propagate(errp, local_err);
-            goto exit;
-        }
-    }
-
     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
-        r = -EINVAL;
-        goto failed_opts;
-    }
-
-    mon_host = qemu_rbd_mon_host(options, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        r = -EINVAL;
-        goto failed_opts;
+        qemu_opts_del(opts);
+        return -EINVAL;
     }
 
+    filename = qemu_opt_get(opts, "filename");
     secretid = qemu_opt_get(opts, "password-secret");
 
-    pool       = qemu_opt_get(opts, "pool");
-    conf       = qemu_opt_get(opts, "conf");
-    snap       = qemu_opt_get(opts, "snapshot");
-    user       = qemu_opt_get(opts, "user");
-    image_name = qemu_opt_get(opts, "image");
-    keypairs   = qemu_opt_get(opts, "=keyvalue-pairs");
-
-    if (!pool || !image_name) {
-        error_setg(errp, "Parameters 'pool' and 'image' are required");
+    if (qemu_rbd_parsename(filename, pool, sizeof(pool),
+                           snap_buf, sizeof(snap_buf),
+                           s->name, sizeof(s->name),
+                           conf, sizeof(conf), errp) < 0) {
         r = -EINVAL;
         goto failed_opts;
     }
 
-    r = rados_create(&s->cluster, user);
+    clientname = qemu_rbd_parse_clientname(conf, clientname_buf);
+    r = rados_create(&s->cluster, clientname);
     if (r < 0) {
         error_setg_errno(errp, -r, "error initializing");
         goto failed_opts;
     }
 
-    s->snap = g_strdup(snap);
-    s->image_name = g_strdup(image_name);
-
-    /* try default location when conf=NULL, but ignore failure */
-    r = rados_conf_read_file(s->cluster, conf);
-    if (conf && r < 0) {
-        error_setg_errno(errp, -r, "error reading conf file %s", conf);
-        goto failed_shutdown;
+    s->snap = NULL;
+    if (snap_buf[0] != '\0') {
+        s->snap = g_strdup(snap_buf);
     }
 
-    r = qemu_rbd_set_keypairs(s->cluster, keypairs, errp);
+    if (strstr(conf, "conf=") == NULL) {
+        /* try default location, but ignore failure */
+        rados_conf_read_file(s->cluster, NULL);
+    } else if (conf[0] != '\0') {
+        r = qemu_rbd_set_conf(s->cluster, conf, true, errp);
         if (r < 0) {
             goto failed_shutdown;
         }
+    }
 
-    if (mon_host) {
-        r = rados_conf_set(s->cluster, "mon_host", mon_host);
+    if (conf[0] != '\0') {
+        r = qemu_rbd_set_conf(s->cluster, conf, false, errp);
         if (r < 0) {
             goto failed_shutdown;
         }
     }
@@ -654,23 +564,13 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
         goto failed_shutdown;
     }
 
-    /* rbd_open is always r/w */
-    r = rbd_open(s->io_ctx, s->image_name, &s->image, s->snap);
+    r = rbd_open(s->io_ctx, s->name, &s->image, s->snap);
     if (r < 0) {
-        error_setg_errno(errp, -r, "error reading header from %s",
-                         s->image_name);
+        error_setg_errno(errp, -r, "error reading header from %s", s->name);
         goto failed_open;
     }
 
-    /* If we are using an rbd snapshot, we must be r/o, otherwise
-     * leave as-is */
-    if (s->snap != NULL) {
-        r = bdrv_set_read_only(bs, true, &local_err);
-        if (r < 0) {
-            error_propagate(errp, local_err);
-            goto failed_open;
-        }
-    }
+    bs->read_only = (s->snap != NULL);
 
     qemu_opts_del(opts);
     return 0;
@@ -680,34 +580,11 @@ failed_open:
 failed_shutdown:
     rados_shutdown(s->cluster);
     g_free(s->snap);
-    g_free(s->image_name);
 failed_opts:
     qemu_opts_del(opts);
-    g_free(mon_host);
-exit:
     return r;
 }
 
-/* Since RBD is currently always opened R/W via the API,
- * we just need to check if we are using a snapshot or not, in
- * order to determine if we will allow it to be R/W */
-static int qemu_rbd_reopen_prepare(BDRVReopenState *state,
-                                   BlockReopenQueue *queue, Error **errp)
-{
-    BDRVRBDState *s = state->bs->opaque;
-    int ret = 0;
-
-    if (s->snap && state->flags & BDRV_O_RDWR) {
-        error_setg(errp,
-                   "Cannot change node '%s' to r/w when using RBD snapshot",
-                   bdrv_get_device_or_node_name(state->bs));
-        ret = -EINVAL;
-    }
-
-    return ret;
-}
-
 static void qemu_rbd_close(BlockDriverState *bs)
 {
     BDRVRBDState *s = bs->opaque;
@@ -715,7 +592,6 @@ static void qemu_rbd_close(BlockDriverState *bs)
     rbd_close(s->image);
     rados_ioctx_destroy(s->io_ctx);
     g_free(s->snap);
-    g_free(s->image_name);
     rados_shutdown(s->cluster);
 }
@@ -726,6 +602,7 @@ static const AIOCBInfo rbd_aiocb_info = {
 static void rbd_finish_bh(void *opaque)
 {
     RADOSCB *rcb = opaque;
+    qemu_bh_delete(rcb->acb->bh);
     qemu_rbd_complete_aio(rcb);
 }
@@ -744,8 +621,9 @@ static void rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb)
     rcb->ret = rbd_aio_get_return_value(c);
     rbd_aio_release(c);
 
-    aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
-                            rbd_finish_bh, rcb);
+    acb->bh = aio_bh_new(bdrv_get_aio_context(acb->common.bs),
+                         rbd_finish_bh, rcb);
+    qemu_bh_schedule(acb->bh);
 }
 
 static int rbd_aio_discard_wrapper(rbd_image_t image,
@@ -781,6 +659,7 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     RBDAIOCB *acb;
     RADOSCB *rcb = NULL;
     rbd_completion_t c;
+    char *buf;
     int r;
 
     BDRVRBDState *s = bs->opaque;
@@ -789,10 +668,6 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     acb->cmd = cmd;
     acb->qiov = qiov;
     assert(!qiov || qiov->size == size);
-
-    rcb = g_new(RADOSCB, 1);
-
-    if (!LIBRBD_USE_IOVEC) {
     if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) {
         acb->bounce = NULL;
     } else {
@@ -801,17 +676,20 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
             goto failed;
         }
     }
-        if (cmd == RBD_AIO_WRITE) {
-            qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
-        }
-        rcb->buf = acb->bounce;
-    }
-
     acb->ret = 0;
     acb->error = 0;
     acb->s = s;
+    acb->bh = NULL;
+
+    if (cmd == RBD_AIO_WRITE) {
+        qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size);
+    }
+
+    buf = acb->bounce;
 
+    rcb = g_new(RADOSCB, 1);
     rcb->acb = acb;
+    rcb->buf = buf;
     rcb->s = acb->s;
     rcb->size = size;
     r = rbd_aio_create_completion(rcb, (rbd_callback_t) rbd_finish_aiocb, &c);
@@ -821,18 +699,10 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     switch (cmd) {
     case RBD_AIO_WRITE:
-#ifdef LIBRBD_SUPPORTS_IOVEC
-        r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c);
-#else
-        r = rbd_aio_write(s->image, off, size, rcb->buf, c);
-#endif
+        r = rbd_aio_write(s->image, off, size, buf, c);
         break;
     case RBD_AIO_READ:
-#ifdef LIBRBD_SUPPORTS_IOVEC
-        r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c);
-#else
-        r = rbd_aio_read(s->image, off, size, rcb->buf, c);
-#endif
+        r = rbd_aio_read(s->image, off, size, buf, c);
         break;
     case RBD_AIO_DISCARD:
         r = rbd_aio_discard_wrapper(s->image, off, size, c);
@@ -847,16 +717,14 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs,
     if (r < 0) {
         goto failed_completion;
     }
 
     return &acb->common;
 
failed_completion:
     rbd_aio_release(c);
failed:
     g_free(rcb);
-    if (!LIBRBD_USE_IOVEC) {
     qemu_vfree(acb->bounce);
-    }
-
     qemu_aio_unref(acb);
     return NULL;
 }
@@ -869,7 +737,7 @@ static BlockAIOCB *qemu_rbd_aio_readv(BlockDriverState *bs,
                                       void *opaque)
 {
     return rbd_start_aio(bs, sector_num << BDRV_SECTOR_BITS, qiov,
-                         (int64_t) nb_sectors << BDRV_SECTOR_BITS, cb, opaque,
+                         nb_sectors << BDRV_SECTOR_BITS, cb, opaque,
                          RBD_AIO_READ);
 }
@@ -881,7 +749,7 @@ static BlockAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs,
                                       void *opaque)
 {
     return rbd_start_aio(bs, sector_num << BDRV_SECTOR_BITS, qiov,
-                         (int64_t) nb_sectors << BDRV_SECTOR_BITS, cb, opaque,
+                         nb_sectors << BDRV_SECTOR_BITS, cb, opaque,
                          RBD_AIO_WRITE);
 }
@@ -936,14 +804,13 @@ static int64_t qemu_rbd_getlength(BlockDriverState *bs)
     return info.size;
 }
 
-static int qemu_rbd_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
+static int qemu_rbd_truncate(BlockDriverState *bs, int64_t offset)
 {
     BDRVRBDState *s = bs->opaque;
     int r;
 
     r = rbd_resize(s->image, offset);
     if (r < 0) {
-        error_setg_errno(errp, -r, "Failed to resize file");
         return r;
     }
@@ -1112,10 +979,9 @@ static QemuOptsList qemu_rbd_create_opts = {
 static BlockDriver bdrv_rbd = {
     .format_name        = "rbd",
     .instance_size      = sizeof(BDRVRBDState),
-    .bdrv_parse_filename = qemu_rbd_parse_filename,
+    .bdrv_needs_filename = true,
     .bdrv_file_open     = qemu_rbd_open,
     .bdrv_close         = qemu_rbd_close,
-    .bdrv_reopen_prepare = qemu_rbd_reopen_prepare,
     .bdrv_create        = qemu_rbd_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_get_info      = qemu_rbd_getinfo,


@@ -1,692 +0,0 @@
/*
* Replication Block filter
*
* Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
* Copyright (c) 2016 Intel Corporation
* Copyright (c) 2016 FUJITSU LIMITED
*
* Author:
* Wen Congyang <wency@cn.fujitsu.com>
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
* See the COPYING file in the top-level directory.
*/
#include "qemu/osdep.h"
#include "qemu-common.h"
#include "block/nbd.h"
#include "block/blockjob.h"
#include "block/block_int.h"
#include "block/block_backup.h"
#include "sysemu/block-backend.h"
#include "qapi/error.h"
#include "replication.h"
typedef enum {
BLOCK_REPLICATION_NONE, /* block replication is not started */
BLOCK_REPLICATION_RUNNING, /* block replication is running */
BLOCK_REPLICATION_FAILOVER, /* failover is running in background */
BLOCK_REPLICATION_FAILOVER_FAILED, /* failover failed */
BLOCK_REPLICATION_DONE, /* block replication is done */
} ReplicationStage;
typedef struct BDRVReplicationState {
ReplicationMode mode;
ReplicationStage stage;
BdrvChild *active_disk;
BdrvChild *hidden_disk;
BdrvChild *secondary_disk;
char *top_id;
ReplicationState *rs;
Error *blocker;
int orig_hidden_flags;
int orig_secondary_flags;
int error;
} BDRVReplicationState;
static void replication_start(ReplicationState *rs, ReplicationMode mode,
Error **errp);
static void replication_do_checkpoint(ReplicationState *rs, Error **errp);
static void replication_get_error(ReplicationState *rs, Error **errp);
static void replication_stop(ReplicationState *rs, bool failover,
Error **errp);
#define REPLICATION_MODE "mode"
#define REPLICATION_TOP_ID "top-id"
static QemuOptsList replication_runtime_opts = {
.name = "replication",
.head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
.desc = {
{
.name = REPLICATION_MODE,
.type = QEMU_OPT_STRING,
},
{
.name = REPLICATION_TOP_ID,
.type = QEMU_OPT_STRING,
},
{ /* end of list */ }
},
};
static ReplicationOps replication_ops = {
.start = replication_start,
.checkpoint = replication_do_checkpoint,
.get_error = replication_get_error,
.stop = replication_stop,
};
static int replication_open(BlockDriverState *bs, QDict *options,
int flags, Error **errp)
{
int ret;
BDRVReplicationState *s = bs->opaque;
Error *local_err = NULL;
QemuOpts *opts = NULL;
const char *mode;
const char *top_id;
bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
false, errp);
if (!bs->file) {
return -EINVAL;
}
ret = -EINVAL;
opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
goto fail;
}
mode = qemu_opt_get(opts, REPLICATION_MODE);
if (!mode) {
error_setg(&local_err, "Missing the option mode");
goto fail;
}
if (!strcmp(mode, "primary")) {
s->mode = REPLICATION_MODE_PRIMARY;
top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
if (top_id) {
error_setg(&local_err, "The primary side does not support option top-id");
goto fail;
}
} else if (!strcmp(mode, "secondary")) {
s->mode = REPLICATION_MODE_SECONDARY;
top_id = qemu_opt_get(opts, REPLICATION_TOP_ID);
s->top_id = g_strdup(top_id);
if (!s->top_id) {
error_setg(&local_err, "Missing the option top-id");
goto fail;
}
} else {
error_setg(&local_err,
"The option mode's value should be primary or secondary");
goto fail;
}
s->rs = replication_new(bs, &replication_ops);
ret = 0;
fail:
qemu_opts_del(opts);
error_propagate(errp, local_err);
return ret;
}
static void replication_close(BlockDriverState *bs)
{
BDRVReplicationState *s = bs->opaque;
if (s->stage == BLOCK_REPLICATION_RUNNING) {
replication_stop(s->rs, false, NULL);
}
if (s->stage == BLOCK_REPLICATION_FAILOVER) {
block_job_cancel_sync(s->active_disk->bs->job);
}
if (s->mode == REPLICATION_MODE_SECONDARY) {
g_free(s->top_id);
}
replication_remove(s->rs);
}
static void replication_child_perm(BlockDriverState *bs, BdrvChild *c,
const BdrvChildRole *role,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
*nperm = *nshared = BLK_PERM_CONSISTENT_READ \
| BLK_PERM_WRITE \
| BLK_PERM_WRITE_UNCHANGED;
return;
}
static int64_t replication_getlength(BlockDriverState *bs)
{
return bdrv_getlength(bs->file->bs);
}
static int replication_get_io_status(BDRVReplicationState *s)
{
switch (s->stage) {
case BLOCK_REPLICATION_NONE:
return -EIO;
case BLOCK_REPLICATION_RUNNING:
return 0;
case BLOCK_REPLICATION_FAILOVER:
return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 0;
case BLOCK_REPLICATION_FAILOVER_FAILED:
return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
case BLOCK_REPLICATION_DONE:
/*
* active commit job completes, and active disk and secondary_disk
* is swapped, so we can operate bs->file directly
*/
return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 0;
default:
abort();
}
}
static int replication_return_value(BDRVReplicationState *s, int ret)
{
if (s->mode == REPLICATION_MODE_SECONDARY) {
return ret;
}
if (ret < 0) {
s->error = ret;
ret = 0;
}
return ret;
}
static coroutine_fn int replication_co_readv(BlockDriverState *bs,
int64_t sector_num,
int remaining_sectors,
QEMUIOVector *qiov)
{
BDRVReplicationState *s = bs->opaque;
BdrvChild *child = s->secondary_disk;
BlockJob *job = NULL;
CowRequest req;
int ret;
if (s->mode == REPLICATION_MODE_PRIMARY) {
/* We only use it to forward primary write requests */
return -EIO;
}
ret = replication_get_io_status(s);
if (ret < 0) {
return ret;
}
if (child && child->bs) {
job = child->bs->job;
}
if (job) {
backup_wait_for_overlapping_requests(child->bs->job, sector_num,
remaining_sectors);
backup_cow_request_begin(&req, child->bs->job, sector_num,
remaining_sectors);
ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors,
qiov);
backup_cow_request_end(&req);
goto out;
}
ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
out:
return replication_return_value(s, ret);
}
static coroutine_fn int replication_co_writev(BlockDriverState *bs,
int64_t sector_num,
int remaining_sectors,
QEMUIOVector *qiov)
{
BDRVReplicationState *s = bs->opaque;
QEMUIOVector hd_qiov;
uint64_t bytes_done = 0;
BdrvChild *top = bs->file;
BdrvChild *base = s->secondary_disk;
BdrvChild *target;
int ret, n;
ret = replication_get_io_status(s);
if (ret < 0) {
goto out;
}
if (ret == 0) {
ret = bdrv_co_writev(top, sector_num,
remaining_sectors, qiov);
return replication_return_value(s, ret);
}
/*
* Failover failed, only write to active disk if the sectors
* have already been allocated in active disk/hidden disk.
*/
qemu_iovec_init(&hd_qiov, qiov->niov);
while (remaining_sectors > 0) {
ret = bdrv_is_allocated_above(top->bs, base->bs, sector_num,
remaining_sectors, &n);
if (ret < 0) {
goto out1;
}
qemu_iovec_reset(&hd_qiov);
qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
target = ret ? top : base;
ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
if (ret < 0) {
goto out1;
}
remaining_sectors -= n;
sector_num += n;
bytes_done += n * BDRV_SECTOR_SIZE;
}
out1:
qemu_iovec_destroy(&hd_qiov);
out:
return ret;
}
static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
BlockDriverState *candidate)
{
return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
}
static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
{
Error *local_err = NULL;
int ret;
if (!s->secondary_disk->bs->job) {
error_setg(errp, "Backup job was cancelled unexpectedly");
return;
}
backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
if (ret < 0) {
error_setg(errp, "Cannot make active disk empty");
return;
}
ret = s->hidden_disk->bs->drv->bdrv_make_empty(s->hidden_disk->bs);
if (ret < 0) {
error_setg(errp, "Cannot make hidden disk empty");
return;
}
}
static void reopen_backing_file(BlockDriverState *bs, bool writable,
Error **errp)
{
BDRVReplicationState *s = bs->opaque;
BlockReopenQueue *reopen_queue = NULL;
int orig_hidden_flags, orig_secondary_flags;
int new_hidden_flags, new_secondary_flags;
Error *local_err = NULL;
if (writable) {
orig_hidden_flags = s->orig_hidden_flags =
bdrv_get_flags(s->hidden_disk->bs);
new_hidden_flags = (orig_hidden_flags | BDRV_O_RDWR) &
~BDRV_O_INACTIVE;
orig_secondary_flags = s->orig_secondary_flags =
bdrv_get_flags(s->secondary_disk->bs);
new_secondary_flags = (orig_secondary_flags | BDRV_O_RDWR) &
~BDRV_O_INACTIVE;
} else {
orig_hidden_flags = (s->orig_hidden_flags | BDRV_O_RDWR) &
~BDRV_O_INACTIVE;
new_hidden_flags = s->orig_hidden_flags;
orig_secondary_flags = (s->orig_secondary_flags | BDRV_O_RDWR) &
~BDRV_O_INACTIVE;
new_secondary_flags = s->orig_secondary_flags;
}
if (orig_hidden_flags != new_hidden_flags) {
reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, NULL,
new_hidden_flags);
}
if (!(orig_secondary_flags & BDRV_O_RDWR)) {
reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs,
NULL, new_secondary_flags);
}
if (reopen_queue) {
bdrv_reopen_multiple(bdrv_get_aio_context(bs),
reopen_queue, &local_err);
error_propagate(errp, local_err);
}
}
static void backup_job_cleanup(BlockDriverState *bs)
{
BDRVReplicationState *s = bs->opaque;
BlockDriverState *top_bs;
top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
if (!top_bs) {
return;
}
bdrv_op_unblock_all(top_bs, s->blocker);
error_free(s->blocker);
reopen_backing_file(bs, false, NULL);
}
static void backup_job_completed(void *opaque, int ret)
{
BlockDriverState *bs = opaque;
BDRVReplicationState *s = bs->opaque;
if (s->stage != BLOCK_REPLICATION_FAILOVER) {
/* The backup job is cancelled unexpectedly */
s->error = -EIO;
}
backup_job_cleanup(bs);
}
static bool check_top_bs(BlockDriverState *top_bs, BlockDriverState *bs)
{
BdrvChild *child;
/* The bs itself is the top_bs */
if (top_bs == bs) {
return true;
}
/* Iterate over top_bs's children */
QLIST_FOREACH(child, &top_bs->children, next) {
if (child->bs == bs || check_top_bs(child->bs, bs)) {
return true;
}
}
return false;
}
static void replication_start(ReplicationState *rs, ReplicationMode mode,
Error **errp)
{
BlockDriverState *bs = rs->opaque;
BDRVReplicationState *s;
BlockDriverState *top_bs;
int64_t active_length, hidden_length, disk_length;
AioContext *aio_context;
Error *local_err = NULL;
BlockJob *job;
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->stage != BLOCK_REPLICATION_NONE) {
error_setg(errp, "Block replication is running or done");
aio_context_release(aio_context);
return;
}
if (s->mode != mode) {
error_setg(errp, "The parameter mode's value is invalid, needs %d,"
" but got %d", s->mode, mode);
aio_context_release(aio_context);
return;
}
switch (s->mode) {
case REPLICATION_MODE_PRIMARY:
break;
case REPLICATION_MODE_SECONDARY:
s->active_disk = bs->file;
if (!s->active_disk || !s->active_disk->bs ||
!s->active_disk->bs->backing) {
error_setg(errp, "Active disk doesn't have backing file");
aio_context_release(aio_context);
return;
}
s->hidden_disk = s->active_disk->bs->backing;
if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) {
error_setg(errp, "Hidden disk doesn't have backing file");
aio_context_release(aio_context);
return;
}
s->secondary_disk = s->hidden_disk->bs->backing;
if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) {
error_setg(errp, "The secondary disk doesn't have block backend");
aio_context_release(aio_context);
return;
}
/* verify the length */
active_length = bdrv_getlength(s->active_disk->bs);
hidden_length = bdrv_getlength(s->hidden_disk->bs);
disk_length = bdrv_getlength(s->secondary_disk->bs);
if (active_length < 0 || hidden_length < 0 || disk_length < 0 ||
active_length != hidden_length || hidden_length != disk_length) {
error_setg(errp, "Active disk, hidden disk, secondary disk's length"
" are not the same");
aio_context_release(aio_context);
return;
}
if (!s->active_disk->bs->drv->bdrv_make_empty ||
!s->hidden_disk->bs->drv->bdrv_make_empty) {
error_setg(errp,
"Active disk or hidden disk doesn't support make_empty");
aio_context_release(aio_context);
return;
}
/* reopen the backing file in r/w mode */
reopen_backing_file(bs, true, &local_err);
if (local_err) {
error_propagate(errp, local_err);
aio_context_release(aio_context);
return;
}
/* start backup job now */
error_setg(&s->blocker,
"Block device is in use by internal backup job");
top_bs = bdrv_lookup_bs(s->top_id, s->top_id, NULL);
if (!top_bs || !bdrv_is_root_node(top_bs) ||
!check_top_bs(top_bs, bs)) {
error_setg(errp, "No top_bs or it is invalid");
reopen_backing_file(bs, false, NULL);
aio_context_release(aio_context);
return;
}
bdrv_op_block_all(top_bs, s->blocker);
bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
0, MIRROR_SYNC_MODE_NONE, NULL, false,
BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
backup_job_completed, bs, NULL, &local_err);
if (local_err) {
error_propagate(errp, local_err);
backup_job_cleanup(bs);
aio_context_release(aio_context);
return;
}
block_job_start(job);
break;
default:
aio_context_release(aio_context);
abort();
}
s->stage = BLOCK_REPLICATION_RUNNING;
if (s->mode == REPLICATION_MODE_SECONDARY) {
secondary_do_checkpoint(s, errp);
}
s->error = 0;
aio_context_release(aio_context);
}
static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
{
BlockDriverState *bs = rs->opaque;
BDRVReplicationState *s;
AioContext *aio_context;
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->mode == REPLICATION_MODE_SECONDARY) {
secondary_do_checkpoint(s, errp);
}
aio_context_release(aio_context);
}
static void replication_get_error(ReplicationState *rs, Error **errp)
{
BlockDriverState *bs = rs->opaque;
BDRVReplicationState *s;
AioContext *aio_context;
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->stage != BLOCK_REPLICATION_RUNNING) {
error_setg(errp, "Block replication is not running");
aio_context_release(aio_context);
return;
}
if (s->error) {
error_setg(errp, "I/O error occurred");
aio_context_release(aio_context);
return;
}
aio_context_release(aio_context);
}
static void replication_done(void *opaque, int ret)
{
BlockDriverState *bs = opaque;
BDRVReplicationState *s = bs->opaque;
if (ret == 0) {
s->stage = BLOCK_REPLICATION_DONE;
/* refresh top bs's filename */
bdrv_refresh_filename(bs);
s->active_disk = NULL;
s->secondary_disk = NULL;
s->hidden_disk = NULL;
s->error = 0;
} else {
s->stage = BLOCK_REPLICATION_FAILOVER_FAILED;
s->error = -EIO;
}
}
static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
{
BlockDriverState *bs = rs->opaque;
BDRVReplicationState *s;
AioContext *aio_context;
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
s = bs->opaque;
if (s->stage != BLOCK_REPLICATION_RUNNING) {
error_setg(errp, "Block replication is not running");
aio_context_release(aio_context);
return;
}
switch (s->mode) {
case REPLICATION_MODE_PRIMARY:
s->stage = BLOCK_REPLICATION_DONE;
s->error = 0;
break;
case REPLICATION_MODE_SECONDARY:
/*
* This BDS will be closed, and the job should be completed
* before the BDS is closed, because we will access hidden
* disk, secondary disk in backup_job_completed().
*/
if (s->secondary_disk->bs->job) {
block_job_cancel_sync(s->secondary_disk->bs->job);
}
if (!failover) {
secondary_do_checkpoint(s, errp);
s->stage = BLOCK_REPLICATION_DONE;
aio_context_release(aio_context);
return;
}
s->stage = BLOCK_REPLICATION_FAILOVER;
commit_active_start(NULL, s->active_disk->bs, s->secondary_disk->bs,
BLOCK_JOB_INTERNAL, 0, BLOCKDEV_ON_ERROR_REPORT,
NULL, replication_done, bs, true, errp);
break;
default:
aio_context_release(aio_context);
abort();
}
aio_context_release(aio_context);
}
BlockDriver bdrv_replication = {
.format_name = "replication",
.protocol_name = "replication",
.instance_size = sizeof(BDRVReplicationState),
.bdrv_open = replication_open,
.bdrv_close = replication_close,
.bdrv_child_perm = replication_child_perm,
.bdrv_getlength = replication_getlength,
.bdrv_co_readv = replication_co_readv,
.bdrv_co_writev = replication_co_writev,
.is_filter = true,
.bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
.has_variable_length = true,
};
static void bdrv_replication_init(void)
{
bdrv_register(&bdrv_replication);
}
block_init(bdrv_replication_init);

File diff suppressed because it is too large


@@ -27,7 +27,6 @@
 #include "block/block_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
-#include "qapi/qmp/qstring.h"
 
 QemuOptsList internal_snapshot_opts = {
     .name = "snapshot",
@@ -190,33 +189,14 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
     }
 
     if (bs->file) {
-        BlockDriverState *file;
-        QDict *options = qdict_clone_shallow(bs->options);
-        QDict *file_options;
-
-        file = bs->file->bs;
-        /* Prevent it from getting deleted when detached from bs */
-        bdrv_ref(file);
-
-        qdict_extract_subqdict(options, &file_options, "file.");
-        QDECREF(file_options);
-        qdict_put_str(options, "file", bdrv_get_node_name(file));
-
         drv->bdrv_close(bs);
-        bdrv_unref_child(bs, bs->file);
-        bs->file = NULL;
-
-        ret = bdrv_snapshot_goto(file, snapshot_id);
-        open_ret = drv->bdrv_open(bs, options, bs->open_flags, NULL);
-        QDECREF(options);
+        ret = bdrv_snapshot_goto(bs->file->bs, snapshot_id);
+        open_ret = drv->bdrv_open(bs, NULL, bs->open_flags, NULL);
         if (open_ret < 0) {
-            bdrv_unref(file);
+            bdrv_unref(bs->file->bs);
             bs->drv = NULL;
             return open_ret;
         }
-
-        assert(bs->file->bs == file);
-        bdrv_unref(file);
-
         return ret;
     }


@@ -30,14 +30,10 @@
#include "block/block_int.h" #include "block/block_int.h"
#include "qapi/error.h" #include "qapi/error.h"
#include "qemu/error-report.h" #include "qemu/error-report.h"
#include "qemu/cutils.h"
#include "qemu/sockets.h" #include "qemu/sockets.h"
#include "qemu/uri.h" #include "qemu/uri.h"
#include "qapi-visit.h"
#include "qapi/qmp/qint.h" #include "qapi/qmp/qint.h"
#include "qapi/qmp/qstring.h" #include "qapi/qmp/qstring.h"
#include "qapi/qobject-input-visitor.h"
#include "qapi/qobject-output-visitor.h"
/* DEBUG_SSH=1 enables the DPRINTF (debugging printf) statements in /* DEBUG_SSH=1 enables the DPRINTF (debugging printf) statements in
* this block driver code. * this block driver code.
@@ -78,9 +74,8 @@ typedef struct BDRVSSHState {
*/ */
LIBSSH2_SFTP_ATTRIBUTES attrs; LIBSSH2_SFTP_ATTRIBUTES attrs;
InetSocketAddress *inet;
/* Used to warn if 'flush' is not supported. */ /* Used to warn if 'flush' is not supported. */
char *hostport;
bool unsafe_flush_warning; bool unsafe_flush_warning;
} BDRVSSHState; } BDRVSSHState;
@@ -94,6 +89,7 @@ static void ssh_state_init(BDRVSSHState *s)
static void ssh_state_free(BDRVSSHState *s) static void ssh_state_free(BDRVSSHState *s)
{ {
g_free(s->hostport);
if (s->sftp_handle) { if (s->sftp_handle) {
libssh2_sftp_close(s->sftp_handle); libssh2_sftp_close(s->sftp_handle);
} }
@@ -197,7 +193,6 @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
{ {
URI *uri = NULL; URI *uri = NULL;
QueryParams *qp; QueryParams *qp;
char *port_str;
int i; int i;
uri = uri_parse(filename); uri = uri_parse(filename);
@@ -227,23 +222,24 @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
} }
if(uri->user && strcmp(uri->user, "") != 0) { if(uri->user && strcmp(uri->user, "") != 0) {
qdict_put_str(options, "user", uri->user); qdict_put(options, "user", qstring_from_str(uri->user));
} }
qdict_put_str(options, "server.host", uri->server); qdict_put(options, "host", qstring_from_str(uri->server));
port_str = g_strdup_printf("%d", uri->port ?: 22); if (uri->port) {
qdict_put_str(options, "server.port", port_str); qdict_put(options, "port", qint_from_int(uri->port));
g_free(port_str); }
qdict_put_str(options, "path", uri->path); qdict_put(options, "path", qstring_from_str(uri->path));
/* Pick out any query parameters that we understand, and ignore /* Pick out any query parameters that we understand, and ignore
* the rest. * the rest.
*/ */
for (i = 0; i < qp->n; ++i) { for (i = 0; i < qp->n; ++i) {
if (strcmp(qp->p[i].name, "host_key_check") == 0) { if (strcmp(qp->p[i].name, "host_key_check") == 0) {
qdict_put_str(options, "host_key_check", qp->p[i].value); qdict_put(options, "host_key_check",
qstring_from_str(qp->p[i].value));
} }
} }
@@ -258,31 +254,15 @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
return -EINVAL; return -EINVAL;
} }
static bool ssh_has_filename_options_conflict(QDict *options, Error **errp)
{
const QDictEntry *qe;
for (qe = qdict_first(options); qe; qe = qdict_next(options, qe)) {
if (!strcmp(qe->key, "host") ||
!strcmp(qe->key, "port") ||
!strcmp(qe->key, "path") ||
!strcmp(qe->key, "user") ||
!strcmp(qe->key, "host_key_check") ||
strstart(qe->key, "server.", NULL))
{
error_setg(errp, "Option '%s' cannot be used with a file name",
qe->key);
return true;
}
}
return false;
}
static void ssh_parse_filename(const char *filename, QDict *options, static void ssh_parse_filename(const char *filename, QDict *options,
Error **errp) Error **errp)
{ {
if (ssh_has_filename_options_conflict(options, errp)) { if (qdict_haskey(options, "user") ||
qdict_haskey(options, "host") ||
qdict_haskey(options, "port") ||
qdict_haskey(options, "path") ||
qdict_haskey(options, "host_key_check")) {
error_setg(errp, "user, host, port, path, host_key_check cannot be used at the same time as a file option");
return; return;
} }
@@ -560,75 +540,14 @@ static QemuOptsList ssh_runtime_opts = {
}, },
}; };
static bool ssh_process_legacy_socket_options(QDict *output_opts,
QemuOpts *legacy_opts,
Error **errp)
{
const char *host = qemu_opt_get(legacy_opts, "host");
const char *port = qemu_opt_get(legacy_opts, "port");
if (!host && port) {
error_setg(errp, "port may not be used without host");
return false;
}
if (host) {
qdict_put_str(output_opts, "server.host", host);
qdict_put_str(output_opts, "server.port", port ?: stringify(22));
}
return true;
}
static InetSocketAddress *ssh_config(QDict *options, Error **errp)
{
InetSocketAddress *inet = NULL;
QDict *addr = NULL;
QObject *crumpled_addr = NULL;
Visitor *iv = NULL;
Error *local_error = NULL;
qdict_extract_subqdict(options, &addr, "server.");
if (!qdict_size(addr)) {
error_setg(errp, "SSH server address missing");
goto out;
}
crumpled_addr = qdict_crumple(addr, errp);
if (!crumpled_addr) {
goto out;
}
/*
* FIXME .numeric, .to, .ipv4 or .ipv6 don't work with -drive.
* .to doesn't matter, it's ignored anyway.
* That's because when @options come from -blockdev or
* blockdev_add, members are typed according to the QAPI schema,
* but when they come from -drive, they're all QString. The
* visitor expects the former.
*/
iv = qobject_input_visitor_new(crumpled_addr);
visit_type_InetSocketAddress(iv, NULL, &inet, &local_error);
if (local_error) {
error_propagate(errp, local_error);
goto out;
}
out:
QDECREF(addr);
qobject_decref(crumpled_addr);
visit_free(iv);
return inet;
}
static int connect_to_ssh(BDRVSSHState *s, QDict *options,
                          int ssh_flags, int creat_mode, Error **errp)
{
    int r, ret;
    QemuOpts *opts = NULL;
    Error *local_err = NULL;
-    const char *user, *path, *host_key_check;
-    long port = 0;
+    const char *host, *user, *path, *host_key_check;
+    int port;
    opts = qemu_opts_create(&ssh_runtime_opts, NULL, 0, &error_abort);
    qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -638,11 +557,15 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
        goto err;
    }
-    if (!ssh_process_legacy_socket_options(options, opts, errp)) {
-        ret = -EINVAL;
-        goto err;
-    }
+    host = qemu_opt_get(opts, "host");
+    if (!host) {
+        ret = -EINVAL;
+        error_setg(errp, "No hostname was specified");
+        goto err;
+    }
+    port = qemu_opt_get_number(opts, "port", 22);
    path = qemu_opt_get(opts, "path");
    if (!path) {
        ret = -EINVAL;
@@ -665,21 +588,12 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
        host_key_check = "yes";
    }
-    /* Pop the config into our state object, Exit if invalid */
-    s->inet = ssh_config(options, errp);
-    if (!s->inet) {
-        ret = -EINVAL;
-        goto err;
-    }
-    if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
-        error_setg(errp, "Use only numeric port value");
-        ret = -EINVAL;
-        goto err;
-    }
+    /* Construct the host:port name for inet_connect. */
+    g_free(s->hostport);
+    s->hostport = g_strdup_printf("%s:%d", host, port);
    /* Open the socket and connect. */
-    s->sock = inet_connect_saddr(s->inet, NULL, NULL, errp);
+    s->sock = inet_connect(s->hostport, errp);
    if (s->sock < 0) {
        ret = -EIO;
        goto err;
    }
@@ -705,8 +619,7 @@ static int connect_to_ssh(BDRVSSHState *s, QDict *options,
    }
    /* Check the remote host's key against known_hosts. */
-    ret = check_host_key(s, s->inet->host, port, host_key_check,
-                         errp);
+    ret = check_host_key(s, host, port, host_key_check, errp);
    if (ret < 0) {
        goto err;
    }
@@ -895,14 +808,10 @@ static void restart_coroutine(void *opaque)
    DPRINTF("co=%p", co);
-    aio_co_wake(co);
+    qemu_coroutine_enter(co);
}
-/* A non-blocking call returned EAGAIN, so yield, ensuring the
- * handlers are set up so that we'll be rescheduled when there is an
- * interesting event on the socket.
- */
-static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
+static coroutine_fn void set_fd_handler(BDRVSSHState *s, BlockDriverState *bs)
{
    int r;
    IOHandler *rd_handler = NULL, *wr_handler = NULL;
@@ -921,11 +830,26 @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
                rd_handler, wr_handler);
    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
-                       false, rd_handler, wr_handler, NULL, co);
+                       false, rd_handler, wr_handler, co);
+}
+static coroutine_fn void clear_fd_handler(BDRVSSHState *s,
+                                          BlockDriverState *bs)
+{
+    DPRINTF("s->sock=%d", s->sock);
+    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock,
+                       false, NULL, NULL, NULL);
+}
+/* A non-blocking call returned EAGAIN, so yield, ensuring the
+ * handlers are set up so that we'll be rescheduled when there is an
+ * interesting event on the socket.
+ */
+static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
+{
+    set_fd_handler(s, bs);
    qemu_coroutine_yield();
-    DPRINTF("s->sock=%d - back", s->sock);
-    aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, false,
-                       NULL, NULL, NULL, NULL);
+    clear_fd_handler(s, bs);
}
/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
@@ -1116,7 +1040,7 @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
{
    if (!s->unsafe_flush_warning) {
        error_report("warning: ssh server %s does not support fsync",
-                     s->inet->host);
+                     s->hostport);
        if (what) {
            error_report("to support fsync, you need %s", what);
        }

block/stream.c

@@ -14,7 +14,7 @@
#include "qemu/osdep.h"
#include "trace.h"
#include "block/block_int.h"
-#include "block/blockjob_int.h"
+#include "block/blockjob.h"
#include "qapi/error.h"
#include "qapi/qmp/qerror.h"
#include "qemu/ratelimit.h"
@@ -37,7 +37,6 @@ typedef struct StreamBlockJob {
    BlockDriverState *base;
    BlockdevOnError on_error;
    char *backing_file_str;
-    int bs_flags;
} StreamBlockJob;
static int coroutine_fn stream_populate(BlockBackend *blk,
@@ -68,7 +67,6 @@ static void stream_complete(BlockJob *job, void *opaque)
    StreamCompleteData *data = opaque;
    BlockDriverState *bs = blk_bs(job->blk);
    BlockDriverState *base = s->base;
-    Error *local_err = NULL;
    if (!block_job_is_cancelled(&s->common) && data->reached_end &&
        data->ret == 0) {
@@ -80,20 +78,7 @@ static void stream_complete(BlockJob *job, void *opaque)
        }
    }
    data->ret = bdrv_change_backing_file(bs, base_id, base_fmt);
-    bdrv_set_backing_hd(bs, base, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        data->ret = -EPERM;
-        goto out;
-    }
-    }
-out:
-    /* Reopen the image back in read-only mode if necessary */
-    if (s->bs_flags != bdrv_get_flags(bs)) {
-        /* Give up write permissions before making it read-only */
-        blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
-        bdrv_reopen(bs, s->bs_flags, NULL);
+    bdrv_set_backing_hd(bs, base);
    }
    g_free(s->backing_file_str);
@@ -227,59 +212,26 @@ static const BlockJobDriver stream_job_driver = {
    .instance_size = sizeof(StreamBlockJob),
    .job_type      = BLOCK_JOB_TYPE_STREAM,
    .set_speed     = stream_set_speed,
-    .start         = stream_run,
};
void stream_start(const char *job_id, BlockDriverState *bs,
                  BlockDriverState *base, const char *backing_file_str,
-                  int64_t speed, BlockdevOnError on_error, Error **errp)
+                  int64_t speed, BlockdevOnError on_error,
+                  BlockCompletionFunc *cb, void *opaque, Error **errp)
{
    StreamBlockJob *s;
-    BlockDriverState *iter;
-    int orig_bs_flags;
-    /* Make sure that the image is opened in read-write mode */
-    orig_bs_flags = bdrv_get_flags(bs);
-    if (!(orig_bs_flags & BDRV_O_RDWR)) {
-        if (bdrv_reopen(bs, orig_bs_flags | BDRV_O_RDWR, errp) != 0) {
-            return;
-        }
-    }
-    /* Prevent concurrent jobs trying to modify the graph structure here, we
-     * already have our own plans. Also don't allow resize as the image size is
-     * queried only at the job start and then cached. */
-    s = block_job_create(job_id, &stream_job_driver, bs,
-                         BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
-                         BLK_PERM_GRAPH_MOD,
-                         BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
-                         BLK_PERM_WRITE,
-                         speed, BLOCK_JOB_DEFAULT, NULL, NULL, errp);
+    s = block_job_create(job_id, &stream_job_driver, bs, speed,
+                         cb, opaque, errp);
    if (!s) {
-        goto fail;
+        return;
    }
-    /* Block all intermediate nodes between bs and base, because they will
-     * disappear from the chain after this operation. The streaming job reads
-     * every block only once, assuming that it doesn't change, so block writes
-     * and resizes. */
-    for (iter = backing_bs(bs); iter && iter != base; iter = backing_bs(iter)) {
-        block_job_add_bdrv(&s->common, "intermediate node", iter, 0,
-                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED,
-                           &error_abort);
-    }
    s->base = base;
    s->backing_file_str = g_strdup(backing_file_str);
-    s->bs_flags = orig_bs_flags;
    s->on_error = on_error;
-    trace_stream_start(bs, base, s);
-    block_job_start(&s->common);
-    return;
-fail:
-    if (orig_bs_flags != bdrv_get_flags(bs)) {
-        bdrv_reopen(bs, orig_bs_flags, NULL);
-    }
+    s->common.co = qemu_coroutine_create(stream_run, s);
+    trace_stream_start(bs, base, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co);
}

block/tar.c (new file, 379 lines)

@@ -0,0 +1,379 @@
/*
* Tar block driver
*
* Copyright (c) 2009 Alexander Graf <agraf@suse.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
// #define DEBUG
#ifdef DEBUG
#define dprintf(fmt, ...) do { printf("tar: " fmt, ## __VA_ARGS__); } while (0)
#else
#define dprintf(fmt, ...) do { } while (0)
#endif
#define SECTOR_SIZE 512
#define POSIX_TAR_MAGIC "ustar"
#define OFFS_LENGTH 0x7c
#define OFFS_TYPE 0x9c
#define OFFS_MAGIC 0x101
#define OFFS_S_SP 0x182
#define OFFS_S_EXT 0x1e2
#define OFFS_S_LENGTH 0x1e3
#define OFFS_SX_EXT 0x1f8
typedef struct SparseCache {
uint64_t start;
uint64_t end;
} SparseCache;
typedef struct BDRVTarState {
BlockDriverState *hd;
size_t file_sec;
uint64_t file_len;
SparseCache *sparse;
int sparse_num;
uint64_t last_end;
char longfile[2048];
} BDRVTarState;
static int str_ends(char *str, const char *end)
{
int end_len = strlen(end);
int str_len = strlen(str);
if (str_len < end_len)
return 0;
return !strncmp(str + str_len - end_len, end, end_len);
}
static int is_target_file(BlockDriverState *bs, char *filename,
char *header)
{
int retval = 0;
if (str_ends(filename, ".raw"))
retval = 1;
if (str_ends(filename, ".qcow"))
retval = 1;
if (str_ends(filename, ".qcow2"))
retval = 1;
if (str_ends(filename, ".vmdk"))
retval = 1;
if (retval &&
(header[OFFS_TYPE] != '0') &&
(header[OFFS_TYPE] != 'S')) {
retval = 0;
}
dprintf("does filename %s match? %s\n", filename, retval ? "yes" : "no");
/* make sure we're not using this name again */
filename[0] = '\0';
return retval;
}
static uint64_t tar2u64(char *ptr)
{
uint64_t retval;
char oldend = ptr[12];
ptr[12] = '\0';
if (*ptr & 0x80) {
/* XXX we only support files up to 64 bit length */
retval = be64_to_cpu(*(uint64_t *)(ptr+4));
dprintf("Convert %lx -> %#lx\n", *(uint64_t*)(ptr+4), retval);
} else {
retval = strtol(ptr, NULL, 8);
dprintf("Convert %s -> %#lx\n", ptr, retval);
}
ptr[12] = oldend;
return retval;
}
static void tar_sparse(BDRVTarState *s, uint64_t offs, uint64_t len)
{
SparseCache *sparse;
if (!len)
return;
if (!(offs - s->last_end)) {
s->last_end += len;
return;
}
if (s->last_end > offs)
return;
dprintf("Last chunk until %lx new chunk at %lx\n", s->last_end, offs);
s->sparse = g_realloc(s->sparse, (s->sparse_num + 1) * sizeof(SparseCache));
sparse = &s->sparse[s->sparse_num];
sparse->start = s->last_end;
sparse->end = offs;
s->last_end = offs + len;
s->sparse_num++;
dprintf("Sparse at %lx end=%lx\n", sparse->start,
sparse->end);
}
static QemuOptsList runtime_opts = {
.name = "tar",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "filename",
.type = QEMU_OPT_STRING,
.help = "URL to the tar file",
},
{ /* end of list */ }
},
};
static int tar_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
{
BDRVTarState *s = bs->opaque;
char header[SECTOR_SIZE];
char *real_file = header;
char *magic;
size_t header_offs = 0;
int ret;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
    filename = qemu_opt_get(opts, "filename");
    if (!filename) {
        /* qemu_opt_get() returns NULL when the option is absent */
        error_setg(errp, "Missing 'filename' option");
        ret = -EINVAL;
        goto fail;
    }
    if (!strncmp(filename, "tar://", 6)) {
        filename += 6;
    } else if (!strncmp(filename, "tar:", 4)) {
        filename += 4;
    }
s->hd = bdrv_open(filename, NULL, NULL, flags | BDRV_O_PROTOCOL, errp);
if (!s->hd) {
ret = -EINVAL;
qemu_opts_del(opts);
return ret;
}
/* Search the file for an image */
do {
/* tar header */
if (bdrv_pread(s->hd->file, header_offs, header, SECTOR_SIZE) != SECTOR_SIZE)
goto fail;
if ((header_offs > 1) && !header[0]) {
fprintf(stderr, "Tar: No image file found in archive\n");
goto fail;
}
magic = &header[OFFS_MAGIC];
if (strncmp(magic, POSIX_TAR_MAGIC, 5)) {
fprintf(stderr, "Tar: Invalid magic: %s\n", magic);
goto fail;
}
dprintf("file type: %c\n", header[OFFS_TYPE]);
/* file length*/
s->file_len = (tar2u64(&header[OFFS_LENGTH]) + (SECTOR_SIZE - 1)) &
~(SECTOR_SIZE - 1);
s->file_sec = (header_offs / SECTOR_SIZE) + 1;
header_offs += s->file_len + SECTOR_SIZE;
if (header[OFFS_TYPE] == 'L') {
bdrv_pread(s->hd->file, header_offs - s->file_len, s->longfile,
sizeof(s->longfile));
s->longfile[sizeof(s->longfile)-1] = '\0';
real_file = header;
} else if (s->longfile[0]) {
real_file = s->longfile;
} else {
real_file = header;
}
} while(!is_target_file(bs, real_file, header));
/* We found an image! */
if (header[OFFS_TYPE] == 'S') {
uint8_t isextended;
int i;
for (i = OFFS_S_SP; i < (OFFS_S_SP + (4 * 24)); i += 24)
tar_sparse(s, tar2u64(&header[i]), tar2u64(&header[i+12]));
s->file_len = tar2u64(&header[OFFS_S_LENGTH]);
isextended = header[OFFS_S_EXT];
while (isextended) {
if (bdrv_pread(s->hd->file, s->file_sec * SECTOR_SIZE, header,
SECTOR_SIZE) != SECTOR_SIZE)
goto fail;
for (i = 0; i < (21 * 24); i += 24)
tar_sparse(s, tar2u64(&header[i]), tar2u64(&header[i+12]));
isextended = header[OFFS_SX_EXT];
s->file_sec++;
}
tar_sparse(s, s->file_len, 1);
}
qemu_opts_del(opts);
return 0;
fail:
fprintf(stderr, "Tar: Error opening file\n");
bdrv_unref(s->hd);
qemu_opts_del(opts);
return -EINVAL;
}
typedef struct TarAIOCB {
BlockAIOCB common;
QEMUBH *bh;
} TarAIOCB;
/* This callback gets invoked when we have pure sparseness */
static void tar_sparse_cb(void *opaque)
{
TarAIOCB *acb = (TarAIOCB *)opaque;
acb->common.cb(acb->common.opaque, 0);
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb);
}
static AIOCBInfo tar_aiocb_info = {
.aiocb_size = sizeof(TarAIOCB),
};
/* This is where we get a request from a caller to read something */
static BlockAIOCB *tar_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
{
BDRVTarState *s = bs->opaque;
SparseCache *sparse;
int64_t sec_file = sector_num + s->file_sec;
int64_t start = sector_num * SECTOR_SIZE;
int64_t end = start + (nb_sectors * SECTOR_SIZE);
int i;
TarAIOCB *acb;
for (i = 0; i < s->sparse_num; i++) {
sparse = &s->sparse[i];
if (sparse->start > end) {
/* We expect the cache entries to be sorted by increasing start */
break;
} else if ((sparse->start < start) && (sparse->end <= start)) {
/* sparse before our offset */
sec_file -= (sparse->end - sparse->start) / SECTOR_SIZE;
} else if ((sparse->start <= start) && (sparse->end >= end)) {
/* all our sectors are sparse */
char *buf = g_malloc0(nb_sectors * SECTOR_SIZE);
acb = qemu_aio_get(&tar_aiocb_info, bs, cb, opaque);
qemu_iovec_from_buf(qiov, 0, buf, nb_sectors * SECTOR_SIZE);
g_free(buf);
acb->bh = qemu_bh_new(tar_sparse_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
} else if (((sparse->start >= start) && (sparse->start < end)) ||
((sparse->end >= start) && (sparse->end < end))) {
/* we're semi-sparse (worst case) */
/* let's go synchronous and read all sectors individually */
char *buf = g_malloc(nb_sectors * SECTOR_SIZE);
uint64_t offs;
for (offs = 0; offs < (nb_sectors * SECTOR_SIZE);
offs += SECTOR_SIZE) {
bdrv_pread(bs->file, (sector_num * SECTOR_SIZE) + offs,
buf + offs, SECTOR_SIZE);
}
qemu_iovec_from_buf(qiov, 0, buf, nb_sectors * SECTOR_SIZE);
acb = qemu_aio_get(&tar_aiocb_info, bs, cb, opaque);
acb->bh = qemu_bh_new(tar_sparse_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
}
return bdrv_aio_readv(s->hd->file, sec_file, qiov, nb_sectors,
cb, opaque);
}
static void tar_close(BlockDriverState *bs)
{
dprintf("Close\n");
}
static int64_t tar_getlength(BlockDriverState *bs)
{
BDRVTarState *s = bs->opaque;
dprintf("getlength -> %ld\n", s->file_len);
return s->file_len;
}
static BlockDriver bdrv_tar = {
.format_name = "tar",
.protocol_name = "tar",
.instance_size = sizeof(BDRVTarState),
.bdrv_file_open = tar_open,
.bdrv_close = tar_close,
.bdrv_getlength = tar_getlength,
.bdrv_aio_readv = tar_aio_readv,
};
static void tar_block_init(void)
{
bdrv_register(&bdrv_tar);
}
block_init(tar_block_init);

block/throttle-group.c

@@ -168,22 +168,6 @@ static BlockBackend *throttle_group_next_blk(BlockBackend *blk)
    return blk_by_public(next);
}
/*
* Return whether a BlockBackend has pending requests.
*
* This assumes that tg->lock is held.
*
* @blk: the BlockBackend
* @is_write: the type of operation (read/write)
* @ret: whether the BlockBackend has pending requests.
*/
static inline bool blk_has_pending_reqs(BlockBackend *blk,
bool is_write)
{
const BlockBackendPublic *blkp = blk_get_public(blk);
return blkp->pending_reqs[is_write];
}
/* Return the next BlockBackend in the round-robin sequence with pending I/O
 * requests.
 *
@@ -204,7 +188,7 @@ static BlockBackend *next_throttle_token(BlockBackend *blk, bool is_write)
    /* get next bs round in round robin style */
    token = throttle_group_next_blk(token);
-    while (token != start && !blk_has_pending_reqs(token, is_write)) {
+    while (token != start && !blkp->pending_reqs[is_write]) {
        token = throttle_group_next_blk(token);
    }
@@ -212,13 +196,10 @@ static BlockBackend *next_throttle_token(BlockBackend *blk, bool is_write)
     * then decide the token is the current bs because chances are
     * the current bs get the current request queued.
     */
-    if (token == start && !blk_has_pending_reqs(token, is_write)) {
+    if (token == start && !blkp->pending_reqs[is_write]) {
        token = blk;
    }
-    /* Either we return the original BB, or one with pending requests */
-    assert(token == blk || blk_has_pending_reqs(token, is_write));
    return token;
}
@@ -240,7 +221,7 @@ static bool throttle_group_schedule_timer(BlockBackend *blk, bool is_write)
    ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
    bool must_wait;
-    if (atomic_read(&blkp->io_limits_disabled)) {
+    if (blkp->io_limits_disabled) {
        return false;
    }
@@ -260,25 +241,6 @@ static bool throttle_group_schedule_timer(BlockBackend *blk, bool is_write)
    return must_wait;
}
/* Start the next pending I/O request for a BlockBackend. Return whether
* any request was actually pending.
*
* @blk: the current BlockBackend
* @is_write: the type of operation (read/write)
*/
static bool coroutine_fn throttle_group_co_restart_queue(BlockBackend *blk,
bool is_write)
{
BlockBackendPublic *blkp = blk_get_public(blk);
bool ret;
qemu_co_mutex_lock(&blkp->throttled_reqs_lock);
ret = qemu_co_queue_next(&blkp->throttled_reqs[is_write]);
qemu_co_mutex_unlock(&blkp->throttled_reqs_lock);
return ret;
}
/* Look for the next pending I/O request and schedule it.
 *
 * This assumes that tg->lock is held.
@@ -295,7 +257,7 @@ static void schedule_next_request(BlockBackend *blk, bool is_write)
    /* Check if there's any pending request to schedule next */
    token = next_throttle_token(blk, is_write);
-    if (!blk_has_pending_reqs(token, is_write)) {
+    if (!blkp->pending_reqs[is_write]) {
        return;
    }
@@ -306,12 +268,12 @@ static void schedule_next_request(BlockBackend *blk, bool is_write)
    if (!must_wait) {
        /* Give preference to requests from the current blk */
        if (qemu_in_coroutine() &&
-            throttle_group_co_restart_queue(blk, is_write)) {
+            qemu_co_queue_next(&blkp->throttled_reqs[is_write])) {
            token = blk;
        } else {
-            ThrottleTimers *tt = &blk_get_public(token)->throttle_timers;
+            ThrottleTimers *tt = &blkp->throttle_timers;
            int64_t now = qemu_clock_get_ns(tt->clock_type);
-            timer_mod(tt->timers[is_write], now);
+            timer_mod(tt->timers[is_write], now + 1);
            tg->any_timer_armed[is_write] = true;
        }
        tg->tokens[is_write] = token;
@@ -345,10 +307,7 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
    if (must_wait || blkp->pending_reqs[is_write]) {
        blkp->pending_reqs[is_write]++;
        qemu_mutex_unlock(&tg->lock);
-        qemu_co_mutex_lock(&blkp->throttled_reqs_lock);
-        qemu_co_queue_wait(&blkp->throttled_reqs[is_write],
-                           &blkp->throttled_reqs_lock);
-        qemu_co_mutex_unlock(&blkp->throttled_reqs_lock);
+        qemu_co_queue_wait(&blkp->throttled_reqs[is_write]);
        qemu_mutex_lock(&tg->lock);
        blkp->pending_reqs[is_write]--;
    }
@@ -362,50 +321,15 @@ void coroutine_fn throttle_group_co_io_limits_intercept(BlockBackend *blk,
    qemu_mutex_unlock(&tg->lock);
}
typedef struct {
BlockBackend *blk;
bool is_write;
} RestartData;
static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
{
RestartData *data = opaque;
BlockBackend *blk = data->blk;
bool is_write = data->is_write;
BlockBackendPublic *blkp = blk_get_public(blk);
ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
bool empty_queue;
empty_queue = !throttle_group_co_restart_queue(blk, is_write);
/* If the request queue was empty then we have to take care of
* scheduling the next one */
if (empty_queue) {
qemu_mutex_lock(&tg->lock);
schedule_next_request(blk, is_write);
qemu_mutex_unlock(&tg->lock);
}
}
static void throttle_group_restart_queue(BlockBackend *blk, bool is_write)
{
Coroutine *co;
RestartData rd = {
.blk = blk,
.is_write = is_write
};
co = qemu_coroutine_create(throttle_group_restart_queue_entry, &rd);
aio_co_enter(blk_get_aio_context(blk), co);
}
void throttle_group_restart_blk(BlockBackend *blk)
{
    BlockBackendPublic *blkp = blk_get_public(blk);
-    if (blkp->throttle_state) {
-        throttle_group_restart_queue(blk, 0);
-        throttle_group_restart_queue(blk, 1);
-    }
+    int i;
+    for (i = 0; i < 2; i++) {
+        while (qemu_co_enter_next(&blkp->throttled_reqs[i])) {
+            ;
+        }
+    }
}
@@ -433,7 +357,8 @@ void throttle_group_config(BlockBackend *blk, ThrottleConfig *cfg)
    throttle_config(ts, tt, cfg);
    qemu_mutex_unlock(&tg->lock);
-    throttle_group_restart_blk(blk);
+    qemu_co_enter_next(&blkp->throttled_reqs[0]);
+    qemu_co_enter_next(&blkp->throttled_reqs[1]);
}
/* Get the throttle configuration from a particular group. Similar to
@@ -464,6 +389,7 @@ static void timer_cb(BlockBackend *blk, bool is_write)
    BlockBackendPublic *blkp = blk_get_public(blk);
    ThrottleState *ts = blkp->throttle_state;
    ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
+    bool empty_queue;
    /* The timer has just been fired, so we can update the flag */
    qemu_mutex_lock(&tg->lock);
@@ -471,7 +397,15 @@ static void timer_cb(BlockBackend *blk, bool is_write)
    qemu_mutex_unlock(&tg->lock);
    /* Run the request that was waiting for this timer */
-    throttle_group_restart_queue(blk, is_write);
+    empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]);
+    /* If the request queue was empty then we have to take care of
+     * scheduling the next one */
+    if (empty_queue) {
+        qemu_mutex_lock(&tg->lock);
+        schedule_next_request(blk, is_write);
+        qemu_mutex_unlock(&tg->lock);
+    }
}
static void read_timer_cb(void *opaque)

block/trace-events

@@ -9,6 +9,7 @@ blk_co_preadv(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags
blk_co_pwritev(void *blk, void *bs, int64_t offset, unsigned int bytes, int flags) "blk %p bs %p offset %"PRId64" bytes %u flags %x"
# block/io.c
+bdrv_aio_pdiscard(void *bs, int64_t offset, int count, void *opaque) "bs %p offset %"PRId64" count %d opaque %p"
bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
@@ -19,14 +20,14 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t c
# block/stream.c
stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
-stream_start(void *bs, void *base, void *s) "bs %p base %p s %p"
+stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
# block/commit.c
commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
-commit_start(void *bs, void *base, void *top, void *s) "bs %p base %p top %p s %p"
+commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque) "bs %p base %p top %p s %p co %p opaque %p"
# block/mirror.c
-mirror_start(void *bs, void *s, void *opaque) "bs %p s %p opaque %p"
+mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
mirror_restart_iter(void *s, int64_t cnt) "s %p dirty count %"PRId64
mirror_before_flush(void *s) "s %p"
mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64
@@ -35,6 +36,8 @@ mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_n
mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors, int ret) "s %p sector_num %"PRId64" nb_sectors %d ret %d"
mirror_yield(void *s, int64_t cnt, int buf_free_count, int in_flight) "s %p dirty count %"PRId64" free buffers %d in_flight %d"
mirror_yield_in_flight(void *s, int64_t sector_num, int in_flight) "s %p sector_num %"PRId64" in_flight %d"
+mirror_yield_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
+mirror_break_buf_busy(void *s, int nb_chunks, int in_flight) "s %p requested chunks %d in_flight %d"
# block/backup.c
backup_do_cow_enter(void *job, int64_t start, int64_t sector_num, int nb_sectors) "job %p start %"PRId64" sector_num %"PRId64" nb_sectors %d"
@@ -49,10 +52,11 @@ qmp_block_job_cancel(void *job) "job %p"
qmp_block_job_pause(void *job) "job %p"
qmp_block_job_resume(void *job) "job %p"
qmp_block_job_complete(void *job) "job %p"
+block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
qmp_block_stream(void *bs, void *job) "bs %p job %p"
-# block/file-win32.c
+# block/raw-win32.c
-# block/file-posix.c
+# block/raw-posix.c
paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d"
paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d"
@@ -110,20 +114,3 @@ qed_aio_write_data(void *s, void *acb, int ret, uint64_t offset, size_t len) "s
qed_aio_write_prefill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
qed_aio_write_postfill(void *s, void *acb, uint64_t start, size_t len, uint64_t offset) "s %p acb %p start %"PRIu64" len %zu offset %"PRIu64
qed_aio_write_main(void *s, void *acb, int ret, uint64_t offset, size_t len) "s %p acb %p ret %d offset %"PRIu64" len %zu"
# block/vxhs.c
vxhs_iio_callback(int error) "ctx is NULL: error %d"
vxhs_iio_callback_chnfail(int err, int error) "QNIO channel failed, no i/o %d, %d"
vxhs_iio_callback_unknwn(int opcode, int err) "unexpected opcode %d, errno %d"
vxhs_aio_rw_invalid(int req) "Invalid I/O request iodir %d"
vxhs_aio_rw_ioerr(char *guid, int iodir, uint64_t size, uint64_t off, void *acb, int ret, int err) "IO ERROR (vDisk %s) FOR : Read/Write = %d size = %"PRIu64" offset = %"PRIu64" ACB = %p. Error = %d, errno = %d"
vxhs_get_vdisk_stat_err(char *guid, int ret, int err) "vDisk (%s) stat ioctl failed, ret = %d, errno = %d"
vxhs_get_vdisk_stat(char *vdisk_guid, uint64_t vdisk_size) "vDisk %s stat ioctl returned size %"PRIu64
vxhs_complete_aio(void *acb, uint64_t ret) "aio failed acb %p ret %"PRIu64
vxhs_parse_uri_filename(const char *filename) "URI passed via bdrv_parse_filename %s"
vxhs_open_vdiskid(const char *vdisk_id) "Opening vdisk-id %s"
vxhs_open_hostinfo(char *of_vsa_addr, int port) "Adding host %s:%d to BDRVVXHSState"
vxhs_open_iio_open(const char *host) "Failed to connect to storage agent on host %s"
vxhs_parse_uri_hostinfo(char *host, int port) "Host: IP %s, Port %d"
vxhs_close(char *vdisk_guid) "Closing vdisk %s"
vxhs_get_creds(const char *cacert, const char *client_key, const char *client_cert) "cacert %s, client_key %s, client_cert %s"

block/vdi.c

@@ -55,10 +55,17 @@
 #include "sysemu/block-backend.h"
 #include "qemu/module.h"
 #include "qemu/bswap.h"
-#include "migration/blocker.h"
+#include "migration/migration.h"
 #include "qemu/coroutine.h"
 #include "qemu/cutils.h"
-#include "qemu/uuid.h"
+#if defined(CONFIG_UUID)
+#include <uuid/uuid.h>
+#else
+/* TODO: move uuid emulation to some central place in QEMU. */
+#include "sysemu/sysemu.h" /* UUID_FMT */
+typedef unsigned char uuid_t[16];
+#endif
 
 /* Code configuration options. */
@@ -133,6 +140,28 @@
 #define VDI_DISK_SIZE_MAX   ((uint64_t)VDI_BLOCKS_IN_IMAGE_MAX * \
                              (uint64_t)DEFAULT_CLUSTER_SIZE)
 
+#if !defined(CONFIG_UUID)
+static inline void uuid_generate(uuid_t out)
+{
+    memset(out, 0, sizeof(uuid_t));
+}
+
+static inline int uuid_is_null(const uuid_t uu)
+{
+    uuid_t null_uuid = { 0 };
+    return memcmp(uu, null_uuid, sizeof(uuid_t)) == 0;
+}
+
+# if defined(CONFIG_VDI_DEBUG)
+static inline void uuid_unparse(const uuid_t uu, char *out)
+{
+    snprintf(out, 37, UUID_FMT,
+             uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
+             uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14], uu[15]);
+}
+# endif
+#endif
+
 typedef struct {
     char text[0x40];
     uint32_t signature;
@@ -153,10 +182,10 @@ typedef struct {
     uint32_t block_extra;       /* unused here */
     uint32_t blocks_in_image;
     uint32_t blocks_allocated;
-    QemuUUID uuid_image;
-    QemuUUID uuid_last_snap;
-    QemuUUID uuid_link;
-    QemuUUID uuid_parent;
+    uuid_t uuid_image;
+    uuid_t uuid_last_snap;
+    uuid_t uuid_link;
+    uuid_t uuid_parent;
     uint64_t unused2[7];
 } QEMU_PACKED VdiHeader;
@@ -177,6 +206,16 @@ typedef struct {
     Error *migration_blocker;
 } BDRVVdiState;
 
+/* Change UUID from little endian (IPRT = VirtualBox format) to big endian
+ * format (network byte order, standard, see RFC 4122) and vice versa.
+ */
+static void uuid_convert(uuid_t uuid)
+{
+    bswap32s((uint32_t *)&uuid[0]);
+    bswap16s((uint16_t *)&uuid[4]);
+    bswap16s((uint16_t *)&uuid[6]);
+}
+
 static void vdi_header_to_cpu(VdiHeader *header)
 {
     le32_to_cpus(&header->signature);
@@ -195,10 +234,10 @@ static void vdi_header_to_cpu(VdiHeader *header)
     le32_to_cpus(&header->block_extra);
     le32_to_cpus(&header->blocks_in_image);
     le32_to_cpus(&header->blocks_allocated);
-    qemu_uuid_bswap(&header->uuid_image);
-    qemu_uuid_bswap(&header->uuid_last_snap);
-    qemu_uuid_bswap(&header->uuid_link);
-    qemu_uuid_bswap(&header->uuid_parent);
+    uuid_convert(header->uuid_image);
+    uuid_convert(header->uuid_last_snap);
+    uuid_convert(header->uuid_link);
+    uuid_convert(header->uuid_parent);
 }
@@ -219,10 +258,10 @@ static void vdi_header_to_le(VdiHeader *header)
     cpu_to_le32s(&header->block_extra);
     cpu_to_le32s(&header->blocks_in_image);
     cpu_to_le32s(&header->blocks_allocated);
-    qemu_uuid_bswap(&header->uuid_image);
-    qemu_uuid_bswap(&header->uuid_last_snap);
-    qemu_uuid_bswap(&header->uuid_link);
-    qemu_uuid_bswap(&header->uuid_parent);
+    uuid_convert(header->uuid_image);
+    uuid_convert(header->uuid_last_snap);
+    uuid_convert(header->uuid_link);
+    uuid_convert(header->uuid_parent);
 }
 
 #if defined(CONFIG_VDI_DEBUG)
@@ -361,13 +400,6 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
     VdiHeader header;
     size_t bmap_size;
     int ret;
-    Error *local_err = NULL;
-
-    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
-                               false, errp);
-    if (!bs->file) {
-        return -EINVAL;
-    }
 
     logout("\n");
@@ -437,11 +469,11 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
                    (uint64_t)header.blocks_in_image * header.block_size);
         ret = -ENOTSUP;
         goto fail;
-    } else if (!qemu_uuid_is_null(&header.uuid_link)) {
+    } else if (!uuid_is_null(header.uuid_link)) {
         error_setg(errp, "unsupported VDI image (non-NULL link UUID)");
         ret = -ENOTSUP;
         goto fail;
-    } else if (!qemu_uuid_is_null(&header.uuid_parent)) {
+    } else if (!uuid_is_null(header.uuid_parent)) {
         error_setg(errp, "unsupported VDI image (non-NULL parent UUID)");
         ret = -ENOTSUP;
         goto fail;
@@ -478,12 +510,7 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
     error_setg(&s->migration_blocker, "The vdi format used by node '%s' "
                "does not support live migration",
                bdrv_get_device_or_node_name(bs));
-    ret = migrate_add_blocker(s->migration_blocker, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        error_free(s->migration_blocker);
-        goto fail_free_bmap;
-    }
+    migrate_add_blocker(s->migration_blocker);
 
     qemu_co_mutex_init(&s->write_lock);
@@ -763,8 +790,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     blk = blk_new_open(filename, NULL, NULL,
-                       BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_PROTOCOL,
-                       &local_err);
+                       BDRV_O_RDWR | BDRV_O_PROTOCOL, &local_err);
     if (blk == NULL) {
         error_propagate(errp, local_err);
         ret = -EIO;
@@ -795,8 +821,8 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
     if (image_type == VDI_TYPE_STATIC) {
         header.blocks_allocated = blocks;
     }
-    qemu_uuid_generate(&header.uuid_image);
-    qemu_uuid_generate(&header.uuid_last_snap);
+    uuid_generate(header.uuid_image);
+    uuid_generate(header.uuid_last_snap);
     /* There is no need to set header.uuid_link or header.uuid_parent here. */
 #if defined(CONFIG_VDI_DEBUG)
     vdi_header_print(&header);
@@ -832,9 +858,9 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     if (image_type == VDI_TYPE_STATIC) {
-        ret = blk_truncate(blk, offset + blocks * block_size, errp);
+        ret = blk_truncate(blk, offset + blocks * block_size);
         if (ret < 0) {
-            error_prepend(errp, "Failed to statically allocate %s", filename);
+            error_setg(errp, "Failed to statically allocate %s", filename);
             goto exit;
         }
     }
@@ -892,7 +918,6 @@ static BlockDriver bdrv_vdi = {
     .bdrv_open = vdi_open,
     .bdrv_close = vdi_close,
     .bdrv_reopen_prepare = vdi_reopen_prepare,
-    .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_create = vdi_create,
    .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_get_block_status = vdi_co_get_block_status,

Some files were not shown because too many files have changed in this diff.