Compare commits

...

417 Commits

Author SHA1 Message Date
Thomas Huth
10f371c756 hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)
Git-commit: b987718bbb
References: bsc#1207205 (CVE-2023-0330)

We cannot use the generic reentrancy guard in the LSI code, so
we have to manually prevent endless reentrancy here. The problematic
lsi_execute_script() function has already a way to detect whether
too many instructions have been executed - we just have to slightly
change the logic here that it also takes into account if the function
has been called too often in a reentrant way.

The code in fuzz-lsi53c895a-test.c has been taken from an earlier
patch by Mauro Matteo Cascella.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1563
Message-Id: <20230522091011.1082574-1-thuth@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-10-20 16:02:13 +02:00
zhenwei pi
d0c2a993b8 virtio-crypto: verify src&dst buffer length for sym request
For symmetric algorithms, the length of ciphertext must be as same
as the plaintext.
The missing verification of the src_len and the dst_len in
virtio_crypto_sym_op_helper() may lead buffer overflow/divulged.

This patch is originally written by Yiming Tao for QEMU-SECURITY,
resend it(a few changes of error message) in qemu-devel.

Fixes: CVE-2023-3180
Fixes: 04b9b37edda("virtio-crypto: add data queue processing handler")
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Yiming Tao <taoym@zju.edu.cn>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230803024314.29962-2-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9d38a84347)
References: bsc#1213925
References: CVE-2023-3180
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-10-20 15:56:38 +02:00
Daniel P. Berrang_
6e201a2068 io: remove io watch if TLS channel is closed during handshake
Git-commit: 10be627d2b
References: bsc#1212850 (CVE-2023-3354)

The TLS handshake make take some time to complete, during which time an
I/O watch might be registered with the main loop. If the owner of the
I/O channel invokes qio_channel_close() while the handshake is waiting
to continue the I/O watch must be removed. Failing to remove it will
later trigger the completion callback which the owner is not expecting
to receive. In the case of the VNC server, this results in a SEGV as
vnc_disconnect_start() tries to shutdown a client connection that is
already gone / NULL.

CVE-2023-3354
Reported-by: jiangyegen <jiangyegen@huawei.com>
Signed-off-by: Daniel P. Berrang© <berrange@redhat.co>
[DB: Open-code g_clear_handle_id(), as it's not available in glib
     until version 2.56.]
Signed-off-by: Brahmajit Das <brahmajit.das@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-10-20 15:50:22 +02:00
Olaf Hering
ac78038d9f disable reentrancy guard on ppc
References: bsc#1190011 (CVE-2021-3750)

40p and g3beige will hang forever with these messages:
qemu-system-ppc: warning: Blocked re-entrant IO on MemoryRegion: pci-conf-idx at addr: 0x0
qemu-system-ppc64: warning: Blocked re-entrant IO on MemoryRegion: lpc-hc at addr: 0x34

Signed-off-by: Olaf Hering <olaf@aepfle.de>
2023-10-20 15:47:57 +02:00
Alexander Bulekov
d218f827af memory: prevent dma-reentracy issues
Git-commit: a2e1753b80
References: bsc#1190011 (CVE-2021-3750)

Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA.
This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA.  The purpose of this
flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
Resolves: CVE-2023-0330

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230427211013.2994127-2-alxndr@bu.edu>
[thuth: Replace warn_report() with warn_report_once()]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-10-20 15:46:57 +02:00
Philippe Mathieu-Daudé
9ccb8f0280 hw/sd/sdcard: Do not switch to ReceivingData if address is invalid
Git-commit: 790762e548
References: bsc#1172033, CVE-2020-13253

Only move the state machine to ReceivingData if there is no
pending error. This avoids later OOB access while processing
commands queued.

  "SD Specifications Part 1 Physical Layer Simplified Spec. v3.01"

  4.3.3 Data Read

  Read command is rejected if BLOCK_LEN_ERROR or ADDRESS_ERROR
  occurred and no data transfer is performed.

  4.3.4 Data Write

  Write command is rejected if BLOCK_LEN_ERROR or ADDRESS_ERROR
  occurred and no data transfer is performed.

WP_VIOLATION errors are not modified: the error bit is set, we
stay in receive-data state, wait for a stop command. All further
data transfer is ignored. See the check on sd->card_status at the
beginning of sd_read_data() and sd_write_data().

Fixes: CVE-2020-13253
Cc: qemu-stable@nongnu.org
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Buglink: https://bugs.launchpad.net/qemu/+bug/1880822
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20200630133912.9428-6-f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Philippe Mathieu-Daudé
e698f326be hw/sd/sdcard: Update coding style to make checkpatch.pl happy
Git-commit: 794d68de2f
References: bsc#1172033, CVE-2020-13253

To make the next commit easier to review, clean this code first.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200630133912.9428-3-f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Thomas Huth
b3f6320ee4 hw/usb/hcd-xhci: Fix unbounded loop in xhci_ring_chain_length() (CVE-2020-14394)
Git-commit: effaf5a240
References: bsc#1180207, CVE-2020-14394

The loop condition in xhci_ring_chain_length() is under control of
the guest, and additionally the code does not check for failed DMA
transfers (e.g. if reaching the end of the RAM), so the loop there
could run for a very long time or even forever. Fix it by checking
the return value of dma_memory_read() and by introducing a maximum
loop length.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/646
Message-Id: <20220804131300.96368-1-thuth@redhat.com>
Reviewed-by: Mauro Matteo Cascella <mcascell@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Philippe Mathieu-Daudé
1de0d28447 hw/block/fdc: Kludge missing floppy drive to fix CVE-2021-20196
Git-commit: 1ab95af033
References: bsc#1181361, CVE-2021-20196

Guest might select another drive on the bus by setting the
DRIVE_SEL bit of the DIGITAL OUTPUT REGISTER (DOR).
The current controller model doesn't expect a BlockBackend
to be NULL. A simple way to fix CVE-2021-20196 is to create
an empty BlockBackend when it is missing. All further
accesses will be safely handled, and the controller state
machines keep behaving correctly.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2021-20196
Reported-by: Gaoning Pan (Ant Security Light-Year Lab) <pgn@zju.edu.cn>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20211124161536.631563-3-philmd@redhat.com
BugLink: https://bugs.launchpad.net/qemu/+bug/1912780
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/338
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Philippe Mathieu-Daudé
33604e6a3b hw/block/fdc: Extract blk_create_empty_drive()
Git-commit: b154791e7b
References: bsc#1181361, CVE-2021-20196

We are going to re-use this code in the next commit,
so extract it as a new blk_create_empty_drive() function.

Inspired-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20211124161536.631563-2-philmd@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Mauro Matteo Cascella
d24c9ab148 hw/scsi/scsi-disk: MODE_PAGE_ALLS not allowed in MODE SELECT commands
Git-commit: b3af7fdf9c
References: bsc#1192525, CVE-2021-3930

This avoids an off-by-one read of 'mode_sense_valid' buffer in
hw/scsi/scsi-disk.c:mode_sense_page().

Fixes: CVE-2021-3930
Cc: qemu-stable@nongnu.org
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: a8f4bbe290 ("scsi-disk: store valid mode pages in a table")
Fixes: #546
Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Mauro Matteo Cascella
670358f687 scsi/lsi53c895a: really fix use-after-free in lsi_do_msgout (CVE-2022-0216)
Git-commit: 4367a20cc4
References: bsc#1198038, CVE-2022-0216

Set current_req to NULL, not current_req->req, to prevent reusing a free'd
buffer in case of repeated SCSI cancel requests.  Also apply the fix to
CLEAR QUEUE and BUS DEVICE RESET messages as well, since they also cancel
the request.

Thanks to Alexander Bulekov for providing a reproducer.

Fixes: CVE-2022-0216
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/972
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20220711123316.421279-1-mcascell@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
[DF: Kill the hunk that patches the test as it's not there]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:28 +01:00
Klaus Jensen
8e7f55c6b8 hw/nvme: fix CVE-2021-3929
Git-commit: 736b01642d
References: bsc#1193880, CVE-2021-3929

This fixes CVE-2021-3929 "locally" by denying DMA to the iomem of the
device itself. This still allows DMA to MMIO regions of other devices
(e.g. doing P2P DMA to the controller memory buffer of another NVMe
device).

Fixes: CVE-2021-3929
Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>41
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Lin Ma <lma@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Stefano Garzarella
3716c2d7d9 vhost-vsock: detach the virqueue element in case of error
Git-commit: 8d1b247f37
References: bsc#1198712, CVE-2022-26354

In vhost_vsock_common_send_transport_reset(), if an element popped from
the virtqueue is invalid, we should call virtqueue_detach_element() to
detach it from the virtqueue before freeing its memory.

Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device")
Fixes: CVE-2022-26354
Cc: qemu-stable@nongnu.org
Reported-by: VictorV <vv474172261@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220228095058.27899-1-sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Mauro Matteo Cascella
92f45c0d76 display/qxl-render: fix race condition in qxl_cursor (CVE-2021-4207)
Git-commit: 9569f5cb5b
References: bsc#1198037, CVE-2021-4207

Avoid fetching 'width' and 'height' a second time to prevent possible
race condition. Refer to security advisory
https://starlabs.sg/advisories/22-4207/ for more information.

Fixes: CVE-2021-4207
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220407081106.343235-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
b36e5b2fbe hw/display/qxl: Assert memory slot fits in preallocated MemoryRegion
Git-commit: 86fdb0582c
References: bsc#1205808, CVE-2022-4144

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-6-philmd@linaro.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
0f022f64b4 hw/display/qxl: Avoid buffer overrun in qxl_phys2virt (CVE-2022-4144)
Git-commit: 6dbbf05514
References: bsc#1205808, CVE-2022-4144

Have qxl_get_check_slot_offset() return false if the requested
buffer size does not fit within the slot memory region.

Similarly qxl_phys2virt() now returns NULL in such case, and
qxl_dirty_one_surface() aborts.

This avoids buffer overrun in the host pointer returned by
memory_region_get_ram_ptr().

Fixes: CVE-2022-4144 (out-of-bounds read)
Reported-by: Wenxu Yin (@awxylitol)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1336
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-5-philmd@linaro.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
6602b423e4 hw/display/qxl: Pass requested buffer size to qxl_phys2virt()
Git-commit: 8efec0ef8b)
References: bsc#1205808, CVE-2022-4144

Currently qxl_phys2virt() doesn't check for buffer overrun.
In order to do so in the next commit, pass the buffer size
as argument.

For QXLCursor in qxl_render_cursor() -> qxl_cursor() we
verify the size of the chunked data ahead, checking we can
access 'sizeof(QXLCursor) + chunk->data_size' bytes.
Since in the SPICE_CURSOR_TYPE_MONO case the cursor is
assumed to fit in one chunk, no change are required.
In SPICE_CURSOR_TYPE_ALPHA the ahead read is handled in
qxl_unpack_chunks().

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-4-philmd@linaro.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
4485ad113b hw/display/qxl: Document qxl_phys2virt()
Git-commit: b1901de83a
References: bac#1205808, CVE-2022-4144

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-3-philmd@linaro.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
62dc82929e hw/display/qxl: Have qxl_log_command Return early if no log_cmd handler
Git-commit: 61c34fc194
References: bac#1205808, CVE-2022-4144

Only 3 command types are logged: no need to call qxl_phys2virt()
for the other types. Using different cases will help to pass
different structure sizes to qxl_phys2virt() in a pair of commits.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20221128202741.4945-2-philmd@linaro.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Gerd Hoffmann
713a23057b qxl: use guest_monitor_config for local renderer.
Git-commit: 979f7ef896
References: bsc#1198035, CVE-2021-4206

When processing monitor config from guest store head0 width and height
for single-head configurations.  Use these when creating the
DisplaySurface in the local renderer.

This fixes a rendering issue with wayland.  Wayland rounds up the
framebuffer width and height to a multiple of 64, so with odd
resolutions (800x600 for example) the framebuffer is larger than the
actual screen.  The monitor config has the actual screen size though.

This fixes guest display for anything using the local renderer
(non-spice UI, screendump monitor command).

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20180919103057.9666-1-kraxel@redhat.com
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
c49a901446 hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
Git-commit: defac5e2fb
References: bsc#1185000, CVE-2021-3507

Per the 82078 datasheet, if the end-of-track (EOT byte in
the FIFO) is more than the number of sectors per side, the
command is terminated unsuccessfully:

* 5.2.5 DATA TRANSFER TERMINATION

  The 82078 supports terminal count explicitly through
  the TC pin and implicitly through the underrun/over-
  run and end-of-track (EOT) functions. For full sector
  transfers, the EOT parameter can define the last
  sector to be transferred in a single or multisector
  transfer. If the last sector to be transferred is a par-
  tial sector, the host can stop transferring the data in
  mid-sector, and the 82078 will continue to complete
  the sector as if a hardware TC was received. The
  only difference between these implicit functions and
  TC is that they return "abnormal termination" result
  status. Such status indications can be ignored if they
  were expected.

* 6.1.3 READ TRACK

  This command terminates when the EOT specified
  number of sectors have been read. If the 82078
  does not find an I D Address Mark on the diskette
  after the second· occurrence of a pulse on the
  INDX# pin, then it sets the IC code in Status Regis-
  ter 0 to "01" (Abnormal termination), sets the MA bit
  in Status Register 1 to "1", and terminates the com-
  mand.

* 6.1.6 VERIFY

  Refer to Table 6-6 and Table 6-7 for information
  concerning the values of MT and EC versus SC and
  EOT value.

* Table 6·6. Result Phase Table

* Table 6-7. Verify Command Result Phase Table

Fix by aborting the transfer when EOT > # Sectors Per Side.

Cc: qemu-stable@nongnu.org
Cc: Hervé Poussineau <hpoussin@reactos.org>
Fixes: baca51faff ("floppy driver: disk geometry auto detect")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/339
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211118115733.4038610-2-philmd@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Mauro Matteo Cascella
9ccbc11f9d ui/cursor: fix integer overflow in cursor_alloc (CVE-2021-4206)
Git-commit: fa892e9abb
References: bsc#1198035, CVE-2021-4206

Prevent potential integer overflow by limiting 'width' and 'height' to
512x512. Also change 'datasize' type to size_t. Refer to security
advisory https://starlabs.sg/advisories/22-4206/ for more information.

Fixes: CVE-2021-4206
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220407081712.345609-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Lin Ma <lma@suse.com>
2023-03-08 09:59:27 +01:00
Bin Meng
681b504aad hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different block size is programmed
Git-commit: cffb446e8f
References: bsc#1175144

If the block size is programmed to a different value from the
previous one, reset the data pointer of s->fifo_buffer[] so that
s->fifo_buffer[] can be filled in using the new block size in
the next transfer.

With this fix, the following reproducer:

outl 0xcf8 0x80001010
outl 0xcfc 0xe0000000
outl 0xcf8 0x80001001
outl 0xcfc 0x06000000
write 0xe000002c 0x1 0x05
write 0xe0000005 0x1 0x02
write 0xe0000007 0x1 0x01
write 0xe0000028 0x1 0x10
write 0x0 0x1 0x23
write 0x2 0x1 0x08
write 0xe000000c 0x1 0x01
write 0xe000000e 0x1 0x20
write 0xe000000f 0x1 0x00
write 0xe000000c 0x1 0x32
write 0xe0000004 0x2 0x0200
write 0xe0000028 0x1 0x00
write 0xe0000003 0x1 0x40

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \
      -nodefaults -device sdhci-pci,sd-spec-version=3 \
      -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
      -device sd-card,drive=mydrive -qtest stdio

Cc: qemu-stable@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210303122639.20004-6-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Bin Meng
0efc35a9cf hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is writable
Git-commit: 5cd7aa3451
References: bsc#1175144

The codes to limit the maximum block size is only necessary when
SDHC_BLKSIZE register is writable.

Tested-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210303122639.20004-5-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Bin Meng
6adc1ceeed hw/sd: sdhci: Correctly set the controller status for ADMA
Git-commit: bc6f28995f)
References: bsc#1175144

When an ADMA transfer is started, the codes forget to set the
controller status to indicate a transfer is in progress.

With this fix, the following 2 reproducers:

https://paste.debian.net/plain/1185136
https://paste.debian.net/plain/1185141

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \
      -nodefaults -device sdhci-pci,sd-spec-version=3 \
      -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
      -device sd-card,drive=mydrive -qtest stdio

Cc: qemu-stable@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210303122639.20004-4-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Bin Meng
03feb45b58 hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in progress
Git-commit: 8be45cc947
References: bsc#1175144

Per "SD Host Controller Standard Specification Version 7.00"
chapter 2.2.1 SDMA System Address Register:

This register can be accessed only if no transaction is executing
(i.e., after a transaction has stopped).

With this fix, the following reproducer:

outl 0xcf8 0x80001010
outl 0xcfc 0xfbefff00
outl 0xcf8 0x80001001
outl 0xcfc 0x06000000
write 0xfbefff2c 0x1 0x05
write 0xfbefff0f 0x1 0x37
write 0xfbefff0a 0x1 0x01
write 0xfbefff0f 0x1 0x29
write 0xfbefff0f 0x1 0x02
write 0xfbefff0f 0x1 0x03
write 0xfbefff04 0x1 0x01
write 0xfbefff05 0x1 0x01
write 0xfbefff07 0x1 0x02
write 0xfbefff0c 0x1 0x33
write 0xfbefff0e 0x1 0x20
write 0xfbefff0f 0x1 0x00
write 0xfbefff2a 0x1 0x01
write 0xfbefff0c 0x1 0x00
write 0xfbefff03 0x1 0x00
write 0xfbefff05 0x1 0x00
write 0xfbefff2a 0x1 0x02
write 0xfbefff0c 0x1 0x32
write 0xfbefff01 0x1 0x01
write 0xfbefff02 0x1 0x01
write 0xfbefff03 0x1 0x01

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \
       -nodefaults -device sdhci-pci,sd-spec-version=3 \
       -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
       -device sd-card,drive=mydrive -qtest stdio

Cc: qemu-stable@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210303122639.20004-3-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Bin Meng
df85fb2e8b hw/sd: sdhci: Don't transfer any data when command time out
Git-commit: b263d8f928
References: bsc#1175144

At the end of sdhci_send_command(), it starts a data transfer if the
command register indicates data is associated. But the data transfer
should only be initiated when the command execution has succeeded.

With this fix, the following reproducer:

outl 0xcf8 0x80001810
outl 0xcfc 0xe1068000
outl 0xcf8 0x80001804
outw 0xcfc 0x7
write 0xe106802c 0x1 0x0f
write 0xe1068004 0xc 0x2801d10101fffffbff28a384
write 0xe106800c 0x1f 0x9dacbbcad9e8f7061524334251606f7e8d9cabbac9d8e7f60514233241505f
write 0xe1068003 0x28 0x80d000251480d000252280d000253080d000253e80d000254c80d000255a80d000256880d0002576
write 0xe1068003 0x1 0xfe

cannot be reproduced with the following QEMU command line:

$ qemu-system-x86_64 -nographic -M pc-q35-5.0 \
      -device sdhci-pci,sd-spec-version=3 \
      -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \
      -device sd-card,drive=mydrive \
      -monitor none -serial none -qtest stdio

Cc: qemu-stable@nongnu.org
Fixes: CVE-2020-17380
Fixes: CVE-2020-25085
Fixes: CVE-2021-3409
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum)
Reported-by: Sergej Schumilo (Ruhr-Universität Bochum)
Reported-by: Simon Wörner (Ruhr-Universität Bochum)
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Buglink: https://bugs.launchpad.net/qemu/+bug/1909418
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210303122639.20004-2-bmeng.cn@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
8faab1254a hw/sd/sdhci: Stop multiple transfers when block count is cleared
Git-commit: 6a9e5cc61c
References: bsc#

Clearing BlockCount stops multiple transfers.

See "SD Host Controller Simplified Specification Version 2.00":

- 2.2.3. Block Count Register (Offset 006h)
- Table 2-8 : Determination of Transfer Type

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200903172806.489710-2-f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:27 +01:00
Philippe Mathieu-Daudé
4444fb5020 hw/sd/sdhci: Fix DMA Transfer Block Size field
Git-commit: dfba99f17f
Reference: bsc#1176681, CVE-2020-25085

The 'Transfer Block Size' field is 12-bit wide.

See section '2.2.2. Block Size Register (Offset 004h)' in datasheet.

Two different bug reproducer available:
- https://bugs.launchpad.net/qemu/+bug/1892960
- https://ruhr-uni-bochum.sciebo.de/s/NNWP2GfwzYKeKwE?path=%2Fsdhci_oob_write1

Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/qemu/+bug/1892960
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200901140411.112150-3-f4bug@amsat.org>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2023-03-08 09:59:17 +01:00
Jessica Clarke
48c8cfdf55 Partially revert "build: -no-pie is no functional linker flag"
Git-commit: ffd205ef29
References: bsc#1192463

This partially reverts commit bbd2d5a812.

This commit was misguided and broke using --disable-pie on any distro
that enables PIE by default in their compiler driver, including Debian
and its derivatives. Whilst -no-pie is not a linker flag, it is a
compiler driver flag that ensures -pie is not automatically passed by it
to the linker. Without it, all compile_prog checks will fail as any code
built with the explicit -fno-pie will fail to link with the implicit
default -pie due to trying to use position-dependent relocations. The
only bug that needed fixing was LDFLAGS_NOPIE being used as a flag for
the linker itself in pc-bios/optionrom/Makefile.

Note this does not reinstate exporting LDFLAGS_NOPIE, as it is unused,
since the only previous use was the one that should not have existed. I
have also updated the comment for the -fno-pie and -no-pie checks to
reflect what they're actually needed for.

Fixes: bbd2d5a812
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Message-Id: <20210805192545.38279-1-jrtc27@jrtc27.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Liang Yan <lyan@suse.com>
2021-12-19 00:46:38 -05:00
Christian Ehrhardt
fd85006157 build: -no-pie is no functional linker flag
Git-commit: bbd2d5a812
References: bsc#1192463

Recent binutils changes dropping unsupported options [1] caused a build
issue in regard to the optionroms.

  ld -m elf_i386 -T /<<PKGBUILDDIR>>/pc-bios/optionrom//flat.lds -no-pie \
    -s -o multiboot.img multiboot.o
  ld.bfd: Error: unable to disambiguate: -no-pie (did you mean --no-pie ?)

This isn't really a regression in ld.bfd, filing the bug upstream
revealed that this never worked as a ld flag [2] - in fact it seems we
were by accident setting --nmagic).

Since it never had the wanted effect this usage of LDFLAGS_NOPIE, should be
droppable without any effect. This also is the only use-case of LDFLAGS_NOPIE
in .mak, therefore we can also remove it from being added there.

[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=983d925d
[2]: https://sourceware.org/bugzilla/show_bug.cgi?id=27050#c5

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Message-Id: <20201214150938.1297512-1-christian.ehrhardt@canonical.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Liang Yan <lyan@suse.com>
2021-12-19 00:46:27 -05:00
Jason Wang
6e65a35ab2 virtio-net: fix use after unmap/free for sg
Git-commit: bedd7e93d0
References: bsc#1189938 CVE-2021-3748

When mergeable buffer is enabled, we try to set the num_buffers after
the virtqueue elem has been unmapped. This will lead several issues,
E.g a use after free when the descriptor has an address which belongs
to the non direct access region. In this case we use bounce buffer
that is allocated during address_space_map() and freed during
address_space_unmap().

Fixing this by storing the elems temporarily in an array and delay the
unmap after we set the the num_buffers.

This addresses CVE-2021-3748.

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: fbe78f4f55 ("virtio-net support")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-09-30 07:59:28 -06:00
Gerd Hoffmann
75bbd01a2a uas: add stream number sanity checks.
Git-commit: 13b250b12a
References: bsc#1189702 CVE-2021-3713

The device uses the guest-supplied stream number unchecked, which can
lead to guest-triggered out-of-band access to the UASDevice->data3 and
UASDevice->status3 fields.  Add the missing checks.

Fixes: CVE-2021-3713
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reported-by: Chen Zhe <chenzhe@huawei.com>
Reported-by: Tan Jingguo <tanjingguo@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210818120505.1258262-2-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-09-30 07:59:27 -06:00
Gerd Hoffmann
7045286d96 usbredir: fix free call
Git-commit: 5e796671e6
References: bsc#1189145 CVE-2021-3682

data might point into the middle of a larger buffer, there is a separate
free_on_destroy pointer passed into bufp_alloc() to handle that.  It is
only used in the normal workflow though, not when dropping packets due
to the queue being full.  Fix that.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/491
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20210722072756.647673-1-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-06 09:36:10 -06:00
Mark Cave-Ayland
56628efd25 esp: ensure cmdfifo is not empty and current_dev is non-NULL
Git-commit: 9954575173
References: bsc#1180433, CVE-2020-35504
            bsc#1180434, CVE-2020-35505
            bsc#1180435, CVE-2020-35506

When about to execute a SCSI command, ensure that cmdfifo is not empty and
current_dev is non-NULL. This can happen if the guest tries to execute a TI
(Transfer Information) command without issuing one of the select commands
first.

Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20210407195801.685-7-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Mark Cave-Ayland
1e67d33f27 esp: always check current_req is not NULL before use in DMA callbacks
Git-commit: 0db895361b
References: bsc#1180433, CVE-2020-35504
            bsc#1180434, CVE-2020-35505
            bsc#1180435, CVE-2020-35506

After issuing a SCSI command the SCSI layer can call the SCSIBusInfo .cancel
callback which resets both current_req and current_dev to NULL. If any data
is left in the transfer buffer (async_len != 0) then the next TI (Transfer
Information) command will attempt to reference the NULL pointer causing a
segfault.

Buglink: https://bugs.launchpad.net/qemu/+bug/1910723
Buglink: https://bugs.launchpad.net/qemu/+bug/1909247
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20210407195801.685-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Gerd Hoffmann
e230164c8a usb: limit combined packets to 1 MiB (CVE-2021-3527)
Git-commit: 05a40b172e
References: bsc#1186012, CVE-2021-3527

usb-host and usb-redirect try to batch bulk transfers by combining many
small usb packets into a single, large transfer request, to reduce the
overhead and improve performance.

This patch adds a size limit of 1 MiB for those combined packets to
restrict the host resources the guest can bind that way.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20210503132915.2335822-6-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Gerd Hoffmann
2aed7ecb23 usb/mtp: avoid dynamic stack allocation
Git-commit: 06aa50c06c
References: bsc#1186012, CVE-2021-3527

Use autofree heap allocation instead.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210503132915.2335822-4-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Gerd Hoffmann
8b0de5d282 usb/redir: avoid dynamic stack allocation (CVE-2021-3527)
Git-commit: 7ec54f9eb6
References: bsc#1186012, CVE-2021-3527

Use autofree heap allocation instead.

Fixes: 4f4321c11f ("usb: use iovecs in USBPacket")
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210503132915.2335822-3-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Gerd Hoffmann
d2e365e5a7 usb/hid: avoid dynamic stack allocation
Git-commit: 3f67e2e7f1
References: bsc#1186012, CVE-2021-3527

Use autofree heap allocation instead.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210503132915.2335822-2-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Philippe Mathieu-Daudé
35e06371ed hw/usb/host-stub: Remove unused header
Git-commit: 1081607bfa
References: bsc#1186012, CVE-2021-3527

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210424224110.3442424-2-f4bug@amsat.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-08-02 12:38:52 -06:00
Jose R Ziviani
5f975d28d8 net: eepro100: validate various address values
Git-commit: 000000000000000000000000000000000000000000000
References: bsc#1182651, CVE-2021-20255

Patch based on discussion:
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06098.html

While processing controller commands, eepro100 emulator gets
command unit(CU) base address OR receive unit (RU) base address
OR command block (CB) address from guest. If these values are not
checked, it may lead to an infinite loop kind of issues. Add checks
to avoid it.

Reported-by: Ruhr-University Bochum <bugs-syssec@rub.de>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Acked-By: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-30 08:04:54 -06:00
Mauro Matteo Cascella
523c8d9df4 hw/scsi/megasas: check for NULL frame in megasas_command_cancelled()
Git-commit: 00000000000000000000000000000000000000000000
References: bsc#1180432, CVE-2020-35503

Ensure that 'cmd->frame' is not NULL before accessing the 'header' field.
This check prevents a potential NULL pointer dereference issue.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1910346
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reported-by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Acked-By: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-30 08:04:54 -06:00
Jose R Ziviani
ddab4997b5 libslirp: tftp: check tftp_input buffer size
Git-commit: 3f17948137155f025f7809fdc38576d5d2451c3d
References: bsc#1187366, CVE-2021-3595

Fixes: CVE-2021-3595
Fixes: https://gitlab.freedesktop.org/slirp/libslirp/-/issues/46

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-08 08:20:55 -06:00
Jose R Ziviani
d0824d8b73 libslirp: upd6: check udp6_input buffer size
Git-commit: de71c15de66ba9350bf62c45b05f8fbff166517b
References: bsc#1187365, CVE-2021-3593

Fixes: CVE-2021-3593
Fixes: https://gitlab.freedesktop.org/slirp/libslirp/-/issues/45

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 13:41:20 -06:00
Jose R Ziviani
fc8535d39a libslirp: dhcp: Always send DHCP_OPT_LEN bytes in options
Git-commit: d7fb54218424c3b2517aee5b391ced0f75386a5d
References: bsc#1187364, CVE-2021-3592

RFC2131 suggests that the options field may be at least 312 bytes.
Some DHCP clients seem to assume that it has to be at least 312 bytes.

Fixes #51
Fixes: f13cad45b25d92760bb0ad67bec0300a4d7d5275 ("bootp: limit
vendor-specific area to input packet memory buffer")

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 13:41:10 -06:00
Jose R Ziviani
15504e92d8 libslirp: udp: check upd_input buffer size
Git-commit: 74572be49247c8c5feae7c6e0b50c4f569ca9824
References: bsc#1187367, CVE-2021-3594

Fixes: CVE-2021-3594
Fixes: https://gitlab.freedesktop.org/slirp/libslirp/-/issues/47

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 11:36:25 -06:00
Jose R Ziviani
9703efb0b0 libslirp: bootp: check bootp_input buffer size
Git-commit: 2eca0838eee1da96204545e22cdaed860d9d7c6c
References: bsc#1187364, CVE-2021-3592

Fixes: CVE-2021-3592
Fixes: https://gitlab.freedesktop.org/slirp/libslirp/-/issues/44

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 11:36:24 -06:00
Jose R Ziviani
21bb584214 libslirp: bootp: limit vendor-specific area to input packet memory buffer
Git-commit: f13cad45b25d92760bb0ad67bec0300a4d7d5275
References: bsc#1187364, CVE-2021-3592

sizeof(bootp_t) currently holds DHCP_OPT_LEN. Remove this optional field
from the structure, to help with the following patch checking for
minimal header size. Modify the bootp_reply() function to take the
buffer boundaries and avoiding potential buffer overflow.

Related to CVE-2021-3592.

https://gitlab.freedesktop.org/slirp/libslirp/-/issues/44

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 11:36:23 -06:00
Jose R Ziviani
4abc1dad65 libslirp: slirp: Add sanity check for str option length
Git commit: 6fd057f7960bc7b3a69f3c53de5c9d0d5d34a79c
References: bsc#1187364, CVE-2021-3592

When user provides a long domainname or hostname that doesn't fit in the
DHCP packet, we mustn't overflow the response packet buffer. Instead,
report errors, following the g_warning() in the slirp->vdnssearch
branch.

Also check the strlen against 256 when initializing slirp, which limit
is also from the protocol where one byte represents the string length.
This gives an early error before the warning which is harder to notice
or diagnose.

Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 11:36:21 -06:00
Jose R Ziviani
1631e50a75 libslirp: Add mtod_check()
Git-commit: 93e645e72a056ec0b2c16e0299fc5c6b94e4ca17
References: bsc#1187364, CVE-2021-3592
            bsc#1187367, CVE-2021-3594

Recent security issues demonstrate the lack of safety care when casting
a mbuf to a particular structure type. At least, it should check that
the buffer is large enough. The following patches will make use of this
function.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-07-07 11:36:20 -06:00
Ani Sinha
407cff2a1b qom: code hardening - have bound checking while looping with integer value
Git-commit: 1bf8b88f14
References: bsc#1187529 CVE-2021-3611

Object property insertion code iterates over an integer to get an unused
index that can be used as an unique name for an object property. This loop
increments the integer value indefinitely. Although very unlikely, this can
still cause an integer overflow.
In this change, we fix the above code by checking against INT16_MAX and making
sure that the interger index does not overflow beyond that value. If no
available index is found, the code would cause an assertion failure. This
assertion failure is necessary because the callers of the function do not check
the return value for NULL.

Signed-off-by: Ani Sinha <ani@anisinha.ca>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200921093325.25617-1-ani@anisinha.ca>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Cho, Yu-Chen <acho@suse.com>
2021-07-05 18:35:45 +08:00
Philippe Mathieu-Daudé
d0d021a34f hw/char/bcm2835_aux: Allow less than 32-bit accesses
Git-commit: 3059344f01
References: bsc#1172382 CVE-2020-13754

The "BCM2835 ARM Peripherals" datasheet [*] chapter 2
("Auxiliaries: UART1 & SPI1, SPI2"), list the register
sizes as 3/8/16/32 bits. We assume this means this
peripheral allows 8-bit accesses.

This was not an issue until commit 5d971f9e67 which reverted
("memory: accept mismatching sizes in memory_region_access_valid").

The model is implemented as 32-bit accesses (see commit 97398d900c,
all registers are 32-bit) so replace MemoryRegionOps.valid as
MemoryRegionOps.impl, and re-introduce MemoryRegionOps.valid
with a 8/32-bit range.

[*] https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf

Fixes: 97398d900c ("bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201002181032.1899463-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-26 15:53:58 -06:00
Michael Tokarev
115c2cc2c9 acpi: accept byte and word access to core ACPI registers
Git-commit: dba04c3488
References: bsc#1172382 CVE-2020-13754

All ISA registers should be accessible as bytes, words or dwords
(if wide enough).  Fix the access constraints for acpi-pm-evt,
acpi-pm-tmr & acpi-cnt registers.

Fixes: 5d971f9e67 (memory: Revert "memory: accept mismatching sizes in memory_region_access_valid")
Fixes: afafe4bbe0 (apci: switch cnt to memory api)
Fixes: 77d58b1e47 (apci: switch timer to memory api)
Fixes: b5a7c024d2 (apci: switch evt to memory api)
Buglink: https://lore.kernel.org/xen-devel/20200630170913.123646-1-anthony.perard@citrix.com/T/
Buglink: https://bugs.debian.org/964793
BugLink: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964247
BugLink: https://bugs.launchpad.net/bugs/1886318
Reported-By: Simon John <git@the-jedi.co.uk>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20200720160627.15491-1-mjt@msgid.tls.msk.ru>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-11 13:18:29 -06:00
Laurent Vivier
57adab99f3 xhci: fix valid.max_access_size to access address registers
Git-commit: 8e67fda2dd
References: bsc#1172382 CVE-2020-13754

QEMU XHCI advertises AC64 (64-bit addressing) but doesn't allow
64-bit mode access in "runtime" and "operational" MemoryRegionOps.

Set the max_access_size based on sizeof(dma_addr_t) as AC64 is set.

XHCI specs:
"If the xHC supports 64-bit addressing (AC64 = ‘1’), then software
should write 64-bit registers using only Qword accesses.  If a
system is incapable of issuing Qword accesses, then writes to the
64-bit address fields shall be performed using 2 Dword accesses;
low Dword-first, high-Dword second.  If the xHC supports 32-bit
addressing (AC64 = ‘0’), then the high Dword of registers containing
64-bit address fields are unused and software should write addresses
using only Dword accesses"

The problem has been detected with SLOF, as linux kernel always accesses
registers using 32-bit access even if AC64 is set and revealed by
5d971f9e67 ("memory: Revert "memory: accept mismatching sizes in memory_region_access_valid"")

Suggested-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 20200721083322.90651-1-lvivier@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-11 13:18:26 -06:00
Andrew Melnychenko
cbb156e2f0 Fixed integer overflow in e1000e
Git-commit: f22a57ac09
References: bsc#1172382 CVE-2020-13754

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1737400
Fixed setting max_queue_num if there are no peers in
NICConf. qemu_new_nic() creates NICState with 1 NetClientState(index
0) without peers, set max_queue_num to 0 - It prevents undefined
behavior and possible crashes, especially during pcie hotplug.

Fixes: 6f3fbe4ed0 ("net: Introduce e1000e device emulation")
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-11 13:18:23 -06:00
Jose R Ziviani
ddd3735454 memory: Revert "memory: accept mismatching sizes in memory_region_access_valid"
Git-commit: 5d971f9e67
References: bsc#1172382 CVE-2020-13754

Memory API documentation documents valid .min_access_size and .max_access_size
fields and explains that any access outside these boundaries is blocked.

This is what devices seem to assume.

However this is not what the implementation does: it simply
ignores the boundaries unless there's an "accepts" callback.

Naturally, this breaks a bunch of devices.

Revert to the documented behaviour.

Devices that want to allow any access can just drop the valid field,
or add the impl field to have accesses converted to appropriate
length.

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Fixes: CVE-2020-13754
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1842363
Fixes: a014ed07bd ("memory: accept mismatching sizes in memory_region_access_valid")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20200610134731.1514409-1-mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jose R. Ziviani <jose.ziviani@suse.com>
2021-05-11 13:18:20 -06:00
Jose R Ziviani
dbab676a49 libqos: usb-hcd-ehci: use 32-bit write for config register
Git-commit: 89ed83d8b2
References: bsc#1172382 CVE-2020-13754

The memory region ops have min_access_size == 4 so obey it.

Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-11 13:18:18 -06:00
Jose R Ziviani
5e131a074c libqos: pci-pc: use 32-bit write for EJ register
Git-commit: 4b7c06837a
References: bsc#1172382 CVE-2020-13754

The memory region ops have min_access_size == 4 so obey it.

Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-11 13:18:15 -06:00
Ralf Haferkamp
f5f9d9355d Drop bogus IPv6 messages
Git-commit: c7ede54cbd2e2b25385325600958ba0124e31cc0
References: bsc#1172380 CVE-2020-10756

Drop IPv6 message shorter than what's mentioned in the payload
length header (+ the size of the IPv6 header). They're invalid an could
lead to data leakage in icmp6_send_echoreply().

Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
2021-05-07 15:03:36 -06:00
Philippe Mathieu-Daudé
b52ccc94f9 hw/intc/arm_gic: Fix interrupt ID in GICD_SGIR register
Git-commit: edfe2eb436
References: bsc#1181933 CVE-2021-20221

Per the ARM Generic Interrupt Controller Architecture specification
(document "ARM IHI 0048B.b (ID072613)"), the SGIINTID field is 4 bit,
not 10:

  - 4.3 Distributor register descriptions
  - 4.3.15 Software Generated Interrupt Register, GICD_SG

    - Table 4-21 GICD_SGIR bit assignments

    The Interrupt ID of the SGI to forward to the specified CPU
    interfaces. The value of this field is the Interrupt ID, in
    the range 0-15, for example a value of 0b0011 specifies
    Interrupt ID 3.

Correct the irq mask to fix an undefined behavior (which eventually
lead to a heap-buffer-overflow, see [Buglink]):

   $ echo 'writel 0x8000f00 0xff4affb0' | qemu-system-aarch64 -M virt,accel=qtest -qtest stdio
   [I 1612088147.116987] OPENED
  [R +0.278293] writel 0x8000f00 0xff4affb0
  ../hw/intc/arm_gic.c:1498:13: runtime error: index 944 out of bounds for type 'uint8_t [16][8]'
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../hw/intc/arm_gic.c:1498:13

This fixes a security issue when running with KVM on Arm with
kernel-irqchip=off. (The default is kernel-irqchip=on, which is
unaffected, and which is also the correct choice for performance.)

Cc: qemu-stable@nongnu.org
Fixes: CVE-2021-20221
Fixes: 9ee6e8bb85 ("ARMv7 support.")
Buglink: https://bugs.launchpad.net/qemu/+bug/1913916
Buglink: https://bugs.launchpad.net/qemu/+bug/1913917
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210131103401.217160-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-05-07 15:03:28 -06:00
Gerd Hoffmann
57641b14d1 usb: fix setup_len init (CVE-2020-14364)
Git-commit: b946434f26
References: bsc#1175441, CVE-2020-14364

Store calculated setup_len in a local variable, verify it, and only
write it to the struct (USBDevice->setup_len) in case it passed the
sanity checks.

This prevents other code (do_token_{in,out} functions specifically)
from working with invalid USBDevice->setup_len values and overrunning
the USBDevice->setup_buf[] buffer.

Fixes: CVE-2020-14364
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-id: 20200825053636.29648-1-kraxel@redhat.com
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-04-06 16:55:06 -06:00
Mauro Matteo Cascella
42d70f9850 hw/net/net_tx_pkt: fix assertion failure in net_tx_pkt_add_raw_fragment()
Git-commit: 035e69b063
References: bsc#1174641, CVE-2020-16092

An assertion failure issue was found in the code that processes network packets
while adding data fragments into the packet context. It could be abused by a
malicious guest to abort the QEMU process on the host. This patch replaces the
affected assert() with a conditional statement, returning false if the current
data fragment exceeds max_raw_frags.

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Ziming Zhang <ezrakiez@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-04-06 16:55:06 -06:00
Mauro Matteo Cascella
fa9605ad43 hw/net/xgmac: Fix buffer overflow in xgmac_enet_send()
Git-commit: 5519724a13
References: bsc#1174386 CVE-2020-15863

A buffer overflow issue was reported by Mr. Ziming Zhang, CC'd here. It
occurs while sending an Ethernet frame due to missing break statements
and improper checking of the buffer size.

Reported-by: Ziming Zhang <ezrakiez@gmail.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-04-06 16:55:06 -06:00
Prasad J Pandit
f1e2517d3c net: vmxnet3: validate configuration values during activate (CVE-2021-20203)
Git-commit: 0000000000000000000000000000000000000000
References: bsc#1181639

While activating device in vmxnet3_acticate_device(), it does not
validate guest supplied configuration values against predefined
minimum - maximum limits. This may lead to integer overflow or
OOB access issues. Add checks to avoid it.

Fixes: CVE-2021-20203
Buglink: https://bugs.launchpad.net/qemu/+bug/1913873
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-04-06 16:55:06 -06:00
Chen Qun
38ecc93f0a block/iscsi:fix heap-buffer-overflow in iscsi_aio_ioctl_cb
Git-commit: ff0507c239
References: bsc#1180523, CVE-2020-11947

There is an overflow, the source 'datain.data[2]' is 100 bytes,
 but the 'ss' is 252 bytes.This may cause a security issue because
 we can access a lot of unrelated memory data.

The len for sbp copy data should take the minimum of mx_sb_len and
 sb_len_wr, not the maximum.

If we use iscsi device for VM backend storage, ASAN show stack:

READ of size 252 at 0xfffd149dcfc4 thread T0
    #0 0xaaad433d0d34 in __asan_memcpy (aarch64-softmmu/qemu-system-aarch64+0x2cb0d34)
    #1 0xaaad45f9d6d0 in iscsi_aio_ioctl_cb /qemu/block/iscsi.c:996:9
    #2 0xfffd1af0e2dc  (/usr/lib64/iscsi/libiscsi.so.8+0xe2dc)
    #3 0xfffd1af0d174  (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
    #4 0xfffd1af19fac  (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
    #5 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
    #6 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
    #7 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
    #8 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
    #9 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
    #10 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
    #11 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
    #12 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
    #13 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
    #14 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
    #15 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
    #16 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
    #17 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)

0xfffd149dcfc4 is located 0 bytes to the right of 100-byte region [0xfffd149dcf60,0xfffd149dcfc4)
allocated by thread T0 here:
    #0 0xaaad433d1e70 in __interceptor_malloc (aarch64-softmmu/qemu-system-aarch64+0x2cb1e70)
    #1 0xfffd1af0e254  (/usr/lib64/iscsi/libiscsi.so.8+0xe254)
    #2 0xfffd1af0d174  (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
    #3 0xfffd1af19fac  (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
    #4 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
    #5 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
    #6 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
    #7 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
    #8 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
    #9 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
    #10 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
    #11 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
    #12 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
    #13 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
    #14 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
    #15 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
    #16 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200418062602.10776-1-kuhn.chenqun@huawei.com
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-04-06 16:55:06 -06:00
Prasad J Pandit
ea41fa8656 exec: set map length to zero when returning NULL
Git-commit: 77f55eac6c
References: bsc#1172386, CVE-2020-13659

When mapping physical memory into host's virtual address space,
'address_space_map' may return NULL if BounceBuffer is in_use.
Set and return '*plen = 0' to avoid later NULL pointer dereference.

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: https://bugs.launchpad.net/qemu/+bug/1878259
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200526111743.428367-1-ppandit@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-04-06 16:55:06 -06:00
Prasad J Pandit
a9e0c4dfad slirp: check pkt_len before reading protocol header
Git-commit: 2e1dcbc0c2af64fcb17009eaf2ceedd81be2b27f
References: bsc#1179467

While processing ARP/NCSI packets in 'arp_input' or 'ncsi_input'
routines, ensure that pkt_len is large enough to accommodate the
respective protocol headers, lest it should do an OOB access.
Add check to avoid it.

CVE-2020-29129 CVE-2020-29130
  QEMU: slirp: out-of-bounds access while processing ARP/NCSI packets
 -> https://www.openwall.com/lists/oss-security/2020/11/27/1

Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20201126135706.273950-1-ppandit@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
[BR: no NCSI code, so only ARP patched]
2021-04-06 16:54:54 -06:00
Alexander Bulekov
f1c4f22d7b lan9118: switch to use qemu_receive_packet() for loopback
Git-commit: 37cee01784

This patch switches to use qemu_receive_packet() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Alexander Bulekov
c79f80370e cadence_gem: switch to use qemu_receive_packet() for loopback
Git-commit: e73adfbeec

This patch switches to use qemu_receive_packet() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Alexander Bulekov
38676a402b pcnet: switch to use qemu_receive_packet() for loopback
Git-commit: 99ccfaa1ed

This patch switches to use qemu_receive_packet() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/qemu/+bug/1917085
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Jason Wang
0ac2182285 tx_pkt: switch to use qemu_receive_packet_iov() for loopback
Git-commit: 8c552542b8

This patch switches to use qemu_receive_receive_iov() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Jason Wang
c9b45adb86 dp8393x: switch to use qemu_receive_packet() for loopback packet
Git-commit: 331d2ac9ea

This patch switches to use qemu_receive_packet() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Jason Wang
e016294ce3 e1000: switch to use qemu_receive_packet() for loopback
Git-commit: 1caff0340f

This patch switches to use qemu_receive_packet() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Alexander Bulekov
f075f63c7e rtl8139: switch to use qemu_receive_packet() for loopback
Git-commit: 5311fb805a
References: bsc#1182968, CVE-2021-3416

This patch switches to use qemu_receive_packet() which can detect
reentrancy and return early.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Buglink: https://bugs.launchpad.net/qemu/+bug/1910826
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com
Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Jason Wang
fd85166fe9 net: introduce qemu_receive_packet()
Git-commit: 705df5466c
References: bsc#1182968, CVE-2021-3416
Some NIC supports loopback mode and this is done by calling
nc->info->receive() directly which in fact suppresses the effort of
reentrancy check that is done in qemu_net_queue_send().

Unfortunately we can't use qemu_net_queue_send() here since for
loopback there's no sender as peer, so this patch introduce a
qemu_receive_packet() which is used for implementing loopback mode
for a NIC with this check.

NIC that supports loopback mode will be converted to this helper.

This is intended to address CVE-2021-3416.

Cc: Prasad J Pandit <ppandit@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Jason Wang
e880f7e5d4 e1000: fail early for evil descriptor
Git-commit: 3de46e6fc4
References: bsc#1182577, CVE-2021-20257

During procss_tx_desc(), driver can try to chain data descriptor with
legacy descriptor, when will lead underflow for the following
calculation in process_tx_desc() for bytes:

            if (tp->size + bytes > msh)
                bytes = msh - tp->size;

This will lead a infinite loop. So check and fail early if tp->size if
greater or equal to msh.

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Reported-by: Ruhr-University Bochum <bugs-syssec@rub.de>
Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
2266a34ef4 spapr_pci: add spapr msi read method
Git-commit: 921604e175
References: bsc#1173612, CVE-2020-15469

Add spapr msi mmio read method to avoid NULL pointer dereference
issue.

Reported-by: Lei Sun <slei.casper@gmail.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200811114133.672647-7-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
891cde0691 prep: add ppc-parity write method
Git-commit: f867cebaed
References: bsc#1173612, CVE-2020-15469

Add ppc-parity mmio write method to avoid NULL pointer dereference
issue.

Reported-by: Lei Sun <slei.casper@gmail.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20200811114133.672647-5-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
839bb7ede8 vfio: add quirk device write method
Git-commit: 24202d2b56
References: bsc#1173612, CVE-2020-15469

Add vfio quirk device mmio write method to avoid NULL pointer
dereference issue.

Reported-by: Lei Sun <slei.casper@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200811114133.672647-4-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
5e19462069 hw/pci-host: add pci-intack write method
Git-commit: 520f26fc6d
References: bsc#1173612, CVE-2020-15469

Add pci-intack mmio write method to avoid NULL pointer dereference
issue.

Reported-by: Lei Sun <slei.casper@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20200811114133.672647-2-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Paolo Bonzini
b48c0e8c41 ide: atapi: assert that the buffer pointer is in range
Git-commit: 8132122889
References: bsc#1181108, CVE-2020-29443

A case was reported where s->io_buffer_index can be out of range.
The report skimped on the details but it seems to be triggered
by s->lba == -1 on the READ/READ CD paths (e.g. by sending an
ATAPI command with LBA = 0xFFFFFFFF).  For now paper over it
with assertions.  The first one ensures that there is no overflow
when incrementing s->io_buffer_index, the second checks for the
buffer overrun.

Note that the buffer overrun is only a read, so I am not sure
if the assertion failure is actually less harmful than the overrun.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20201201120926.56559-1-pbonzini@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
c8b804bed2 hw/net/e1000e: advance desc_offset in case of null descriptor
Git-commit: c2cb511634
References: bsc#1179468, CVE-2020-28916

While receiving packets via e1000e_write_packet_to_guest() routine,
'desc_offset' is advanced only when RX descriptor is processed. And
RX descriptor is not processed if it has NULL buffer address.
This may lead to an infinite loop condition. Increament 'desc_offset'
to process next descriptor in the ring to avoid infinite loop.

Reported-by: Cheol-woo Myung <330cjfdn@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
746152bd8e net: remove an assert call in eth_get_gso_type
Git-commit: 7564bf7701
References: bsc#1178174, CVE-2020-27617

eth_get_gso_type() routine returns segmentation offload type based on
L3 protocol type. It calls g_assert_not_reached if L3 protocol is
unknown, making the following return statement unreachable. Remove the
g_assert call, it maybe triggered by a guest user.

Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
4d2180574c hw: usb: hcd-ohci: check for processed TD before retire
Git-commit: 1be90ebecc
References: bsc#1176684, CVE-2020-25625

While servicing OHCI transfer descriptors(TD), ohci_service_iso_td
retires a TD if it has passed its time frame. It does not check if
the TD was already processed once and holds an error code in TD_CC.
It may happen if the TD list has a loop. Add check to avoid an
infinite loop condition.

Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-id: 20200915182259.68522-3-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
aec8c7da86 hw: usb: hcd-ohci: check len and frame_number variables
Git-commit: 1328fe0c32
References: bsc#1176682, CVE-2020-25624

While servicing the OHCI transfer descriptors(TD), OHCI host
controller derives variables 'start_addr', 'end_addr', 'len'
etc. from values supplied by the host controller driver.
Host controller driver may supply values such that using
above variables leads to out-of-bounds access issues.
Add checks to avoid them.

AddressSanitizer: stack-buffer-overflow on address 0x7ffd53af76a0
  READ of size 2 at 0x7ffd53af76a0 thread T0
  #0 ohci_service_iso_td ../hw/usb/hcd-ohci.c:734
  #1 ohci_service_ed_list ../hw/usb/hcd-ohci.c:1180
  #2 ohci_process_lists ../hw/usb/hcd-ohci.c:1214
  #3 ohci_frame_boundary ../hw/usb/hcd-ohci.c:1257
  #4 timerlist_run_timers ../util/qemu-timer.c:572
  #5 qemu_clock_run_timers ../util/qemu-timer.c:586
  #6 qemu_clock_run_all_timers ../util/qemu-timer.c:672
  #7 main_loop_wait ../util/main-loop.c:527
  #8 qemu_main_loop ../softmmu/vl.c:1676
  #9 main ../softmmu/main.c:50

Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <j_kangel@163.com>
Reported-by: Yi Ren <yunye.ry@alibaba-inc.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20200915182259.68522-2-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Li Qiang
9ea47050ed hw: ehci: check return value of 'usb_packet_map'
Git-commit: 2fdb42d840
References: bsc#1178934, CVE-2020-25723

If 'usb_packet_map' fails, we should stop to process the usb
request.

Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20200812161727.29412-1-liq3ea@163.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Li Qiang
d82ce8d705 hw: xhci: check return value of 'usb_packet_map'
Git-commit: 21bc31524e
References: bsc#1176673, CVE-2020-25084

Currently we don't check the return value of 'usb_packet_map',
this will cause an UAF issue. This is LP#1891341.
Following is the reproducer provided in:
-->https://bugs.launchpad.net/qemu/+bug/1891341

cat << EOF | ./i386-softmmu/qemu-system-i386 -device nec-usb-xhci \
-trace usb\* -device usb-audio -device usb-storage,drive=mydrive \
-drive id=mydrive,file=null-co://,size=2M,format=raw,if=none \
-nodefaults -nographic -qtest stdio
outl 0xcf8 0x80001016
outl 0xcfc 0x3c009f0d
outl 0xcf8 0x80001004
outl 0xcfc 0xc77695e
writel 0x9f0d000000000040 0xffff3655
writeq 0x9f0d000000002000 0xff2f9e0000000000
write 0x1d 0x1 0x27
write 0x2d 0x1 0x2e
write 0x17232 0x1 0x03
write 0x17254 0x1 0x06
write 0x17278 0x1 0x34
write 0x3d 0x1 0x27
write 0x40 0x1 0x2e
write 0x41 0x1 0x72
write 0x42 0x1 0x01
write 0x4d 0x1 0x2e
write 0x4f 0x1 0x01
writeq 0x9f0d000000002000 0x5c051a0100000000
write 0x34001d 0x1 0x13
write 0x340026 0x1 0x30
write 0x340028 0x1 0x08
write 0x34002c 0x1 0xfe
write 0x34002d 0x1 0x08
write 0x340037 0x1 0x5e
write 0x34003a 0x1 0x05
write 0x34003d 0x1 0x05
write 0x34004d 0x1 0x13
writeq 0x9f0d000000002000 0xff00010100400009
EOF

This patch fixes this.

Buglink: https://bugs.launchpad.net/qemu/+bug/1891341
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-id: 20200812153139.15146-1-liq3ea@163.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-30 19:26:46 -06:00
Prasad J Pandit
5232fb8315 megasas: use unsigned type for reply_queue_head and check index
Git-commit: f50ab86a26
References: bsc#1172383, CVE-2020-13362

A guest user may set 'reply_queue_head' field of MegasasState to
a negative value. Later in 'megasas_lookup_frame' it is used to
index into s->frames[] array. Use unsigned type to avoid OOB
access issue.

Also check that 'index' value stays within s->frames[] bounds
through the while() loop in 'megasas_lookup_frame' to avoid OOB
access.

Reported-by: Ren Ding <rding@gatech.edu>
Reported-by: Hanqing Zhao <hanqing@gatech.edu>
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Acked-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200513192540.1583887-2-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:13:10 -06:00
Bruce Rogers
47ff090c3d sm501: add qemu/log.h include
References: bsc#1172385, CVE-2020-12829

This is in place of commit 4a1f253adb which added the log.h
reference, but did other changes we're not interested in.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:13:10 -06:00
BALATON Zoltan
20514a7521 sm501: Replace hand written implementation with pixman where possible
Git-commit: b15a22bbcb
References: bsc#1172385, CVE-2020-12829

Besides being faster this should also prevent malicious guests to
abuse 2D engine to overwrite data or cause a crash.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-id: 58666389b6cae256e4e972a32c05cf8aa51bffc0.1590089984.git.balaton@eik.bme.hu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
(cherry picked from commit 6bed160dd4e4f68a85e032b08d40d6bd5449274f)
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:13:10 -06:00
BALATON Zoltan
881edaf75a sm501: Clean up local variables in sm501_2d_operation
Git-commit: 3d0b096298
References: bsc#1172385, CVE-2020-12829

Make variables local to the block they are used in to make it clearer
which operation they are needed for.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: ae59f8138afe7f6a5a4a82539d0f61496a906b06.1590089984.git.balaton@eik.bme.hu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
(cherry picked from commit e8a152c862db6fdd316345d9cd932e4602e9b29e)
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:13:09 -06:00
BALATON Zoltan
4674b6bb04 sm501: Use BIT(x) macro to shorten constant
Git-commit: 2824809b7f
References: bsc#1172385, CVE-2020-12829

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 124bf5de8d7cf503b32b377d0445029a76bfbd49.1590089984.git.balaton@eik.bme.hu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
(cherry picked from commit 6f7231dc2939aca80adc5080464696e7d5f0ca45)
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:13:09 -06:00
BALATON Zoltan
7dd455124e sm501: Shorten long variable names in sm501_2d_operation
Git-commit: 6f8183b5dc
References: bsc#1172385, CVE-2020-12829

This increases readability and cleans up some confusing naming.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-id: b9b67b94c46e945252a73c77dfd117132c63c4fb.1590089984.git.balaton@eik.bme.hu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
BALATON Zoltan
a4b9fd7dde sm501: Convert printf + abort to qemu_log_mask
Git-commit: e29da77e5f
References: bsc#1172385, CVE-2020-12829

Some places already use qemu_log_mask() to log unimplemented features
or errors but some others have printf() then abort(). Convert these to
qemu_log_mask() and avoid aborting to prevent guests to easily cause
denial of service.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 305af87f59d81e92f2aaff09eb8a3603b8baa322.1590089984.git.balaton@eik.bme.hu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
Marcus Comstedt
204d25c450 sm501: Adjust endianness of pixel value in rectangle fill
Git-commit: f3a60058c9
References: bsc#1172385, CVE-2020-12829

The value from twoD_foreground (which is in host endian format) must
be converted to the endianness of the framebuffer (currently always
little endian) before it can be used to perform the fill operation.

Signed-off-by: Marcus Comstedt <marcus@mc.pp.se>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
BALATON Zoltan
386ca5f7b6 sm501: Set updated region dirty after 2D operation
Git-commit: eb76243c9d
References: bsc#1172385, CVE-2020-12829

Set the changed memory region dirty after performed a 2D operation to
ensure that the screen is updated properly.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
BALATON Zoltan
e6a428806a sm501: Fix support for non-zero frame buffer start address
Git-commit: 33159dd7ce
References: bsc#1172385, CVE-2020-12829

Display updates and drawing hardware cursor did not work when frame
buffer address was non-zero. Fix this by taking the frame buffer
address into account in these cases. This fixes screen dragging on
AmigaOS. Based on patch by Sebastian Bauer.

Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
Sebastian Bauer
1ac6b4e5d9 sm501: Log unimplemented raster operation modes
Git-commit: 06cb926aaa
References: bsc#1172385, CVE-2020-12829

The sm501 currently implements only a very limited set of raster operation
modes. After this change, unknown raster operation modes are logged so
these can be easily spotted.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
Sebastian Bauer
6dac9775cd sm501: Implement negated destination raster operation mode
Git-commit: debc7e7dad
References: bsc#1172385, CVE-2020-12829

Add support for the negated destination operation mode. This is used e.g.
by AmigaOS for the INVERSEVID drawing mode. With this change, the cursor
in the shell and non-immediate window adjustment are working now.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
Sebastian Bauer
32be0f4fbb sm501: Use values from the pitch register for 2D operations
Git-commit: 54b2a4339c
References: bsc#1172385, CVE-2020-12829

Before, crt_h_total was used for src_width and dst_width. This is a
property of the current display setting and not relevant for the 2D
operation that also can be done off-screen. The pitch register's purpose
is to describe line pitch relevant of the 2D operation.

Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
BALATON Zoltan
10a8e77101 sm501: Add some more unimplemented registers
Git-commit: 5690d9ecef
References: bsc#1172385, CVE-2020-12829

These are not really implemented (just return zero or default values)
but add these so guests accessing them can run.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:12:30 -06:00
BALATON Zoltan
32d7d1ea3d sm501: Add support for panel layer
Git-commit: 1ae5e6eb42
References: bsc#1172385, CVE-2020-12829

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 2029a276362c0c3a14c78acb56baa9466848dd51.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:11:26 -06:00
BALATON Zoltan
f7713e0f17 sm501: Fix hardware cursor
Git-commit: 6a2a5aae02
References: bsc#1172385, CVE-2020-12829

Rework HWC handling to simplify it and fix cursor not updating on
screen as needed. Previously cursor was not updated because checking
for changes in a line overrode the update flag set for the cursor but
fixing this is not enough because the cursor should also be updated if
its shape or location changes. Introduce hwc_invalidate() function to
handle that similar to other display controller models.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 6970a5e9868b7246656c1d02038dc5d5fa369507.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-29 14:10:19 -06:00
Thomas Huth
858d332e0f hw/core/loader: Fix possible crash in rom_copy()
Git-commit: e423455c4f
References: bsc#1172478, CVE-2020-13765

Both, "rom->addr" and "addr" are derived from the binary image
that can be loaded with the "-kernel" paramer. The code in
rom_copy() then calculates:

    d = dest + (rom->addr - addr);

and uses "d" as destination in a memcpy() some lines later. Now with
bad kernel images, it is possible that rom->addr is smaller than addr,
thus "rom->addr - addr" gets negative and the memcpy() then tries to
copy contents from the image to a bad memory location. This could
maybe be used to inject code from a kernel image into the QEMU binary,
so we better fix it with an additional sanity check here.

Cc: qemu-stable@nongnu.org
Reported-by: Guangming Liu
Buglink: https://bugs.launchpad.net/qemu/+bug/1844635
Message-Id: <20190925130331.27825-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Prasad J Pandit
eeb47ad354 es1370: check total frame count against current frame
Git-commit: 369ff955a8
References: bsc#1172384, CVE-2020-13361

A guest user may set channel frame count via es1370_write()
such that, in es1370_transfer_audio(), total frame count
'size' is lesser than the number of frames that are processed
'cnt'.

    int cnt = d->frame_cnt >> 16;
    int size = d->frame_cnt & 0xffff;

if (size < cnt), it results in incorrect calculations leading
to OOB access issue(s). Add check to avoid it.

Reported-by: Ren Ding <rding@gatech.edu>
Reported-by: Hanqing Zhao <hanqing@gatech.edu>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20200514200608.1744203-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Paolo Bonzini
c2122f7168 scsi: lsi: exit infinite loop while executing script (CVE-2019-12068)
When executing script in lsi_execute_script(), the LSI scsi adapter
emulator advances 's->dsp' index to read next opcode. This can lead
to an infinite loop if the next opcode is empty. Move the existing
loop exit after 10k iterations so that it covers no-op opcodes as
well.

Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit de594e4765)
[BR: BSC#1146873 CVE-2019-12068 - minor trace related tweak applied]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Sven Schnelle
6d74a3386e lsi: use enum type for s->waiting
This makes the code easier to read - no functional change.

Signed-off-by: Sven Schnelle <svens@stackframe.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190305195519.24303-3-svens@stackframe.org>
(cherry picked from commit f08ec2b82a)
[BR: BSC#1146873 CVE-2019-12068 - small tweak made related to trace code]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Felipe Franciosi
6335f5aded iscsi: Cap block count from GET LBA STATUS (CVE-2020-1711)
Git-commit: 693fd2acdf
References: bsc#1166240

When querying an iSCSI server for the provisioning status of blocks (via
GET LBA STATUS), Qemu only validates that the response descriptor zero's
LBA matches the one requested. Given the SCSI spec allows servers to
respond with the status of blocks beyond the end of the LUN, Qemu may
have its heap corrupted by clearing/setting too many bits at the end of
its allocmap for the LUN.

A malicious guest in control of the iSCSI server could carefully program
Qemu's heap (by selectively setting the bitmap) and then smash it.

This limits the number of bits that iscsi_co_block_status() will try to
update in the allocmap so it can't overflow the bitmap.

Fixes: CVE-2020-1711
Cc: qemu-stable@nongnu.org
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
[BR: Converted patch from byte based to sector based]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Marc-André Lureau
29d86468e1 Fix use-afte-free in ip_reass() (CVE-2020-1983)
Git-commit: 2faae0f778f818fadc873308f983289df697eb93
References: bsc#1170940, CVE-2020-1983

The q pointer is updated when the mbuf data is moved from m_dat to
m_ext.

m_ext buffer may also be realloc()'ed and moved during m_cat():
q should also be updated in this case.

Reported-by: Aviv Sasson <asasson@paloaltonetworks.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 9bd6c5913271eabcb7768a58197ed3301fe19f2d)
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Marc-André Lureau
faab0c6c0f tcp_emu: fix unsafe snprintf() usages
Git-commit: 68ccb8021a838066f0951d4b2817eb6b6f10a843
References: bsc#1163018, CVE-2020-8608

Various calls to snprintf() assume that snprintf() returns "only" the
number of bytes written (excluding terminating NUL).

https://pubs.opengroup.org/onlinepubs/9699919799/functions/snprintf.html#tag_16_159_04

"Upon successful completion, the snprintf() function shall return the
number of bytes that would be written to s had n been sufficiently
large excluding the terminating null byte."

Before patch ce131029, if there isn't enough room in "m_data" for the
"DCC ..." message, we overflow "m_data".

After the patch, if there isn't enough room for the same, we don't
overflow "m_data", but we set "m_len" out-of-bounds. The next time an
access is bounded by "m_len", we'll have a buffer overflow then.

Use slirp_fmt*() to fix potential OOB memory access.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200127092414.169796-7-marcandre.lureau@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Prasad J Pandit
df5ccaec1d slirp: use correct size while emulating commands
Git-commit: 82ebe9c370a0e2970fb5695aa19aa5214a6a1c80
References: bsc#1161066, CVE-2020-7039, bsc#1163018, CVE-2020-8608

While emulating services in tcp_emu(), it uses 'mbuf' size
'm->m_size' to write commands via snprintf(3). Use M_FREEROOM(m)
size to avoid possible OOB access.

Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200109094228.79764-3-ppandit@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Prasad J Pandit
db504ecf42 slirp: use correct size while emulating IRC commands
Git-commit: ce131029d6d4a405cb7d3ac6716d03e58fb4a5d9
References: bsc#1161066, CVE-2020-7039, bsc#1163018, CVE-2020-8608

While emulating IRC DCC commands, tcp_emu() uses 'mbuf' size
'm->m_size' to write DCC commands via snprintf(3). This may
lead to OOB write access, because 'bptr' points somewhere in
the middle of 'mbuf' buffer, not at the start. Use M_FREEROOM(m)
size to avoid OOB access.

Reported-by: Vishnu Dev TJ <vishnudevtj@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200109094228.79764-2-ppandit@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Samuel Thibault
3590aad90a tcp_emu: Fix oob access
Git-commit: 2655fffed7a9e765bcb4701dd876e9dab975f289
References: bsc#1161066, CVE2020-7039, bsc#1161066, CVS-2020-7039

The main loop only checks for one available byte, while we sometimes
need two bytes.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Marc-André Lureau
f807d653a0 util: add slirp_fmt() helpers
Git-commit: 30648c03b27fb8d9611b723184216cd3174b6775
References: bsc#1163018, CVE-2020-8608

Various calls to snprintf() in libslirp assume that snprintf() returns
"only" the number of bytes written (excluding terminating NUL).

https://pubs.opengroup.org/onlinepubs/9699919799/functions/snprintf.html#tag_16_159_04

"Upon successful completion, the snprintf() function shall return the
number of bytes that would be written to s had n been sufficiently
large excluding the terminating null byte."

Introduce slirp_fmt() that handles several pathological cases the
way libslirp usually expect:

- treat error as fatal (instead of silently returning -1)

- fmt0() will always \0 end

- return the number of bytes actually written (instead of what would
have been written, which would usually result in OOB later), including
the ending \0 for fmt0()

- warn if truncation happened (instead of ignoring)

Other less common cases can still be handled with strcpy/snprintf() etc.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200127092414.169796-2-marcandre.lureau@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Marc-André Lureau
3956bab5c1 slirp: don't manipulate so_rcv in tcp_emu()
For some reason, EMU_IDENT is not like other "emulated" protocols and
tries to reconstitute the original buffer, if it came in multiple
packets. Unfortunately, it does so wrongly, as it doesn't respect the
sbuf circular buffer appending rules, nor does it maintain some of the
invariants (rptr is incremented without bounds, etc): this leads to
further memory corruption revealed by ASAN or various malloc
errors. Furthermore, the so_rcv buffer is regularly flushed, so there
is no guarantee that buffer reconstruction will do what is expected.

Instead, do what the function comment says: "XXX Assumes the whole
command came in one packet", and don't touch so_rcv.

Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1664205

Cc: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2021-03-16 20:23:04 -06:00
Marc-André Lureau
807b875ac0 slirp: ensure there is enough space in mbuf to null-terminate
Prevents from buffer overflows.
Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1664205

Cc: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[BFR: In support of BSC#1123156]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Samuel Thibault
a077d9f433 slirp: fix big/little endian conversion in ident protocol
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

---
Based-on: <1551476756-25749-1-git-send-email-will@wbowling.info>

(cherry picked from commit 1fd71067da)
[BFR: In support of BSC#1123156]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Michael Roth
06acbfe92a slrip: ip_reass: Fix use after free
Using ip_deq after m_free might read pointers from an allocation reuse.

This would be difficult to exploit, but that is still related with
CVE-2019-14378 which generates fragmented IP packets that would trigger this
issue and at least produce a DoS.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(from libslirp.git commit c59279437eda91841b9d26079c70b8a540d41204)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[BR: BSC#1149811 CVE-2019-15890]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Olaf Hering
5b64848374 xen-mapcache: Fix the bug when overlapping emulated DMA operations may cause inconsistency in guest memory mappings
Under certain circumstances normal xen-mapcache functioning may be broken
by guest's actions. This may lead to either QEMU performing exit() due to
a caught bad pointer (and with QEMU process gone the guest domain simply
appears hung afterwards) or actual use of the incorrect pointer inside
QEMU address space -- a write to unmapped memory is possible. The bug is
hard to reproduce on a i440 machine as multiple DMA sources are required
(though it's possible in theory, using multiple emulated devices), but can
be reproduced somewhat easily on a Q35 machine using an emulated AHCI
controller -- each NCQ queue command slot may be used as an independent
DMA source ex. using READ FPDMA QUEUED command, so a single storage
device on the AHCI controller port will be enough to produce multiple DMAs
(up to 32). The detailed description of the issue follows.

Xen-mapcache provides an ability to map parts of a guest memory into
QEMU's own address space to work with.

There are two types of cache lookups:
 - translating a guest physical address into a pointer in QEMU's address
   space, mapping a part of guest domain memory if necessary (while trying
   to reduce a number of such (re)mappings to a minimum)
 - translating a QEMU's pointer back to its physical address in guest RAM

These lookups are managed via two linked-lists of structures.
MapCacheEntry is used for forward cache lookups, while MapCacheRev -- for
reverse lookups.

Every guest physical address is broken down into 2 parts:
    address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
    address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);

MCACHE_BUCKET_SHIFT depends on a system (32/64) and is equal to 20 for
a 64-bit system (which assumed for the further description). Basically,
this means that we deal with 1 MB chunks and offsets within those 1 MB
chunks. All mappings are created with 1MB-granularity, i.e. 1MB/2MB/3MB
etc. Most DMA transfers typically are less than 1MB, however, if the
transfer crosses any 1MB border(s) - than a nearest larger mapping size
will be used, so ex. a 512-byte DMA transfer with the start address
700FFF80h will actually require a 2MB range.

Current implementation assumes that MapCacheEntries are unique for a given
address_index and size pair and that a single MapCacheEntry may be reused
by multiple requests -- in this case the 'lock' field will be larger than
1. On other hand, each requested guest physical address (with 'lock' flag)
is described by each own MapCacheRev. So there may be multiple MapCacheRev
entries corresponding to a single MapCacheEntry. The xen-mapcache code
uses MapCacheRev entries to retrieve the address_index & size pair which
in turn used to find a related MapCacheEntry. The 'lock' field within
a MapCacheEntry structure is actually a reference counter which shows
a number of corresponding MapCacheRev entries.

The bug lies in ability for the guest to indirectly manipulate with the
xen-mapcache MapCacheEntries list via a special sequence of DMA
operations, typically for storage devices. In order to trigger the bug,
guest needs to issue DMA operations in specific order and timing.
Although xen-mapcache is protected by the mutex lock -- this doesn't help
in this case, as the bug is not due to a race condition.

Suppose we have 3 DMA transfers, namely A, B and C, where
- transfer A crosses 1MB border and thus uses a 2MB mapping
- transfers B and C are normal transfers within 1MB range
- and all 3 transfers belong to the same address_index

In this case, if all these transfers are to be executed one-by-one
(without overlaps), no special treatment necessary -- each transfer's
mapping lock will be set and then cleared on unmap before starting
the next transfer.
The situation changes when DMA transfers overlap in time, ex. like this:

  |===== transfer A (2MB) =====|

              |===== transfer B (1MB) =====|

                          |===== transfer C (1MB) =====|
 time --->

In this situation the following sequence of actions happens:

1. transfer A creates a mapping to 2MB area (lock=1)
2. transfer B (1MB) tries to find available mapping but cannot find one
   because transfer A is still in progress, and it has 2MB size + non-zero
   lock. So transfer B creates another mapping -- same address_index,
   but 1MB size.
3. transfer A completes, making 1st mapping entry available by setting its
   lock to 0
4. transfer C starts and tries to find available mapping entry and sees
   that 1st entry has lock=0, so it uses this entry but remaps the mapping
   to a 1MB size
5. transfer B completes and by this time
  - there are two locked entries in the MapCacheEntry list with the SAME
    values for both address_index and size
  - the entry for transfer B actually resides farther in list while
    transfer C's entry is first
6. xen_ram_addr_from_mapcache() for transfer B gets correct address_index
   and size pair from corresponding MapCacheRev entry, but then it starts
   looking for MapCacheEntry with these values and finds the first entry
   -- which belongs to transfer C.

At this point there may be following possible (bad) consequences:

1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
   in this statement:

   raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
       ((unsigned long) ptr - (unsigned long) entry->vaddr_base);

resulting in an incorrent raddr value returned from the function. The
(ptr - entry->vaddr_base) expression may produce both positive and negative
numbers and its actual value may differ greatly as there are many
map/unmap operations take place. If the value will be beyond guest RAM
limits then a "Bad RAM offset" error will be triggered and logged,
followed by exit() in QEMU.

2. If raddr value won't exceed guest RAM boundaries, the same sequence
of actions will be performed for xen_invalidate_map_cache_entry() on DMA
unmap, resulting in a wrong MapCacheEntry being unmapped while DMA
operation which uses it is still active. The above example must
be extended by one more DMA transfer in order to allow unmapping as the
first mapping in the list is sort of resident.

The patch modifies the behavior in which MapCacheEntry's are added to the
list, avoiding duplicates.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7fb394ad8a)
[BSC#1160024]
2021-03-16 20:23:04 -06:00
Olaf Hering
940cf557e5 xen: don't use xenstore to save/restore physmap anymore
If we have a system with xenforeignmemory_map2() implemented
we don't need to save/restore physmap on suspend/restore
anymore. In case we resume a VM without physmap - try to
recreate the physmap during memory region restore phase and
remap map cache entries accordingly. The old code is left
for compatibility reasons.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 331b5189d7)
(add only code within XEN_COMPAT_PHYSMAP)
[BSC#1160024]
2021-03-16 20:23:04 -06:00
Olaf Hering
7782ec9f50 xen/mapcache: introduce xen_replace_cache_entry()
This new call is trying to update a requested map cache entry
according to the changes in the physmap. The call is searching
for the entry, unmaps it and maps again at the same place using
a new guest address. If the mapping is dummy this call will
make it real.

This function makes use of a new xenforeignmemory_map2() call
with an extended interface that was recently introduced in
libxenforeignmemory [1].

[1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 5ba3d75645)
(adjusted to use xenforeignmemory_map instead of xenforeignmemory_map2)
[BSC#1160024]
2021-03-16 20:23:04 -06:00
Olaf Hering
3787b85bb4 xen/mapcache: add an ability to create dummy mappings
Dummys are simple anonymous mappings that are placed instead
of regular foreign mappings in certain situations when we need
to postpone the actual mapping but still have to give a
memory region to QEMU to play with.

This is planned to be used for restore on Xen.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 759235653d)
[BSC#1160024]
2021-03-16 20:23:04 -06:00
Leonid Bloch
d078492e46 qcow2: Explicit number replaced by a constant
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit bd016b912c)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
3644bf7f46 qcow2: Set the default cache-clean-interval to 10 minutes
The default cache-clean-interval is set to 10 minutes, in order to lower
the overhead of the qcow2 caches (before the default was 0, i.e.
disabled).

* For non-Linux platforms the default is kept at 0, because
  cache-clean-interval is not supported there yet.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit e957b50b8d)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
a03218f09c qcow2: Increase the default upper limit on the L2 cache size
The upper limit on the L2 cache size is increased from 1 MB to 32 MB
on Linux platforms, and to 8 MB on other platforms (this difference is
caused by the ability to set intervals for cache cleaning on Linux
platforms only).

This is done in order to allow default full coverage with the L2 cache
for images of up to 256 GB in size (was 8 GB). Note, that only the
needed amount to cover the full image is allocated. The value which is
changed here is just the upper limit on the L2 cache size, beyond which
it will not grow, even if the size of the image will require it to.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 80668d0fb7)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
87bcc27ce2 qcow2: Assign the L2 cache relatively to the image size
Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O.

Previously, unless 'cache-size' was specified and was large enough, the
L2 cache was set to a certain size without taking the virtual image size
into account.

Now, the L2 cache assignment is aware of the virtual size of the image,
and will cover the entire image, unless the cache size needed for that is
larger than a certain maximum. This maximum is set to 1 MB by default
(enough to cover an 8 GB image with the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b749562d98)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
51faf966da qcow2: Avoid duplication in setting the refcount cache size
The refcount cache size does not need to be set to its minimum value in
read_cache_sizes(), as it is set to at least its minimum value in
qcow2_update_options_prepare().

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 657ada52ab)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
980aca61cc qcow2: Make sizes more humanly readable
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b6a95c6d10)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
97746a0fe5 include: Add a lookup table of sizes
Adding a lookup table for the powers of two, with the appropriate size
prefixes. This is needed when a size has to be stringified, in which
case something like '(1 * KiB)' would become a literal '(1 * (1L << 10))'
string. Powers of two are used very often for sizes, so such a table
will also make it easier and more intuitive to write them.

This table is generatred using the following AWK script:

BEGIN {
    suffix="KMGTPE";
    for(i=10; i<64; i++) {
        val=2**i;
        s=substr(suffix, int(i/10), 1);
        n=2**(i%10);
        pad=21-int(log(n)/log(10));
        printf("#define S_%d%siB %*d\n", n, s, pad, val);
    }
}

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 540b849261)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Leonid Bloch
e4e169830b qcow2: Options' documentation fixes
Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 40fb215d48)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Alberto Garcia
61b3b74481 qcow2: Fix Coverity warning when calculating the refcount cache size
MIN_REFCOUNT_CACHE_SIZE is 4 and the cluster size is guaranteed to be
at most 2MB, so the minimum refcount cache size (in bytes) is always
going to fit in a 32-bit integer.

Coverity doesn't know that, and since we're storing the result in a
uint64_t (*refcount_cache_size) it thinks that we need the 64 bits and
that we probably want to do a 64-bit multiplication to prevent the
result from being truncated.

This is a false positive in this case, but it's a fair warning.
We could do a 64-bit multiplication to get rid of it, but since we
know that a 32-bit variable is enough to store this value let's simply
reuse min_refcount_cache, make it a normal int and stop doing casts.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 7af5eea9b3)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Alberto Garcia
466ff26361 docs: Document the new default sizes of the qcow2 caches
We have just reduced the refcount cache size to the minimum unless
the user explicitly requests a larger one, so we have to update the
documentation to reflect this change.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: c5f0bde23558dd9d33b21fffc76ac9953cc19c56.1523968389.git.berto@igalia.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 603790ef3a)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Alberto Garcia
8e2f2e916c qcow2: Give the refcount cache the minimum possible size by default
The L2 and refcount caches have default sizes that can be overridden
using the l2-cache-size and refcount-cache-size (an additional
parameter named cache-size sets the combined size of both caches).

Unless forced by one of the aforementioned parameters, QEMU will set
the unspecified sizes so that the L2 cache is 4 times larger than the
refcount cache.

This is based on the premise that the refcount metadata needs to be
only a fourth of the L2 metadata to cover the same amount of disk
space. This is incorrect for two reasons:

 a) The amount of disk covered by an L2 table depends solely on the
    cluster size, but in the case of a refcount block it depends on
    the cluster size *and* the width of each refcount entry.
    The 4/1 ratio is only valid with 16-bit entries (the default).

 b) When we talk about disk space and L2 tables we are talking about
    guest space (L2 tables map guest clusters to host clusters),
    whereas refcount blocks are used for host clusters (including
    L1/L2 tables and the refcount blocks themselves). On a fully
    populated (and uncompressed) qcow2 file, image size > virtual size
    so there are more refcount entries than L2 entries.

Problem (a) could be fixed by adjusting the algorithm to take into
account the refcount entry width. Problem (b) could be fixed by
increasing a bit the refcount cache size to account for the clusters
used for qcow2 metadata.

However this patch takes a completely different approach and instead
of keeping a ratio between both cache sizes it assigns as much as
possible to the L2 cache and the remainder to the refcount cache.

The reason is that L2 tables are used for every single I/O request
from the guest and the effect of increasing the cache is significant
and clearly measurable. Refcount blocks are however only used for
cluster allocation and internal snapshots and in practice are accessed
sequentially in most cases, so the effect of increasing the cache is
negligible (even when doing random writes from the guest).

So, make the refcount cache as small as possible unless the user
explicitly asks for a larger one.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 52253998ec)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Philippe Mathieu-Daudé
f2f7cc2de1 include: Add IEC binary prefixes in "qemu/units.h"
Loosely based on 076b35b5a5.

Suggested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180625124238.25339-2-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7ecdc94c40)
[LM: BSC#1139926]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Samuel Thibault
fc72749949 Fix heap overflow in ip_reass on big packet input
When the first fragment does not fit in the preallocated buffer, q will
already be pointing to the ext buffer, so we mustn't try to update it.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 126c04acbabd7ad32c2b018fe10dfac2a3bc1210)
[LY: CVE-2019-14378 BSC#1143794]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:04 -06:00
Liang Yan
10b7b7e12c qemu-bridge-helper: restrict interface name
The interface names in qemu-bridge-helper are defined to be
of size IFNAMSIZ(=16), including the terminating null('\0') byte.
The same is applied to interface names read from 'bridge.conf'
file to form ACLs rules. If user supplied '--br=bridge' name
is not restricted to the same length, it could lead to ACL bypass
issue. Restrict bridge name to IFNAMSIZ, including null byte.

Reported-by: Riccardo Schirone <rschiron@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[LY: BSC#1140402 CVE-2019-13164]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:04 -06:00
Prasad J Pandit
9d5500370d qxl: check release info object
When releasing spice resources in release_resource() routine,
if release info object 'ext.info' is null, it leads to null
pointer dereference. Add check to avoid it.

Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20190425063534.32747-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d52680fc93)
[LY: BSC#1135902 CVE-2019-12155]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:04 -06:00
bd0e04d311 Revert "postcopy: Send whole huge pages"
This reverts commit 4c011c37ec.

The commit causes speed setting of migration doesn't work while vm uses
hugepage.

Dropping this commit is harmless due to postcopy needs kernel 4.11+ support.

(bsc#1127077)
2021-03-16 20:23:04 -06:00
Paolo Bonzini
f8aedf231a target/i386: define md-clear bit
md-clear is a new CPUID bit which is set when microcode provides the
mechanism to invoke a flush of various exploitable CPU buffers by invoking
the VERW instruction.  Add the new feature.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[BR: BSC#1111331 CVE-2018-12126 CVE-2018-12127 CVE-2018-12130
CVE-2019-11091]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
David Gibson
c08957b93b spapr: Simplify handling of host-serial and host-model values
27461d69a0 "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.

The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.

To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before.  Or
they could be set to "none" to explicitly omit the device tree items.

Special casing specific values on what's otherwise a user supplied string
is very ugly.  So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.

This does mean that the "passthrough" functionality is no longer available
with the current machine type.  That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).

It also means the user can't explicitly ask for the values to be omitted
on the old machine types.  I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.

For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.

While we're there, improve some poorly worded comments, and the help text
for the properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0a794529bd)
[LM: BSC#1126455 CVE-2019-03812]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Peter Maydell
3cbc2371bb device_tree.c: Don't use load_image()
The load_image() function is deprecated, as it does not let the
caller specify how large the buffer to read the file into is.
Instead use load_image_size().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20181130151712.2312-9-peter.maydell@linaro.org
(cherry picked from commit da885fe1ee)
[LM: BSC#1130675 CVE-2018-20815]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Gerd Hoffmann
021ad81921 i2c-ddc: fix oob read
Suggested-by: Michael Hanselmann <public@hansmi.ch>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael Hanselmann <public@hansmi.ch>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190108102301.1957-1-kraxel@redhat.com
(cherry picked from commit b05b267840)
[LM: BSC#1125721 CVE-2019-3812]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Prasad J Pandit
289934370a ppc: add host-serial and host-model machine attributes (CVE-2019-8934)
On ppc hosts, hypervisor shares following system attributes

  - /proc/device-tree/system-id
  - /proc/device-tree/model

with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.

[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20190218181349.23885-1-ppandit@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 27461d69a0)
[LM: BSC#1126455 CVE-2019-03812]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
William Bowling
ba544a67f9 slirp: check sscanf result when emulating ident
When emulating ident in tcp_emu, if the strchr checks passed but the
sscanf check failed, two uninitialized variables would be copied and
sent in the reply, so move this code inside the if(sscanf()) clause.

Signed-off-by: William Bowling <will@wbowling.info>
Cc: qemu-stable@nongnu.org
Cc: secalert@redhat.com
Message-Id: <1551476756-25749-1-git-send-email-will@wbowling.info>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
(cherry picked from commit d3222975c7)
[LM: BSC#1129622 CVE-2019-9824 To pass our checkpatch check, I changed
the patch to use spaces, not tabs, as in the initially proposed]
Signed-off-by: Lin Ma <lma@suse.com>
2021-03-16 20:23:04 -06:00
Richard W.M. Jones
a5d4cfea95 virtio-scsi: Add virtqueue_size parameter allowing virtqueue size to be set.
Since Linux switched to blk-mq as the default in Linux commit
5c279bd9e406 ("scsi: default to scsi-mq"), virtio-scsi LUNs consume
about 10x as much guest kernel memory.

This commit allows you to choose the virtqueue size for each
virtio-scsi-pci controller like this:

  -device virtio-scsi-pci,id=scsi,virtqueue_size=16

The default is still 128 as before.  Using smaller virtqueue_size
allows many more disks to be added to small memory virtual machines.
For a 1 vCPU, 500 MB, no swap VM I observed:

  With scsi-mq enabled (upstream kernel):              175 disks
    -"- ditto -"-   virtqueue_size=64:                 318 disks
    -"- ditto -"-   virtqueue_size=16:                 775 disks
  With scsi-mq disabled (kernel before 5c279bd9e406): 1755 disks

Note that to have any effect, this requires a kernel patch:

  https://lkml.org/lkml/2017/8/10/689

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170810165255.20865-1-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5c0919d020)
[BR: FATE#327255. Fate was for SLE12-SP2, but we need to be able to
forward migrate this to SLE12-SP3 (and beyond)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Eduardo Habkost
6525e1a8db i386: Add new -IBRS versions of Intel CPU models
The new MSR IA32_SPEC_CTRL MSR was introduced by a recent Intel
microcode updated and can be used by OSes to mitigate
CVE-2017-5715.  Unfortunately we can't change the existing CPU
models without breaking existing setups, so users need to
explicitly update their VM configuration to use the new *-IBRS
CPU model if they want to expose IBRS to guests.

The new CPU models are simple copies of the existing CPU models,
with just CPUID_7_0_EDX_SPEC_CTRL added and model_id updated.

Cc: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit ac96c41354)
[BR: FATE#327261]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Boqun Feng (Intel)
b2b664b9d5 i386: add Skylake-Server cpu model
Introduce Skylake-Server cpu mode which inherits the features from
Skylake-Client and supports some additional features that are: AVX512,
CLWB and PGPE1GB.

Signed-off-by: Boqun Feng (Intel) <boqun.feng@gmail.com>
Message-Id: <20170621052935.20715-1-boqun.feng@gmail.com>
[ehabkost: copied comment about XSAVES from Skylake-Client]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 53f9a6f45f)
[BR: FATE#327261]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:04 -06:00
Gerd Hoffmann
09bc44f0e2 vga: catch depth 0
depth == 0 is used to indicate 256 color modes.  Our region calculation
goes wrong in that case.  So detect that and just take the safe code
path we already have for the wraparound case.

While being at it also catch depth == 15 (where our region size
calculation goes wrong too).  And make the comment more verbose,
explaining what is going on here.

Without this windows guest install might trigger an assert due to trying
to check dirty bitmap outside the snapshot region.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1575541
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20180514103117.21059-1-kraxel@redhat.com
(cherry picked from commit a89fe6c329)
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
f91ba1aa90 vga: fix region calculation
Typically the scanline length and the line offset are identical.  But
in case they are not our calculation for region_end is incorrect.  Using
line_offset is fine for all scanlines, except the last one where we have
to use the actual scanline length.

Fixes: CVE-2018-7550
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-id: 20180309143704.13420-1-kraxel@redhat.com
(cherry picked from commit 7cdc61becd)
[BR: BSC#1084604 CVE-2018-7858 (it appears the above CVE ref. is wrong)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
374f704b22 vga: fix region checks in wraparound case
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20171030102830.4469-1-kraxel@redhat.com
(cherry picked from commit 115788d7a7)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
bebba0ae38 vga: add ram_addr_t cast
Reported by Coverity.

Fixes: CID 1381409
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010141323.14049-4-kraxel@redhat.com
(cherry picked from commit b0898b42ef)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
d4af53b238 vga: handle cirrus vbe mode wraparounds.
Commit "3d90c62548 vga: stop passing pointers to vga_draw_line*
functions" is incomplete.  It doesn't handle the case that the vga
rendering code tries to create a shared surface, i.e. a pixman image
backed by vga video memory.  That can not work in case the guest display
wraps from end of video memory to the start.  So force shadowing in that
case.  Also adjust the snapshot region calculation.

Can trigger with cirrus only, when programming vbe modes using the bochs
api (stdvga, also qxl and virtio-vga in vga compat mode) wrap arounds
can't happen.

Fixes: CVE-2017-13672
Fixes: 3d90c62548
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010141323.14049-3-kraxel@redhat.com
(cherry picked from commit 28f77de26a)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
a403628b83 vga: drop line_offset variable
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 362f811793)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
796a476acd vga: fix display update region calculation (split screen)
vga display update mis-calculated the region for the dirty bitmap
snapshot in case split screen mode is used.  This can trigger an
assert in cpu_physical_memory_snapshot_get_dirty().

Impact:  DoS for privileged guest users.

Fixes: CVE-2017-13673
Fixes: fec5e8c92b
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170828123307.15392-1-kraxel@redhat.com
(cherry picked from commit e65294157d)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
5b6d8acaf4 vga: use DIV_ROUND_UP
I used the clang-tidy qemu-round check to generate the fix:
https://github.com/elmarco/clang-tools-extra

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
(cherry picked from commit 2c23ce22c6)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
f8ed983fe7 vga: fix display update region calculation
vga display update mis-calculated the region for the dirty bitmap
snapshot in case the scanlines are padded.  This can triggere an
assert in cpu_physical_memory_snapshot_get_dirty().

Fixes: fec5e8c92b
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509104839.19415-1-kraxel@redhat.com
(cherry picked from commit bfc56535f7)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
a09ee91be7 vga: make display updates thread safe.
The vga code clears the dirty bits *after* reading the framebuffer
memory.  So if the guest framebuffer updates hits the race window
between vga reading the framebuffer and vga clearing the dirty bits
vga will miss that update

Fix it by using the new memory_region_copy_and_clear_dirty()
memory_region_copy_get_dirty() functions.  That way we clear the
dirty bitmap before reading the framebuffer.  Any guest display
updates happening in parallel will be properly tracked in the
dirty bitmap then and the next display refresh will pick them up.

Problem triggers with mttcg only.  Before mttcg was merged tcg
never ran in parallel to vga emulation.  Using kvm will hide the
problem too, due to qemu operating on a userspace copy of the
kernel's dirty bitmap.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-5-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fec5e8c92b)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
943d2007fd vga: add vga_scanline_invalidated helper
Add vga_scanline_invalidated helper to check whenever a scanline was
invalidated.  Add a sanity check to fix OOB read access for display
heights larger than 2048.

Only cirrus uses this, for hardware cursor rendering, so having this
work properly for the first 2048 scanlines only shouldn't be a problem
as the cirrus can't handle large resolutions anyway.  Also changing the
invalidated_y_table size would break live migration.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-4-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f3289f6f0f)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
af276afcfc memory: add support getting and using a dirty bitmap copy.
This patch adds support for getting and using a local copy of the dirty
bitmap.

memory_region_snapshot_and_clear_dirty() will create a snapshot of the
dirty bitmap for the specified range, clear the dirty bitmap and return
the copy.  The returned bitmap can be a bit larger than requested, the
range is expanded so the code can copy unsigned longs from the bitmap
and avoid atomic bit update operations.

memory_region_snapshot_get_dirty() will return the dirty status of
pages, pretty much like memory_region_get_dirty(), but using the copy
returned by memory_region_copy_and_clear_dirty().

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-3-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8deaf12ca1)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
c684372b63 bitmap: add bitmap_copy_and_clear_atomic
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-2-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d6eb141392)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Prasad J Pandit
e063db9bf8 ppc/pnv: check size before data buffer access
While performing PowerNV memory r/w operations, the access length
'sz' could exceed the data[4] buffer size. Add check to avoid OOB
access.

Reported-by: Moguofang <moguofang@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit d07945e78e)
[BR: BSC#1114957 CVE-2018-18954]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Greg Kurz
d571e26c32 9p: take write lock on fid path updates (CVE-2018-19364)
Recent commit 5b76ef50f6 fixed a race where v9fs_co_open2() could
possibly overwrite a fid path with v9fs_path_copy() while it is being
accessed by some other thread, ie, use-after-free that can be detected
by ASAN with a custom 9p client.

It turns out that the same can happen at several locations where
v9fs_path_copy() is used to set the fid path. The fix is again to
take the write lock.

Fixes CVE-2018-19364.

Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 5b3c77aa58)
[BR: BSC#1116717 CVE-2018-19364]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Greg Kurz
36a7041dc8 9p: write lock path in v9fs_co_open2()
The assumption that the fid cannot be used by any other operation is
wrong. At least, nothing prevents a misbehaving client to create a
file with a given fid, and to pass this fid to some other operation
at the same time (ie, without waiting for the response to the creation
request). The call to v9fs_path_copy() performed by the worker thread
after the file was created can race with any access to the fid path
performed by some other thread. This causes use-after-free issues that
can be detected by ASAN with a custom 9p client.

Unlike other operations that only read the fid path, v9fs_co_open2()
does modify it. It should hence take the write lock.

Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 5b76ef50f6)
[BR: BSC#1116717 CVE-2018-19364]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Greg Kurz
d1f3397b11 9p: fix QEMU crash when renaming files
When using the 9P2000.u version of the protocol, the following shell
command line in the guest can cause QEMU to crash:

    while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done

With 9P2000.u, file renaming is handled by the WSTAT command. The
v9fs_wstat() function calls v9fs_complete_rename(), which calls
v9fs_fix_path() for every fid whose path is affected by the change.
The involved calls to v9fs_path_copy() may race with any other access
to the fid path performed by some worker thread, causing a crash like
shown below:

Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0,
 flags=65536, mode=0) at hw/9pfs/9p-local.c:59
59          while (*path && fd != -1) {
(gdb) bt
 path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
 path=0x0) at hw/9pfs/9p-local.c:92
 fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
 path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
 at hw/9pfs/9p.c:1083
 at util/coroutine-ucontext.c:116
(gdb)

The fix is to take the path write lock when calling v9fs_complete_rename(),
like in v9fs_rename().

Impact:  DoS triggered by unprivileged guest users.

Fixes: CVE-2018-19489
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1d20398694)
[BR: BSC#1117275]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Gerd Hoffmann
5c7d55ae0d usb-mtp: use O_NOFOLLOW and O_CLOEXEC.
Open files and directories with O_NOFOLLOW to avoid symlinks attacks.
While being at it also add O_CLOEXEC.

usb-mtp only handles regular files and directories and ignores
everything else, so users should not see a difference.

Because qemu ignores symlinks, carrying out a successful symlink attack
requires swapping an existing file or directory below rootdir for a
symlink and winning the race against the inotify notification to qemu.

Fixes: CVE-2018-16872
Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Reported-by: Michael Hanselmann <public@hansmi.ch>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael Hanselmann <public@hansmi.ch>
Message-id: 20181213122511.13853-1-kraxel@redhat.com
(cherry picked from commit bab9df35ce)
[BR: BSC#1119493]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Prasad J Pandit
0d07b80846 slirp: check data length while emulating ident function
While emulating identification protocol, tcp_emu() does not check
available space in the 'sc_rcv->sb_data' buffer. It could lead to
heap buffer overflow issue. Add check to avoid it.

Reported-by: Kira <864786842@qq.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit a7104eda7d)
[BR: BSC#1123156 CVE-2019-6778,  modify patch to use spaces instead of tabs]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Jim Somerville
9a7cd62f9d kvmclock: use the updated system_timer_msr
Fixes e2b6c17 (kvmclock: update system_time_msr address forcibly)
which makes a call to get the latest value of the address
stored in system_timer_msr, but then uses the old address anyway.

Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
Message-Id: <59b67db0bd15a46ab47c3aa657c81a4c11f168ea.1506702472.git.Jim.Somerville@windriver.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 346b1215b1)
[BR: BSC#1113231]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Denis Plotnikov
98333963a8 kvmclock: update system_time_msr address forcibly
Do an update of system_time_msr address every time before reading
the value of tsc_timestamp from guest's kvmclock page.

There is no other code paths which ensure that qemu has an up-to-date
value of system_time_msr. So, force this update on guest's tsc_timestamp
reading.

This bug causes effect on those nested setups which turn off TPR access
interception for L2 guests and that access being intercepted by L0 doesn't
show up in L1.
Linux bootstrap initiate kvmclock before APIC initializing causing TPR access.
That's why on L1 guests, having TPR interception turned on for L2, the effect
of the bug is not revealed.

This patch fixes this problem by making sure it knows the correct
system_time_msr address every time it is needed.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-Id: <1496054944-25623-1-git-send-email-dplotnikov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e2b6c1712e)
[BR: BSC#1113231]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Peter Maydell
e7768429c8 linux-user: make pwrite64/pread64(fd, NULL, 0, offset) return 0
Linux returns success if pwrite64() or pread64() are called with a
zero length NULL buffer, but QEMU was returning -TARGET_EFAULT.

This is the same bug that we fixed in commit 58cfa6c2e6
for the write syscall, and long before that in 38d840e679
for the read syscall.

Fixes: https://bugs.launchpad.net/qemu/+bug/1810433

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190108184900.9654-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 2bd3f8998e)
[LY: BSC#1121600]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:03 -06:00
Tony Garnock-Jones
8da6b2014b linux-user: write(fd, NULL, 0) parity with linux's treatment of same
Bring linux-user write(2) handling into line with linux for the case
of a 0-byte write with a NULL buffer. Based on a patch originally
written by Zhuowei Zhang.

Addresses https://bugs.launchpad.net/qemu/+bug/1716292.

>From Zhuowei Zhang's patch (https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg08073.html):

    Linux returns success for the special case of calling write with a
    zero-length NULL buffer: compiling and running

    int main() {
       ssize_t ret = write(STDOUT_FILENO, NULL, 0);
       fprintf(stderr, "write returned %ld\n", ret);
       return 0;
    }

    gives "write returned 0" when run directly, but "write returned
    -1" in QEMU.

    This commit checks for this situation and returns success if
    found.

Subsequent discussion raised the following questions (and my answers):

 - Q. Should TARGET_NR_read pass through to safe_read in this
      situation too?
   A. I'm wary of changing unrelated code to the specific problem I'm
      addressing. TARGET_NR_read is already consistent with Linux for
      this case.

 - Q. Do pread64/pwrite64 need to be changed similarly?
   A. Experiment suggests not: both linux and linux-user yield -1 for
      NULL 0-length reads/writes.

Signed-off-by: Tony Garnock-Jones <tonygarnockjones@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180908182205.GB409@mornington.dcs.gla.ac.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 58cfa6c2e6)
[LY: BSC#1121600]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:03 -06:00
Tim Smith
803ece930d xen_disk: Avoid repeated memory allocation
xen_disk currently allocates memory to hold the data for each ioreq
as that ioreq is used, and frees it afterwards. Because it requires
page-aligned blocks, this interacts poorly with non-page-aligned
allocations and balloons the heap.

Instead, allocate the maximum possible requirement, which is
BLKIF_MAX_SEGMENTS_PER_REQUEST pages (currently 11 pages) when
the ioreq is created, and keep that allocation until it is destroyed.
Since the ioreqs themselves are re-used via a free list, this
should actually improve memory usage.

Signed-off-by: Tim Smith <tim.smith@citrix.com>
[BSC#1100408]
2021-03-16 20:23:03 -06:00
Marc-André Lureau
1297515103 hw/char/serial: retry write if EAGAIN
If the chardev returns -1 with EAGAIN errno on write(), it should try
to send it again (EINTR is handled by the chardev itself).

This fixes commit 019288bf13
"hw/char/serial: Only retry if qemu_chr_fe_write returns 0"

Tested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180716110755.12499-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f3575af130)
[LY: BSC#1108474]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:03 -06:00
Sergio Lopez
758fb5dcda hw/char/serial: Only retry if qemu_chr_fe_write returns 0
Only retry on serial_xmit if qemu_chr_fe_write returns 0, as this is the
only recoverable error.

Retrying with any other scenario, in addition to being a waste of CPU
cycles, can compromise the Guest stability if by the vCPU issuing the
write and the main loop thread are, by chance or explicit pinning,
running on the same pCPU.

Previous discussion:

https://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg06998.html

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-Id: <1528185295-14199-1-git-send-email-slp@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 019288bf13)
[LY: BSC#1108474]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:03 -06:00
Paolo Bonzini
ae311d01d0 lsi53c895a: check message length value is valid
While writing a message in 'lsi_do_msgin', message length value
in 'msg_len' could be invalid due to an invalid migration stream.
Add an assertion to avoid an out of bounds access, and reject
the incoming migration data if it contains an invalid message
length.

Discovered by Deja vu Security. Reported by Oracle.

Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20181026194314.18663-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e58ccf0396)
[BR: BSC#1114422 CVE-2018-18849]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Jason Wang
50ebd014ca net: ignore packet size greater than INT_MAX
There should not be a reason for passing a packet size greater than
INT_MAX. It's usually a hint of bug somewhere, so ignore packet size
greater than INT_MAX in qemu_deliver_packet_iov()

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1592a99470)
[LD: BSC#1111013 CVE-2018-17963]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Jason Wang
aad97dd60b pcnet: fix possible buffer overflow
In pcnet_receive(), we try to assign size_ to size which converts from
size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access
for both buf and buf1.

Fixing by converting the type of size to size_t.

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit b1d80d12c5)
[LD: BSC#1111010 CVE-2018-17962]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Jason Wang
b8f82a41dd rtl8139: fix possible out of bound access
In rtl8139_do_receive(), we try to assign size_ to size which converts
from size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access of
for both buf and buf1.

Fixing by converting the type of size to size_t.

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1a326646fe)
[LD: BSC#1111006 CVE-2018-17958]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Jason Wang
4eb8ed37f9 ne2000: fix possible out of bound access in ne2000_receive
In ne2000_receive(), we try to assign size_ to size which converts
from size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access of
for both buf and buf1.

Fixing by converting the type of size to size_t.

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit fdc89e90fa)
[LD: BSC#1110910 CVE-2018-10839]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
f92acbfb9b xen_disk: use a single entry iovec
Since xen_disk now always copies data to and from a guest there is no need
to maintain a vector entry corresponding to every page of a request.
This means there is less per-request state to maintain so the ioreq
structure can shrink significantly.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 5ebf962652)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
d18c118c26 xen_disk: remove use of grant map/unmap
Now that the (native or emulated) xen_be_copy_grant_refs() helper is
always available, the xen_disk code can be significantly simplified by
removing direct use of grant map and unmap operations.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 06454c24ad)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
ecabdb4271 xen_backend: add an emulation of grant copy
Not all Xen environments support the xengnttab_grant_copy() operation.
E.g. where the OS is FreeBSD or Xen is older than 4.8.0.

This patch introduces an emulation of that operation using
xengnttab_map_domain_grant_refs() and memcpy() for those environments.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 3fe12b8403)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
40094b2ca7 xen: remove other open-coded use of libxengnttab
Now that helpers are available in xen_backend, use them throughout all
Xen PV backends.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 58560f2ae7)
[LD: BSC#1100408 - Removed xen_9p* file because it doesn't exist]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
2b91625158 xen_disk: remove open-coded use of libxengnttab
Now that helpers are present in xen_backend, this patch removes open-coded
calls to libxengnttab from the xen_disk code.

This patch also fixes one whitspace error in the assignment of the
XenDevOps initialise method.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 5ee1d99913)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
874d0095ec xen_backend: add grant table helpers
This patch adds grant table helper functions to the xen_backend code to
localize error reporting and use of xen_domid.

The patch also defers the call to xengnttab_open() until just before the
initialise method in XenDevOps is invoked. This method is responsible for
mapping the shared ring. No prior method requires access to the grant table.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 9838824aff)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
34ad9e61a2 xen: add a meaningful declaration of grant_copy_segment into xen_common.h
Currently the xen_disk source has to carry #ifdef exclusions to compile
against Xen older then 4.8. This is a bit messy so this patch lifts the
definition of struct xengnttab_grant_copy_segment and adds it into the
pre-4.8 compat area in xen_common.h, which allows xen_disk to be cleaned
up.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 5c0d914a9b)
[LD: BSC#1100408 - Modified as the Xen interface versioning has changed]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Juergen Gross
f87457686f xen: dont try setting max grants multiple times
Trying to call xengnttab_set_max_grants() with the same file handle
might fail on some kernels, as this operation is allowed only once.

This is a problem for the qdisk backend as blk_connect() can be
called multiple times for a domain, e.g. in case grub-xen is being
used to boot it.

So instead of letting the generic backend code open the gnttab device
do it in blk_connect() and close it again in blk_disconnect.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit e38c3e86df)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Juergen Gross
be41a22032 xen: add a global indicator for grant copy being available
The Xen qdisk backend needs to test whether grant copy operations is
available in the kernel. Unfortunately this collides with using
xengnttab_set_max_grants() on some kernels as this operation has to
be the first one after opening the gnttab device.

In order to solve this problem test for the availability of grant copy
in xen_be_init() opening the gnttab device just for that purpose and
closing it again afterwards. Advertise the availability via a global
flag and use that flag in the qdisk backend.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit b5e397a79e)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Olaf Hering
5e528f1296 xen-disk: use g_new0 to fix build
g_malloc0_n is available since glib-2.24. To allow build with older glib
versions use the generic g_new0, which is already used in many other
places in the code.

Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings")

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit a3fd781f65)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
aecb30c502 xen-disk: add support for multi-page shared rings
The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now and many guest OS have support in their frontend
drivers.

This patch makes the necessary modifications to xen-disk support a shared
ring up to order 4 (i.e. 16 pages).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 3284fad728)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
d312b398e7 xen-disk: only advertize feature-persistent if grant copy is not available
If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertized
to the frontend, otherwise it may needlessly copy data into persistently
granted buffers.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 976eba1c88)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
32fff02604 xen: use libxendevicemodel when available
This patch modifies the wrapper functions in xen_common.h to use the
new xendevicemodel interface if it is available along with compatibility
code to use the old libxenctrl interface if it is not.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit d655f34e6d)
[LD: BSC#1100408 - Commit combined with 8f25e7544 for simplification]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
1aa5e90b10 xen: make use of xen_xc implicit in xen_common.h inlines
Doing this will make the transition to using the new libxendevicemodel
interface less intrusive on the callers of these functions, since using
the new library will require a change of handle.

NOTE: The patch also moves the 'externs' for xen_xc and xen_fmem from
      xen_backend.h to xen_common.h, and the declarations from
      xen_backend.c to xen-common.c, which is where they belong.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 260cabed71)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Paul Durrant
859fc90b91 xen: rename xen_modified_memory() to xen_hvm_modified_memory()
This patch is a purely cosmetic change that avoids a name collision in
a subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Anthony Perard <anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 5100afb5f5)
[LD: BSC#1100408]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
40c6a97775 seccomp: check TSYNC host capability
Remove -sandbox option if the host is not capable of TSYNC, since the
sandbox will fail at setup time otherwise. This will help libvirt, for
ex, to figure out if -sandbox will work.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 5780760f5e)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Yi Min Zhao
b0e032e5f2 sandbox: disable -sandbox if CONFIG_SECCOMP undefined
If CONFIG_SECCOMP is undefined, the option 'elevatedprivileges' remains
compiled. This would make libvirt set the corresponding capability and
then trigger failure during guest startup. This patch moves the code
regarding seccomp command line options to qemu-seccomp.c file and
wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP.
Because parse_sandbox() is moved into qemu-seccomp.c file, change
seccomp_start() to static function.

Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 9d0fdecbad)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
704c459703 seccomp: set the seccomp filter to all threads
When using "-seccomp on", the seccomp policy is only applied to the
main thread, the vcpu worker thread and other worker threads created
after seccomp policy is applied; the seccomp policy is not applied to
e.g. the RCU thread because it is created before the seccomp policy is
applied and SECCOMP_FILTER_FLAG_TSYNC isn't used.

This can be verified with
for task in /proc/`pidof qemu`/task/*; do cat $task/status | grep Secc ; done
Seccomp:	2
Seccomp:	0
Seccomp:	0
Seccomp:	2
Seccomp:	2
Seccomp:	2

Starting with libseccomp 2.2.0 and kernel >= 3.17, we can use
seccomp_attr_set(ctx, > SCMP_FLTATR_CTL_TSYNC, 1) to update the policy
on all threads.

libseccomp requirement was bumped to 2.2.0 in previous patch.
libseccomp should fail to set the filter if it can't honour
SCMP_FLTATR_CTL_TSYNC (untested), and thus -sandbox will now fail on
kernel < 3.17.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 70dfabeaa7)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
9abac624f6 configure: require libseccomp 2.2.0
The following patch is going to require TSYNC, which is only available
since libseccomp 2.2.0.

libseccomp 2.2.0 was released February 12, 2015.

According to repology, libseccomp version in different distros:

  RHEL-7: 2.3.1
  Debian (Stretch): 2.3.1
  OpenSUSE Leap 15: 2.3.2
  Ubuntu (Xenial):  2.3.1

This will drop support for -sandbox on:

  Debian (Jessie): 2.1.1 (but 2.2.3 in backports)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit d0699bd37c)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
79b95cc6d5 seccomp: prefer SCMP_ACT_KILL_PROCESS if available
The upcoming libseccomp release should have SCMP_ACT_KILL_PROCESS
action (https://github.com/seccomp/libseccomp/issues/96).

SCMP_ACT_KILL_PROCESS is preferable to immediately terminate the
offending process, rather than having the SIGSYS handler running.

Use SECCOMP_GET_ACTION_AVAIL to check availability of kernel support,
as libseccomp will fallback on SCMP_ACT_KILL otherwise, and we still
prefer SCMP_ACT_TRAP.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit bda08a5764)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
bc40b68ed9 seccomp: allow sched_setscheduler() with SCHED_IDLE policy
Current and upcoming mesa releases rely on a shader disk cash. It uses
a thread job queue with low priority, set with
sched_setscheduler(SCHED_IDLE). However, that syscall is rejected by
the "resourcecontrol" seccomp qemu filter.

Since it should be safe to allow lowering thread priority, let's allow
scheduling thread to idle policy.

Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1594456

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 056de1e894)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Otubo
e41f0cbaec seccomp: add resourcecontrol argument to command line
This patch adds [,resourcecontrol=deny] to `-sandbox on' option. It
blacklists all process affinity and scheduler priority system calls to
avoid any bigger of the process.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 24f8cdc572)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Otubo
c3c9110dc2 seccomp: add spawn argument to command line
This patch adds [,spawn=deny] argument to `-sandbox on' option. It
blacklists fork and execve system calls, avoiding Qemu to spawn new
threads or processes.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 995a226f88)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Otubo
c42e4ce77b seccomp: add elevateprivileges argument to command line
This patch introduces the new argument
[,elevateprivileges=allow|deny|children] to the `-sandbox on'. It allows
or denies Qemu process to elevate its privileges by blacklisting all
set*uid|gid system calls. The 'children' option will let forks and
execves run unprivileged.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 73a1e64725)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Otubo
80727fbf56 seccomp: add obsolete argument to command line
This patch introduces the argument [,obsolete=allow] to the `-sandbox on'
option. It allows Qemu to run safely on old system that still relies on
old system calls.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 2b716fa6d6)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Otubo
4959004b58 seccomp: changing from whitelist to blacklist
This patch changes the default behavior of the seccomp filter from
whitelist to blacklist. By default now all system calls are allowed and
a small black list of definitely forbidden ones was created.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 1bd6152ae2)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
2021-03-16 20:23:03 -06:00
Konrad Rzeszutek Wilk
f3082defdd i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)
AMD Zen expose the Intel equivalant to Speculative Store Bypass Disable
via the 0x80000008_EBX[25] CPUID feature bit.

This needs to be exposed to guest OS to allow them to protect
against CVE-2018-3639.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-3-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 403503b162)
[BR: BSC#1092885 CVE-2018-3639]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Konrad Rzeszutek Wilk
f6e9348cbe i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD).  To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f.  With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-4-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit cfeea0c021)
[BR: BSC#1092885 CVE-2018-3639]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Prasad J Pandit
c30bf53560 qga: check bytes count read by guest-file-read
While reading file content via 'guest-file-read' command,
'qmp_guest_file_read' routine allocates buffer of count+1
bytes. It could overflow for large values of 'count'.
Add check to avoid it.

Reported-by: Fakhri Zulkifli <mohdfakhrizulkifli@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit 141b197408)
[FL: BSC#1098735 CVE-2018-12617]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:03 -06:00
Prasad J Pandit
3d1025efbb slirp: correct size computation while concatenating mbuf
While reassembling incoming fragmented datagrams, 'm_cat' routine
extends the 'mbuf' buffer, if it has insufficient room. It computes
a wrong buffer size, which leads to overwriting adjacent heap buffer
area. Correct this size computation in m_cat.

Reported-by: ZDI Disclosures <zdi-disclosures@trendmicro.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 864036e251)
[FL: BSC#1096223 CVE-2018-11806]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:03 -06:00
Bruce Rogers
8ef8e4b748 xen: add block resize support for xen disks
Provide monitor naming of xen disks, and plumb guest driver
notification through xenstore of resizing instigated via the
monitor.

[BR: FATE#325467]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Marc-André Lureau
777b1d3551 tpm: lookup cancel path under tpm device class
Since Linux commit 313d21eeab9282e, tpm devices have their own device
class "tpm" and the cancel path must be looked up under
/sys/class/tpm/ instead of /sys/class/misc/.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
(cherry picked from commit 05b71fb207)
[LY: BSC#1070615]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:03 -06:00
Daniel P. Berrangé
a4f7d4e685 i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)
New microcode introduces the "Speculative Store Bypass Disable"
CPUID feature bit. This needs to be exposed to guest OS to allow
them to protect against CVE-2018-3639.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Message-Id: <20180521215424.13520-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit d19d1f9659)
[BR: BSC#1092885 CVE-2018-3639]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Bruce Rogers
ecb7c53be7 migration: warn about inconsistent spec_ctrl state
As an attempt to help the user do the right thing, warn if we
detect spec_ctrl data in the migration stream, but where the
cpu defined doesn't have the feature. This would indicate the
migration is from the quick and dirty qemu produced in January
2018 to handle Spectre v2. That qemu version exposed the IBRS
cpu feature to all vcpu types, which helped in the short term
but wasn't a well designed approach.
Warn the user that the now migrated guest needs to be restarted
as soon as possible, using the spec_ctrl cpu feature flag or a
*-IBRS vcpu model specified as appropriate.

Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Habkost
cdd90b4d50 i386: Add EPYC-IBPB CPU model
EPYC-IBPB is a copy of the EPYC CPU model with
just CPUID_8000_0008_EBX_IBPB added.

Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-7-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 6cfbc54e89)
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Habkost
6b533935ee i386: Add new -IBRS versions of Intel CPU models
The new MSR IA32_SPEC_CTRL MSR was introduced by a recent Intel
microcode updated and can be used by OSes to mitigate
CVE-2017-5715.  Unfortunately we can't change the existing CPU
models without breaking existing setups, so users need to
explicitly update their VM configuration to use the new *-IBRS
CPU model if they want to expose IBRS to guests.

The new CPU models are simple copies of the existing CPU models,
with just CPUID_7_0_EDX_SPEC_CTRL added and model_id updated.

Cc: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit ac96c41354)
[BR: BSC#1068032 CVE-2017-5715 CPU models are reduced as appropriate
to match this QEMU version]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Habkost
1e0e253bdb i386: Add FEAT_8000_0008_EBX CPUID feature word
Add the new feature word and the "ibpb" feature flag.

Based on a patch by Paolo Bonzini.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-5-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 1b3420e1c4)
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Eduardo Habkost
7d60f8b34c i386: Add spec-ctrl CPUID bit
Add the feature name and a CPUID_7_0_EDX_SPEC_CTRL macro.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit a2381f0934)
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Bruce Rogers
689e263e1e x86: cpu: increase model_id array size from 48 to 49
This is perhaps the easiest way to handle the longer model names
introduced with the -IBRS models. This is in lew of backporting
commit 807e9869b8 and other commits
which that would require.
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:03 -06:00
Daniel P. Berrange
1943be857a ui: place a hard cap on VNC server output buffer size
The previous patches fix problems with throttling of forced framebuffer updates
and audio data capture that would cause the QEMU output buffer size to grow
without bound. Those fixes are graceful in that once the client catches up with
reading data from the server, everything continues operating normally.

There is some data which the server sends to the client that is impractical to
throttle. Specifically there are various pseudo framebuffer update encodings to
inform the client of things like desktop resizes, pointer changes, audio
playback start/stop, LED state and so on. These generally only involve sending
a very small amount of data to the client, but a malicious guest might be able
to do things that trigger these changes at a very high rate. Throttling them is
not practical as missed or delayed events would cause broken behaviour for the
client.

This patch thus takes a more forceful approach of setting an absolute upper
bound on the amount of data we permit to be present in the output buffer at
any time. The previous patch set a threshold for throttling the output buffer
by allowing an amount of data equivalent to one complete framebuffer update and
one seconds worth of audio data. On top of this it allowed for one further
forced framebuffer update to be queued.

To be conservative, we thus take that throttling threshold and multiply it by
5 to form an absolute upper bound. If this bound is hit during vnc_write() we
forceably disconnect the client, refusing to queue further data. This limit is
high enough that it should never be hit unless a malicious client is trying to
exploit the sever, or the network is completely saturated preventing any sending
of data on the socket.

This completes the fix for CVE-2017-15124 started in the previous patches.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-12-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f887cf165d)
[LY: BSC#1073489 CVE-2017-15124]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
f9a755c55d ui: fix VNC client throttling when forced update is requested
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).

The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check is disabled if the client has requested
a forced update, because we want to send these as soon as possible.

As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates eg play a youtube video. Then repeatedly send full framebuffer update
requests, but never read data back from the server. This can easily make QEMU's
VNC server send buffer consume 100MB of RAM per second, until the OOM killer
starts reaping processes (hopefully the rogue QEMU process, but it might pick
others...).

To address this we make the throttling more intelligent, so we can throttle
full updates. When we get a forced update request, we keep track of exactly how
much data we put on the output buffer. We will not process a subsequent forced
update request until this data has been fully sent on the wire. We always allow
one forced update request to be in flight, regardless of what data is queued
for incremental updates or audio data. The slight complication is that we do
not initially know how much data an update will send, as this is done in the
background by the VNC job thread. So we must track the fact that the job thread
has an update pending, and not process any further updates until this job is
has been completed & put data on the output buffer.

This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but its possible other processes can get taken out as
collateral damage.

This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:

  commit a7b20a8efa
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Oct 9 14:43:42 2017 +0100

    io: monitor encoutput buffer size from websocket GSource

This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-11-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit ada8d2e436)
[LY: BSC#1073489 CVE-2017-15124]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
f7c4b1bc33 ui: fix VNC client throttling when audio capture is active
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).

The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check must be disabled if audio capture is
enabled, because when streaming audio the output buffer offset will rarely be
zero due to queued audio data, and so this would starve framebuffer updates.

As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates eg play a youtube video. Then enable audio capture, and simply never
read data back from the server. This can easily make QEMU's VNC server send
buffer consume 100MB of RAM per second, until the OOM killer starts reaping
processes (hopefully the rogue QEMU process, but it might pick others...).

To address this we make the throttling more intelligent, so we can throttle
when audio capture is active too. To determine how to throttle incremental
updates or audio data, we calculate a size threshold. Normally the threshold is
the approximate number of bytes associated with a single complete framebuffer
update. ie width * height * bytes per pixel. We'll send incremental updates
until we hit this threshold, at which point we'll stop sending updates until
data has been written to the wire, causing the output buffer offset to fall
back below the threshold.

If audio capture is enabled, we increase the size of the threshold to also
allow for upto 1 seconds worth of audio data samples. ie nchannels * bytes
per sample * frequency. This allows the output buffer to have a mixture of
incremental framebuffer updates and audio data queued, but once the threshold
is exceeded, audio data will be dropped and incremental updates will be
throttled.

This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but its possible other processes can get taken out as
collateral damage.

This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:

  commit a7b20a8efa
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Mon Oct 9 14:43:42 2017 +0100

    io: monitor encoutput buffer size from websocket GSource

This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-10-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit e2b72cb6e0)
[LY: BSC#1073489 CVE-2017-15124]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
beaa3c1fbc ui: refactor code for determining if an update should be sent to the client
The logic for determining if it is possible to send an update to the client
will become more complicated shortly, so pull it out into a separate method
for easier extension later.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-9-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0bad834228)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
1e3261d3ba ui: correctly reset framebuffer update state after processing dirty regions
According to the RFB protocol, a client sends one or more framebuffer update
requests to the server. The server can reply with a single framebuffer update
response, that covers all previously received requests. Once the client has
read this update from the server, it may send further framebuffer update
requests to monitor future changes. The client is free to delay sending the
framebuffer update request if it needs to throttle the amount of data it is
reading from the server.

The QEMU VNC server, however, has never correctly handled the framebuffer
update requests. Once QEMU has received an update request, it will continue to
send client updates forever, even if the client hasn't asked for further
updates. This prevents the client from throttling back data it gets from the
server. This change fixes the flawed logic such that after a set of updates are
sent out, QEMU waits for a further update request before sending more data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 728a7ac954)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
19f8dd8dcb ui: introduce enum to track VNC client framebuffer update request state
Currently the VNC servers tracks whether a client has requested an incremental
or forced update with two boolean flags. There are only really 3 distinct
states to track, so create an enum to more accurately reflect permitted states.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-7-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fef1bbadfb)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
efa1299c85 ui: track how much decoded data we consumed when doing SASL encoding
When we encode data for writing with SASL, we encode the entire pending output
buffer. The subsequent write, however, may not be able to send the full encoded
data in one go though, particularly with a slow network. So we delay setting the
output buffer offset back to zero until all the SASL encoded data is sent.

Between encoding the data and completing sending of the SASL encoded data,
however, more data might have been placed on the pending output buffer. So it
is not valid to set offset back to zero. Instead we must keep track of how much
data we consumed during encoding and subtract only that amount.

With the current bug we would be throwing away some pending data without having
sent it at all. By sheer luck this did not previously cause any serious problem
because appending data to the send buffer is always an atomic action, so we
only ever throw away complete RFB protocol messages. In the case of frame buffer
updates we'd catch up fairly quickly, so no obvious problem was visible.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-6-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8f61f1c5a6)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
adba409664 ui: avoid pointless VNC updates if framebuffer isn't dirty
The vnc_update_client() method checks the 'has_dirty' flag to see if there are
dirty regions that are pending to send to the client. Regardless of this flag,
if a forced update is requested, updates must be sent. For unknown reasons
though, the code also tries to sent updates if audio capture is enabled. This
makes no sense as audio capture state does not impact framebuffer contents, so
this check is removed.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-5-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3541b08475)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
8fdd8e74ef ui: remove redundant indentation in vnc_client_update
Now that previous dead / unreachable code has been removed, we can simplify
the indentation in the vnc_client_update method.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-4-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit b939eb89b6)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
935902c8e6 ui: remove unreachable code in vnc_update_client
A previous commit:

  commit 5a8be0f73d
  Author: Gerd Hoffmann <kraxel@redhat.com>
  Date:   Wed Jul 13 12:21:20 2016 +0200

    vnc: make sure we finish disconnect

Added a check for vs->disconnecting at the very start of the
vnc_update_client method. This means that the very next "if"
statement check for !vs->disconnecting always evaluates true,
and is thus redundant. This in turn means the vs->disconnecting
check at the very end of the method never evaluates true, and
is thus unreachable code.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c53df96161)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
62b7110af5 ui: remove 'sync' parameter from vnc_update_client
There is only one caller of vnc_update_client and that always passes false
for the 'sync' parameter.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-2-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6af998db05)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Marc-André Lureau
6ebfa04257 vnc: fix debug spelling
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171220140618.12701-1-marcandre.lureau@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 090fdc83b0)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Prasad J Pandit
d3bfb8a47c multiboot: check mh_load_end_addr address field
While loading kernel via multiboot-v1 image, (flags & 0x00010000)
indicates that multiboot header contains valid addresses to load
the kernel image. In that, end of the data segment address
'mh_load_end_addr' should be less than the bss segment address
'mh_bss_end_addr'. Add check to validate that.

Reported-by: CERT CC <cert.cc@orange.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[LY: BSC#1083291 CVE-2018-7550]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
linzhecheng
2950bc28c9 vga: check the validation of memory addr when draw text
Start a vm with qemu-kvm -enable-kvm -vnc :66 -smp 1 -m 1024 -hda
redhat_5.11.qcow2  -device pcnet -vga cirrus,
then use VNC client to connect to VM, and excute the code below in guest
OS will lead to qemu crash:

int main()
 {
    iopl(3);
    srand(time(NULL));
    int a,b;
    while(1){
	a = rand()%0x100;
	b = 0x3c0 + (rand()%0x20);
        outb(a,b);
    }
    return 0;
}

The above code is writing the registers of VGA randomly.
We can write VGA CRT controller registers index 0x0C or 0x0D
(which is the start address register) to modify the
the display memory address of the upper left pixel
or character of the screen. The address may be out of the
range of vga ram. So we should check the validation of memory address
when reading or writing it to avoid segfault.

Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-id: 20180111132724.13744-1-linzhecheng@huawei.com
Fixes: CVE-2018-5683
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 191f59dc17)
[LY: BSC#1076114 CVE-2018-5683]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Eric Blake
a2fcf05bca osdep: Fix ROUND_UP(64-bit, 32-bit)
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.

Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).

While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.

CC: qemu-trivial@nongnu.org
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 2098b073f3)
[LY: BSC#1076775 CVE-2017-18043]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Eric Blake
4f97b1174d nbd/server: CVE-2017-15119 Reject options larger than 32M
The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option.  No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.

For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.

It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service.  In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake.  Hence, this warranted a CVE.

Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit fdad35ef6c)
[LY: BSC#1070144 CVE-2017-15119]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Prasad J Pandit
1911813223 ps2: check PS2Queue pointers in post_load routine
During Qemu guest migration, a destination process invokes ps2
post_load function. In that, if 'rptr' and 'count' values were
invalid, it could lead to OOB access or infinite loop issue.
Add check to avoid it.

Reported-by: Cyrille Chatras <cyrille.chatras@orange.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20171116075155.22378-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 802cbcb730)
[LY: BSC#1068613 CVE-2017-16845]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Prasad J Pandit
24c46bd538 virtio: check VirtQueue Vring object is set
A guest could attempt to use an uninitialised VirtQueue object
or unset Vring.align leading to a arithmetic exception. Add check
to avoid it.

Reported-by: Zhangboxian <zhangboxian@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 758ead31c7)
[LY: BSC#1071228 CVE-2017-17381]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Fei Li
c807f0a100 pcihp: fix pcihp for 1.6 machine type and older
Commit id f0c9d64a avoids adding ACPI_PCIHP_PROP_BSEL twice for
machine type 1.7, but also removes the adding for 1.6 and older
and xen, which causes an error when hotplugging a pci device using
qemu v2.9.* and v2.10.0.

There are two chances for intializing ACPI_PCIHP_PROP_BSEL before
commit f0c9d64a, one is in acpi_pcihp_init and the other is in
acpi_set_bsel. This patch restores the first chance, and adds a check
for the second chance: if the property has already been intialized,
just skip.

[FL: BSC#1074572]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:02 -06:00
Christian Borntraeger
2bfc928c01 s390x/kvm: provide stfle.81
stfle.81 (ppa15) is a transparent facility that can be passed to the
guest without the need to implement hypervisor support. As this feature
can be provided by firmware we add it to all full models.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 9f0d13f4f1)
[LY: BSC#1076813]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Christian Borntraeger
348253ba35 s390x/kvm: Handle bpb feature
We need to handle the bpb control on reset and migration. Normally
stfle.82 is transparent (and the normal guest part works without
hypervisor activity). To prevent any issues we require full
host kernel support for this feature.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit b073c87517)
[LY: BSC#1076813]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Christian Borntraeger
b3ec6ae512 header sync
replace with proper header sync

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 9cbb636270)
[LY: BSC#1076813]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Paolo Bonzini
1d40b7862f i386: Add support for SPEC_CTRL MSR
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-3-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit a33a2cfe2f)
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Brijesh Singh
d325162b9d target-i386/cpu: Add new EPYC CPU model
Add a new base CPU model called 'EPYC' to model processors from AMD EPYC
family (which includes EPYC 76xx,75xx,74xx, 73xx and 72xx).

The following features bits have been added/removed compare to Opteron_G5

Added: monitor, movbe, rdrand, mmxext, ffxsr, rdtscp, cr8legacy, osvw,
       fsgsbase, bmi1, avx2, smep, bmi2, rdseed, adx, smap, clfshopt, sha
       xsaveopt, xsavec, xgetbv1, arat

Removed: xop, fma4, tbm

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Message-Id: <20170815170051.127257-1-brijesh.singh@amd.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 2e2efc7dbe)
[BR: BSC#1052825 FATE#324038]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Prasad J Pandit
e07ff195e3 9pfs: use g_malloc0 to allocate space for xattr
9p back-end first queries the size of an extended attribute,
allocates space for it via g_malloc() and then retrieves its
value into allocated buffer. Race between querying attribute
size and retrieving its could lead to memory bytes disclosure.
Use g_malloc0() to avoid it.

Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 7bd9275630)
[BR: BSC#1062069 CVE-2017-15038]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Gerd Hoffmann
0df0623b78 cirrus: fix oob access in mode4and5 write functions
Move dst calculation into the loop, so we apply the mask on each
interation and will not overflow vga memory.

Cc: Prasad J Pandit <pjp@fedoraproject.org>
Reported-by: Niu Guoxiang <niuguoxiang@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171011084314.21752-1-kraxel@redhat.com
[BR: BSC#1063122 CVE-2017-15289]
Signed-off-by: Bruce Rogers <brogers@suse.com
2021-03-16 20:23:02 -06:00
Daniel P. Berrange
9b477b16e2 io: monitor encoutput buffer size from websocket GSource
The websocket GSource is monitoring the size of the rawoutput
buffer to determine if the channel can accepts more writes.
The rawoutput buffer, however, is merely a temporary staging
buffer before data is copied into the encoutput buffer. Thus
its size will always be zero when the GSource runs.

This flaw causes the encoutput buffer to grow without bound
if the other end of the underlying data channel doesn't
read data being sent. This can be seen with VNC if a client
is on a slow WAN link and the guest OS is sending many screen
updates. A malicious VNC client can act like it is on a slow
link by playing a video in the guest and then reading data
very slowly, causing QEMU host memory to expand arbitrarily.

This issue is assigned CVE-2017-15268, publically reported in

  https://bugs.launchpad.net/qemu/+bug/1718964

(cherry picked from commit a7b20a8efa)

Reviewed-by: Eric Blake <eblake@redhat.com>

[Dan: Added extra checks to deal with code refactored in master but
 not stable 2.10]

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[BR: BSC#1062942 CVE-2017-15268]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Gerd Hoffmann
329671a216 vga: stop passing pointers to vga_draw_line* functions
Instead pass around the address (aka offset into vga memory).
Add vga_read_* helper functions which apply vbe_size_mask to
the address, to make sure the address stays within the valid
range, similar to the cirrus blitter fixes (commits ffaf857778
and 026aeffcb4).

Impact:  DoS for privileged guest users.  qemu crashes with
a segfault, when hitting the guard page after vga memory
allocation, while reading vga memory for display updates.

Fixes: CVE-2017-13672
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170828122906.18993-1-kraxel@redhat.com
(cherry picked from commit 3d90c62548)
[FL: BSC#1056334 CVE-2017-13672]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:02 -06:00
Prasad J Pandit
8d18b02dc9 multiboot: validate multiboot header address values
While loading kernel via multiboot-v1 image, (flags & 0x00010000)
indicates that multiboot header contains valid addresses to load
the kernel image. These addresses are used to compute kernel
size and kernel text offset in the OS image. Validate these
address values to avoid an OOB access issue.

This is CVE-2017-14167.

Reported-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20170907063256.7418-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ed4f86e8b6)
[FL: BSC#1057585 CVE-2017-14167]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:02 -06:00
Kevin Wolf
92f9d92993 IDE: test flush on empty CDROM
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170809160212.29976-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ce317e8dea)
[FL: BSC#1054724 CVE-2017-12809]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:02 -06:00
Stefan Hajnoczi
9a3d9e4fc3 IDE: Do not flush empty CDROM drives
The block backend changed in a way that flushing empty CDROM drives now
crashes.  Amend IDE to avoid doing so until the root problem can be
addressed for 2.11.

Original patch by John Snow <jsnow@redhat.com>.

Reported-by: Kieron Shorrock <kshorrock@paloaltonetworks.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20170809160212.29976-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 4da97120d5)
[FL: BSC#1054724 CVE-2017-12809]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:02 -06:00
Stefano Stabellini
217c09fe63 xen/disk: don't leak stack data via response ring
Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other (Linux)
backends do. Build on the fact that all response structure flavors are
actually identical (aside from alignment and padding at the end).

This is XSA-216.

Reported by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit b0ac694fdb)
[FL: BSC#1057378 CVE-2017-10911]
Signed-off-by: Fei Li <fli@suse.com>
2021-03-16 20:23:02 -06:00
Khem Raj
fa3a41f1b1 Replace 'struct ucontext' with 'ucontext_t' type
glibc used to have:

   typedef struct ucontext { ... } ucontext_t;

glibc now has:

   typedef struct ucontext_t { ... } ucontext_t;

(See https://sourceware.org/bugzilla/show_bug.cgi?id=21457
 for detail and rationale for the glibc change)

However, QEMU used "struct ucontext" in declarations. This is a
private name and compatibility cannot be guaranteed. Switch to
only using the standardized type name.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Message-id: 20170628204452.41230-1-raj.khem@gmail.com
Cc: Kamil Rytarowski <kamil@netbsd.org>
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[PMM: Rewrote commit message, based mostly on the one from
 Nathaniel McCallum]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 04b33e2186)
[BR: BOO#1055587]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Prasad J Pandit
c110404cf0 slirp: check len against dhcp options array end
While parsing dhcp options string in 'dhcp_decode', if an options'
length 'len' appeared towards the end of 'bp_vend' array, ensuing
read could lead to an OOB memory access issue. Add check to avoid it.

This is CVE-2017-11434.

Reported-by: Reno Robert <renorobert@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 413d463f43)
[BR: BSC#1049381 CVE-2017-11434]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Paolo Bonzini
ce26797251 megasas: always store SCSIRequest* into MegasasCmd
This ensures that the request is unref'ed properly, and avoids a
segmentation fault in the new qtest testcase that is added.

Reported-by: Zhangyanyu <zyy4013@stu.ouc.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[BR: BSC#1043296 CVE-2017-9503, dropped testcase from patch]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Paolo Bonzini
b75ffd4f05 altera_timer: fix incorrect memset
Use sizeof instead of ARRAY_SIZE, fixing -Wmemset-elt-size with recent
GCC versions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[LY: BSC#1040228]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Dr. David Alan Gilbert
3ec305b8a5 slirp/smb: Replace constant strings by glib string
gcc 7 (on fedora 26) objects to many of the snprintf's
in the smb path and command creation because it can't
figure out that the smb_dir (i.e. the /tmp dir for the configuration)
is known to be short.

Replace all these fixed length buffers by g_str* functions that dynamically
allocate and use g_dir_make_tmp to make the directory.
(It's fairly new glib but we have a compat function for it).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit f95cc8b6cc)
[LY: BSC#1040228]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Paolo Bonzini
4558a0441e jazz_led: fix bad snprintf
Detected by GCC 7's -Wformat-truncation.  snprintf writes at most
2 bytes here including the terminating NUL, so the result is
truncated.  In addition, the newline at the end is pointless.
Fix the buffer size and the format string.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit e9c6ab62c7)
[LY: BSC#1040228]
Signed-off-by: Liang Yan <lyan@suse.com>
2021-03-16 20:23:02 -06:00
Alexander Graf
12be6c73c4 i386: Allow cpuid bit override
KVM has a feature bitmap of CPUID bits that it knows works for guests.
QEMU removes bits that are not part of that bitmap automatically on VM
start.

However, some times we just don't list features in that list because
they don't make sense for normal scenarios, but may be useful in specific,
targeted workloads.

For that purpose, add a new =force option to all CPUID feature flags in
the CPU property. With that we can override the accel filtering and give
users full control over the CPUID feature bits exposed into guests.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
a79c6050a7 input: Add trace event for empty keyboard queue
When driving QEMU from the outside, we have basically no chance to
determine how quickly the guest OS picks up key events, so we usually
have to limit ourselves to very slow keyboard presses to make sure
the guest always has enough chance to pick them up.

This patch adds a trace events when the keyboarde queue is drained.
An external driver can use that as hint that new keys can be pressed.

Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1490883775-94658-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
[BR: BSC#1031692]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Alexander Graf
a8cc7712a8 ARM: KVM: Enable in-kernel timers with user space gic
When running with KVM enabled, you can choose between emulating the
gic in kernel or user space. If the kernel supports in-kernel virtualization
of the interrupt controller, it will default to that. If not, if will
default to user space emulation.

Unfortunately when running in user mode gic emulation, we miss out on
timer events which are only available from kernel space. This patch leverages
the new kernel/user space pending line synchronization for those timer events.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Christoffer Dall
3914146879 RFC: update Linux headers from irqs-to-user-v3
Get ioctl number and definitions for KVM_CAP_ARM_USER_IRQ.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
[agraf: change cap to indicate downstream status]
Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
af99d39bec tests: Add scsi-disk test
Test scsi-{disk,hd,cd} wwn properties for correct 64-bit parsing.

For now piggyback on virtio-scsi.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
ffeb43f6ac tests: Add QOM property unit tests
Add a test for parsing and setting a uint64 property.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
da8ccab7b2 test-string-input-visitor: Add uint64 test
Test parsing of decimal and hexadecimal uint64 numbers with most
significant bit set.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
852914a4a3 test-string-input-visitor: Add int test case
In addition to -42 also parse the maximum int64.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
1961198aa1 string-input-visitor: Fix uint64 parsing
All integers would get parsed by strtoll(), not handling the case of
UINT64 properties with the most significient bit set.

Implement a .type_uint64 visitor callback, reusing the existing
parse_str() code through a new argument, using strtoull().

As this is a bug fix, it intentionally ignores checkpatch warnings to
prefer the use of qemu_strto[u]ll() over strto[u]ll().

Cc: qemu-stable@nongnu.org
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Chunyan Liu
3f180d13bc fix xen hvm direct kernel boot
Since commit a1666142: acpi-build: make ROMs RAM blocks resizeable,
xen HVM direct kernel boot failed. Xen HVM direct kernel boot will
insert a linuxboot.bin or multiboot.bin to /genroms, before this
commit, in acpi_setup, for rom linuxboot.bin/multiboot.bin, it
only needs 0x20000 size; after the commit, it will reserve x16
size for resize, that is 0x200000 size. It causes xen_ram_alloc
failed due to running out of memory.

To resolve it, either:
1. keep using original rom size instead of max size, don't reserve x16 size.
2. guest maxmem needs to be increased. (commit c1d322e6 "xen-hvm: increase
   maxmem before calling xc_domain_populate_physmap" solved the problem for
   a time, by accident. But then it is reverted in commit ffffbb369 due to
   other problem.)

For 2, more discussion is needed about howto. So this patch tries 1, to
use unresizable rom size in xen case in rom_set_mr.

[CYL: BSC#970791]

Signed-off-by: Chunyan Liu <cyliu@suse.com>
2021-03-16 20:23:02 -06:00
Chunyan Liu
a92541e62a Fix tigervnc long press issue
Using xen tools 'xl vncviewer' with tigervnc (default on SLE-12),
found that: the display of the guest is unexpected while keep
pressing a key. We expect the same character multiple times, but
it prints only one time. This happens on a PV guest in text mode.

After debugging, found that tigervnc sends repeated key down events
in this case, to differentiate from user pressing the same key many
times. Vnc server only prints the character when it finally receives
key up event.

To solve this issue, this patch tries to add additional key up event
before the next repeated key down event (if the key is not a control
key).

[CYL: BSC#882405]
Signed-off-by: Chunyan Liu <cyliu@suse.com>
2021-03-16 20:23:02 -06:00
Andreas Färber
79b382a16d acpi_piix4: Fix migration from SLE11 SP2
qemu-kvm 0.15 uses the same GPE format as qemu 1.4, but as version 2
rather than 3.

Addresses part of BNC#812836.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
7f17cde23a i8254: Fix migration from SLE11 SP2
qemu-kvm 0.15 had a VMSTATE_UINT32(flags, PITState) field that
qemu 1.4 does not have.

Addresses part of BNC#812836.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
0cab85fb1c vga: Raise VRAM to 16 MiB for pc-0.15 and below
qemu-kvm.git commit a7fe0297840908a4fd65a1cf742481ccd45960eb
(Extend vram size to 16MB) deviated from qemu.git since kvm-61, and only
in commit 9e56edcf8d (vga: raise default
vgamem size) did qemu.git adjust the VRAM size for v1.2.

Add compatibility properties so that up to and including pc-0.15 we
maintain migration compatibility with qemu-kvm rather than QEMU and
from pc-1.0 on with QEMU (last qemu-kvm release was 1.2).

Addresses part of BNC#812836.

Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: adjust comma position in list in macro for v2.5.0 compat]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Bruce Rogers
e4c17119dd increase x86_64 physical bits to 42
Allow for guests with higher amounts of ram. The current thought
is that 2TB specified on qemu commandline would be an appropriate
limit. Note that this requires the next higher bit value since
the highest address is actually more than 2TB due to the pci
memory hole.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
e8d2218fcd Raise soft address space limit to hard limit
For SLES we want users to be able to use large memory configurations
with KVM without fiddling with ulimit -Sv.

Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: add include for sys/resource.h]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
Bruce Rogers
2902852e58 roms/Makefile: pass a packaging timestamp to subpackages with date info
Certain rom subpackages build from qemu git-submodules call the date
program to include date information in the packaged binaries. This
causes repeated builds of the package to be different, wkere the only
real difference is due to the fact that time build timestamp has
changed. To promote reproducible builds and avoid customers being
prompted to update packages needlessly, we'll use the timestamp of the
VERSION file as the packaging timestamp for all packages that build in a
timestamp for whatever reason.

[BR: BSC#1011213]
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:02 -06:00
0bbe0a9624 linux-user: remove all traces of qemu from /proc/self/cmdline
Instead of post-processing the real contents use the remembered target
argv.  That removes all traces of qemu, including command line options,
and handles QEMU_ARGV0.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
33a4759f97 linux-user: properly test for infinite timeout in poll (#8)
After "linux-user: use target_ulong" the poll syscall was no longer
handling infinite timeout.

/home/abuild/rpmbuild/BUILD/qemu-2.7.0-rc5/linux-user/syscall.c:9773:26: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
                 if (arg3 >= 0) {
                          ^~

Signed-off-by: Andreas Schwab <schwab@suse.de>
2021-03-16 20:23:02 -06:00
markkp
617054c693 configure: Fix detection of seccomp on s390x
Signed-off-by: Mark Post <mpost@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
5a82b83061 qemu-binfmt-conf: use qemu-ARCH-binfmt
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Bruce Rogers
706a3b0849 qemu-bridge-helper: reduce security profile
Change from using glib alloc and free routines to those
from libc. Also perform safety measure of dropping privs
to user if configured no-caps.

[BR: BOO#988279]
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Bruce Rogers
f9705a9de8 xen_disk: Add suse specific flush disable handling and map to QEMU equiv
Add code to read the suse specific suse-diskcache-disable-flush flag out
of xenstore, and set the equivalent flag within QEMU.

Patch taken from Xen's patch queue, Olaf Hering being the original author.
[bsc#879425]

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
6358fd6880 dictzip: Fix on big endian systems
The dictzip code in SLE11 received some treatment over time to support
running on big endian hosts. Somewhere in the transition to SLE12 this
support got lost. Add it back in again from the SLE11 code base.

Furthermore while at it, fix up the debug prints to not emit warnings.

[AG: BSC#937572]
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
3cc08e3691 AIO: Reduce number of threads for 32bit hosts
On hosts with limited virtual address space (32bit pointers), we can very
easily run out of virtual memory with big thread pools.

Instead, we should limit ourselves to small pools to keep memory footprint
low on those systems.

This patch fixes random VM stalls like

  (process:25114): GLib-ERROR **: gmem.c:103: failed to allocate 1048576 bytes

on 32bit ARM systems for me.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Dinar Valeev
386c721eb5 configure: Enable PIE for ppc and ppc64 hosts
Signed-off-by: Dinar Valeev <dvaleev@suse.com>
[AF: Rebased for v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
2ee5e1d731 linux-user: lseek: explicitly cast non-set offsets to signed
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.

When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
060c4bdb93 Make char muxer more robust wrt small FIFOs
Virtio-Console can only process one character at a time. Using it on S390
gave me strage "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.

While the stdio driver calls a poll function that just processes on its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.

To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.

This patch fixes input when using -nographic on s390 for me.

[AF: Rebased for v2.7.0-rc2]
2021-03-16 20:23:02 -06:00
Alexander Graf
ae4499c938 console: add question-mark escape operator
Some termcaps (found using SLES11SP1) use [? sequences. According to man
console_codes (http://linux.die.net/man/4/console_codes) the question mark
is a nop and should simply be ignored.

This patch does exactly that, rendering screen output readable when
outputting guest serial consoles to the graphical console emulator.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
a52d3e0faf Legacy Patch kvm-qemu-preXX-dictzip3.patch 2021-03-16 20:23:02 -06:00
Alexander Graf
17440d3df7 block: Add tar container format
Tar is a very widely used format to store data in. Sometimes people even put
virtual machine images in there.

So it makes sense for qemu to be able to read from tar files. I implemented a
written from scratch reader that also knows about the GNU sparse format, which
is what pigz creates.

This version checks for filenames that end on well-known extensions. The logic
could be changed to search for filenames given on the command line, but that
would require changes to more parts of qemu.

The tar reader in conjunctiuon with dzip gives us the chance to download
tar'ed up virtual machine images (even via http) and instantly make use of
them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: bdrv_file_open got an Error **errp argument, bdrv_delete -> brd_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
     bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
     BlockDriverCompletionFunc -> BlockCompletionFunc,
     qemu_aio_release() -> qemu_aio_unref(),
     drop tar_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
[AF: Drop bdrv_open() drv parameter for 2.5]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Changed bdrv_open() bs parameter and return value for v2.7.0-rc2,
     for bdrv_pread() and bdrv_aio_readv() s/s->hd/s->hd->file/]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
515180dec7 block: Add support for DictZip enabled gzip files
DictZip is an extension to the gzip format that allows random seeks in gzip
compressed files by cutting the file into pieces and storing the piece offsets
in the "extra" header of the gzip format.

Thanks to that extension, we can use gzip compressed files as block backend,
though only in read mode.

This makes a lot of sense when stacked with tar files that can then be shipped
to VM users. If a VM image is inside a tar file that is inside a DictZip
enabled gzip file, the user can run the tar.gz file as is without having to
extract the image first.

Tar patch follows.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: Error **errp added for bdrv_file_open, bdrv_delete -> bdrv_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
     bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
     BlockDriverCompletionFunc -> BlockCompletionFunc,
     qemu_aio_release() -> qemu_aio_unref(),
     drop dictzip_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
[AF: Drop bdrv_open() drv parameter for 2.5]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Drop bdrv_open() bs parameter and change return value for v2.7.0-rc2,
     for bdrv_pread() and bdrv_aio_readv() do s/s->hd/s->hd->file/]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:02 -06:00
Alexander Graf
0b96efddf3 linux-user: use target_ulong
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise a host 64bit variable parameter with a size parameter will extend
it to a negative number, breaking lseek for example.

Pass syscall arguments as ulong always.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:02 -06:00
Andreas Färber
ca655c7032 vnc: password-file= and incoming-connections=
TBD (from SUSE Studio team)
2021-03-16 20:23:02 -06:00
Andreas Färber
eb95c111c5 slirp: -nooutgoing
TBD (from SUSE Studio team)
2021-03-16 20:23:01 -06:00
Alexander Graf
e1250d6316 linux-user: XXX disable fiemap
agraf: fiemap breaks in libarchive. Disable it for now.
2021-03-16 20:23:01 -06:00
Alexander Graf
8f51ff90ca linux-user: Fake /proc/cpuinfo
Fedora 17 for ARM reads /proc/cpuinfo and fails if it doesn't contain
ARM related contents. This patch implements a quick hack to expose real
/proc/cpuinfo data taken from a real world machine.

The real fix would be to generate at least the flags automatically based
on the selected CPU. Please do not submit this patch upstream until this
has happened.

Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v1.6 and v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
e755ba527b linux-user: binfmt: support host binaries
When we have a working host binary equivalent for the guest binary we're
trying to run, let's just use that instead as it will be a lot faster.

Signed-off-by: Alexander Graf <agraf@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
1637529607 linux-user: fix segfault deadlock
When entering the guest we take a lock to ensure that nobody else messes
with our TB chaining while we're doing it. If we get a segfault inside that
code, we manage to work on, but will not unlock the lock.

This patch forces unlocking of that lock in the segv handler. I'm not sure
this is the right approach though. Maybe we should rather make sure we don't
segfault in the code? I would greatly appreciate someone more intelligible
than me to look at this :).

Example code to trigger this is at: http://csgraf.de/tmp/conftest.c

Reported-by: Fabio Erculiani <lxnay@sabayon.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Drop spinlock_safe_unlock() and switch to tb_lock_reset() (bonzini)]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
e02f3a2c2c PPC: KVM: Disable mmu notifier check
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers that on 970 can not be implemented properly.

So disable the check for mmu notifiers on PowerPC guests, making
KVM guests work there, even if possibly racy in some odd circumstances.
2021-03-16 20:23:01 -06:00
Alexander Graf
aacb3a44f0 linux-user: add binfmt wrapper for argv[0] handling
When using qemu's linux-user binaries through binfmt, argv[0] gets lost
along the execution because qemu only gets passed in the full file name
to the executable while argv[0] can be something completely different.

This breaks in some subtile situations, such as the grep and make test
suites.

This patch adds a wrapper binary called qemu-$TARGET-binfmt that can be
used with binfmt's P flag which passes the full path _and_ argv[0] to
the binfmt handler.

The binary would be smart enough to be versatile and only exist in the
system once, creating the qemu binary path names from its own argv[0].
However, this seemed like it didn't fit the make system too well, so
we're currently creating a new binary for each target archictecture.

CC: Reinhard Max <max@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto new Makefile infrastructure, twice]
[AF: Updated for aarch64 for v2.0.0-rc1]
[AF: Rebased onto Makefile changes for v2.1.0-rc0]
[AF: Rebased onto script rewrite for v2.7.0-rc2 - to be fixed]
Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
0f4cd18742 qemu-cvs-ioctl_nodirection
the direction given in the ioctl should be correct so we can assume the
communication is uni-directional. The alsa developers did not like this
concept though and declared ioctls IOC_R and IOC_W even though they were
IOC_RW.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
c384d2b0a3 qemu-cvs-ioctl_debug
Extends unsupported ioctl debug output.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2021-03-16 20:23:01 -06:00
Ulrich Hecht
2df2d22b7a qemu-cvs-gettimeofday
No clue what this is for.
2021-03-16 20:23:01 -06:00
Alexander Graf
2e88b0133e qemu-cvs-alsa_mmap
Hack to prevent ALSA from using mmap() interface to simplify emulation.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
3288a6528a qemu-cvs-alsa_ioctl
Implements ALSA ioctls on PPC hosts.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
[AF: Rebased for v2.7.0-rc2]
[BR: Rebased for v2.9.0-rc0: removed timespec ref. from syscall_types_alsa.h]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:01 -06:00
Alexander Graf
c0535d90a0 qemu-cvs-alsa_bitfield
Implements TYPE_INTBITFIELD partially. (required for ALSA support)

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
2021-03-16 20:23:01 -06:00
Andreas Färber
a70e1761f0 qemu-binfmt-conf: Modify default path
Change QEMU_PATH from /usr/local/bin to /usr/bin prefix.

Signed-off-by: Andreas Färber <afaerber@suse.de>
2021-03-16 20:23:01 -06:00
Alexander Graf
16059d7ed4 XXX dont dump core on sigabort 2021-03-16 20:23:01 -06:00
Greg Kurz
5bbd90e113 9pfs: Fully restart unreclaim loop (CVE-2021-20181)
Git-commit: 89fbea8737
References: bsc#1182137

Depending on the client activity, the server can be asked to open a huge
number of file descriptors and eventually hit RLIMIT_NOFILE. This is
currently mitigated using a reclaim logic : the server closes the file
descriptors of idle fids, based on the assumption that it will be able
to re-open them later. This assumption doesn't hold of course if the
client requests the file to be unlinked. In this case, we loop on the
entire fid list and mark all related fids as unreclaimable (the reclaim
logic will just ignore them) and, of course, we open or re-open their
file descriptors if needed since we're about to unlink the file.

This is the purpose of v9fs_mark_fids_unreclaim(). Since the actual
opening of a file can cause the coroutine to yield, another client
request could possibly add a new fid that we may want to mark as
non-reclaimable as well. The loop is thus restarted if the re-open
request was actually transmitted to the backend. This is achieved
by keeping a reference on the first fid (head) before traversing
the list.

This is wrong in several ways:
- a potential clunk request from the client could tear the first
  fid down and cause the reference to be stale. This leads to a
  use-after-free error that can be detected with ASAN, using a
  custom 9p client
- fids are added at the head of the list : restarting from the
  previous head will always miss fids added by a some other
  potential request

All these problems could be avoided if fids were being added at the
end of the list. This can be achieved with a QSIMPLEQ, but this is
probably too much change for a bug fix. For now let's keep it
simple and just restart the loop from the current head.

Fixes: CVE-2021-20181
Buglink: https://bugs.launchpad.net/qemu/+bug/1911666
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-Id: <161064025265.1838153.15185571283519390907.stgit@bahia.lan>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
2021-03-16 20:23:01 -06:00
Michael Roth
4cd42653f5 Update version for 2.9.1 release
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-07 11:25:12 -05:00
Greg Kurz
c24c5910b7 virtfs: error out gracefully when mandatory suboptions are missing
We internally convert -virtfs to -fsdev/-device. If the user doesn't
provide the path or security_model suboptions, and the fsdev backend
requires them, we hit an assertion when populating the internal -fsdev
option:

util/qemu-option.c:547: opt_set: Assertion `opt->str' failed.
Aborted (core dumped)

Let's test the suboption presence on the command line before trying
to set it in the internal -fsdev option, and let the backend code
error out gracefully (ie, like it already does when the user passes
-fsdev on the command line).

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit 32b6943699)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-05 15:42:17 -05:00
Richard Henderson
2d1bbf51c2 target/arm: Fix aa64 ldp register writeback
For "ldp x0, x1, [x0]", if the second load is on a second page and
the second page is unmapped, the exception would be raised with x0
already modified.  This means the instruction couldn't be restarted.

Cc: qemu-arm@nongnu.org
Cc: qemu-stable@nongnu.org
Reported-by: Andrew <andrew@fubar.geek.nz>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20170825224833.4463-1-richard.henderson@linaro.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1713066
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[PMM: tweaked comment format]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

(cherry picked from commit 3e4d91b94c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-04 23:12:48 -05:00
Anthony PERARD
30b76b2439 exec: Add lock parameter to qemu_ram_ptr_length
Commit 04bf2526ce (exec: use
qemu_ram_ptr_length to access guest ram) start using qemu_ram_ptr_length
instead of qemu_map_ram_ptr, but when used with Xen, the behavior of
both function is different. They both call xen_map_cache, but one with
"lock", meaning the mapping of guest memory is never released
implicitly, and the second one without, which means, mapping can be
release later, when needed.

In the context of address_space_{read,write}_continue, the ptr to those
mapping should not be locked because it is used immediatly and never
used again.

The lock parameter make it explicit in which context qemu_ram_ptr_length
is called.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20170726165326.10327-1-anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f5aa69bdc3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-01 18:15:39 -05:00
Stefano Stabellini
2f64063f4e xen/mapcache: store dma information in revmapcache entries for debugging
The Xen mapcache is able to create long term mappings, they are called
"locked" mappings. The third parameter of the xen_map_cache call
specifies if a mapping is a "locked" mapping.

>From the QEMU point of view there are two kinds of long term mappings:

[a] device memory mappings, such as option roms and video memory
[b] dma mappings, created by dma_memory_map & friends

After certain operations, ballooning a VM in particular, Xen asks QEMU
kindly to destroy all mappings. However, certainly [a] mappings are
present and cannot be removed. That's not a problem as they are not
affected by balloonning. The *real* problem is that if there are any
mappings of type [b], any outstanding dma operations could fail. This is
a known shortcoming. In other words, when Xen asks QEMU to destroy all
mappings, it is an error if any [b] mappings exist.

However today we have no way of distinguishing [a] from [b]. Because of
that, we cannot even print a decent warning.

This patch introduces a new "dma" bool field to MapCacheRev entires, to
remember if a given mapping is for dma or is a long term device memory
mapping. When xen_invalidate_map_cache is called, we print a warning if
any [b] mappings exist. We ignore [a] mappings.

Mappings created by qemu_map_ram_ptr are assumed to be [a], while
mappings created by address_space_map->qemu_ram_ptr_length are assumed
to be [b].

The goal of the patch is to make debugging and system understanding
easier.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 1ff7c5986a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-01 18:15:24 -05:00
Prasad J Pandit
15d8f91446 exec: use qemu_ram_ptr_length to access guest ram
When accessing guest's ram block during DMA operation, use
'qemu_ram_ptr_length' to get ram block pointer. It ensures
that DMA operation of given length is possible; And avoids
any OOB memory access situations.

Reported-by: Alex <broscutamaker@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20170712123840.29328-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 04bf2526ce)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-01 18:10:46 -05:00
Gerd Hoffmann
f9c313f70f xhci: only update dequeue ptr on completed transfers
The dequeue pointer should only be updated in case the transfer
is actually completed.  If we update it for inflight transfers
we will not pick them up again after migration, which easily
triggers with HID devices as they typically have a pending
transfer, waiting for user input to happen.

Fixes: 243afe858b
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451631
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 20170608074122.32099-1-kraxel@redhat.com
(cherry picked from commit d54fddea98)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-09-01 18:10:36 -05:00
Dr. David Alan Gilbert
53206753ba vl.c/exit: pause cpus before closing block devices
There's a rare exit seg if the guest is accessing
IO during exit.
It's always hitting the atomic_inc(&bs->in_flight) with a NULL
bs. This was added recently in 99723548  but I don't see it
as the cause.

Flip vl.c around so we pause the cpus before closing the block devices,
that way we shouldn't have anything trying to access them when
they're gone.

This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Cong Li <coli@redhat.com>

--
This is a very rare race, I'll leave it running in a loop to see if
we hit anything else and to check this really fixes it.

I do worry if there are other cases that can trigger this - e.g.
hot-unplug or ejecting a CD.

Message-Id: <20170713190116.21608-1-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 452589b6b4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 17:09:34 -05:00
Michael Roth
167e76494e PPC: E500: update u-boot to match shipped binary
Previous submodule commit contained some unused files with comments
that made redistribution requirements unclear, and the submodule commit
should really match the blob we're shipped anyway.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 16:59:16 -05:00
Farhan Ali
e22e199b2b s390-ccw: Fix alignment for CCW1
The commit 198c0d1f9d s390x/css: check ccw address validity
exposes an alignment issue in ccw bios.

According to PoP the CCW must be doubleword aligned. Let's fix
this in the bios.

Cc: qemu-stable@nongnu.org
Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <3ed8b810b6592daee6a775037ce21f850e40647d.1503667215.git.alifm@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 3a1e4561ad)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 12:01:54 -05:00
Alexander Graf
5035184165 vnc: Set default kbd delay to 10ms
The current VNC default keyboard delay is 1ms. With that we're constantly
typing faster than the guest receives keyboard events from an XHCI attached
USB HID device.

The default keyboard delay time in the input layer however is 10ms. I don't know
how that number came to be, but empirical tests on some OpenQA driven ARM
systems show that 10ms really is a reasonable default number for the delay.

This patch moves the VNC delay also to 10ms. That way our default is much
safer (good!) and also consistent with the input layer default (also good!).

Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1499863425-103133-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d3b0db6dfe)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:17 -05:00
Max Reitz
20920f4db6 qemu-nbd: Ignore SIGPIPE
qemu proper has done so for 13 years
(8a7ddc38a6), qemu-img and qemu-io have
done so for four years (526eda14a6).
Ignoring this signal is especially important in qemu-nbd because
otherwise a client can easily take down the qemu-nbd server by dropping
the connection when the server wants to send something, for example:

$ qemu-nbd -x foo -f raw -t null-co:// &
[1] 12726
$ qemu-io -c quit nbd://localhost/bar
can't open device nbd://localhost/bar: No export with name 'bar' available
[1]  + 12726 broken pipe  qemu-nbd -x foo -f raw -t null-co://

In this case, the client sends an NBD_OPT_ABORT and closes the
connection (because it is not required to wait for a reply), but the
server replies with an NBD_REP_ACK (because it is required to reply).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20170611123714.31292-1-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 041e32b8d9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:17 -05:00
Gerd Hoffmann
4e6889b76b usb-redir: fix stack overflow in usbredir_log_data
Don't reinvent a broken wheel, just use the hexdump function we have.

Impact: low, broken code doesn't run unless you have debug logging
enabled.

Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509110128.27261-1-kraxel@redhat.com
(cherry picked from commit bd4a683505)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Paolo Bonzini
244a3ef947 megasas: do not read SCSI req parameters more than once from frame
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b356807fcd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Paolo Bonzini
578fb5003f megasas: do not read command more than once from frame
Avoid TOC-TOU bugs by passing the frame_cmd down, and checking
cmd->dcmd_opcode instead of cmd->frame->header.frame_cmd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 36c327a69d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Paolo Bonzini
50b9353315 megasas: do not read DCMD opcode more than once from frame
Avoid TOC-TOU bugs by storing the DCMD opcode in the MegasasCmd

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5104fac853)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Paolo Bonzini
d016071ca5 megasas: do not read iovec count more than once from frame
Avoid TOC-TOU bugs depending on how the compiler behaves.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 24c0c77af5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Paolo Bonzini
20fd62d374 megasas: do not read sense length more than once from frame
Avoid TOC-TOU bugs depending on how the compiler behaves.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 134550bf81)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Greg Kurz
7442018a00 9pfs: local: forbid client access to metadata (CVE-2017-7493)
When using the mapped-file security mode, we shouldn't let the client mess
with the metadata. The current code already tries to hide the metadata dir
from the client by skipping it in local_readdir(). But the client can still
access or modify it through several other operations. This can be used to
escalate privileges in the guest.

Affected backend operations are:
- local_mknod()
- local_mkdir()
- local_open2()
- local_symlink()
- local_link()
- local_unlinkat()
- local_renameat()
- local_rename()
- local_name_to_path()

Other operations are safe because they are only passed a fid path, which
is computed internally in local_name_to_path().

This patch converts all the functions listed above to fail and return
EINVAL when being passed the name of the metadata dir. This may look
like a poor choice for errno, but there's no such thing as an illegal
path name on Linux and I could not think of anything better.

This fixes CVE-2017-7493.

Reported-by: Leo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 7a95434e0c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Prasad J Pandit
0f590e798f scsi: avoid an off-by-one error in megasas_mmio_write
While reading magic sequence(MFI_SEQ) in megasas_mmio_write,
an off-by-one error could occur as 's->adp_reset' index is not
reset after reading the last sequence.

Reported-by: YY Z <bigbird475958471@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20170424120634.12268-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 24dfa9fa2f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Gerd Hoffmann
3c69132635 audio: release capture buffers
AUD_add_capture() allocates two buffers which are never released.
Add the missing calls to AUD_del_capture().

Impact: Allows vnc clients to exhaust host memory by repeatedly
starting and stopping audio capture.

Fixes: CVE-2017-8309
Cc: P J P <ppandit@redhat.com>
Cc: Huawei PSIRT <PSIRT@huawei.com>
Reported-by: "Jiangxin (hunter, SCC)" <jiangxin1@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20170428075612.9997-1-kraxel@redhat.com
(cherry picked from commit 3268a845f4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
P J P
40a7d47c5a vmw_pvscsi: check message ring page count at initialisation
A guest could set the message ring page count to zero, resulting in
infinite loop. Add check to avoid it.

Reported-by: YY Z <bigbird475958471@gmail.com>
Signed-off-by: P J P <ppandit@redhat.com>
Message-Id: <20170425130623.3649-1-ppandit@redhat.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f68826989c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Thomas Huth
9b9b4423d7 hw/ppc/spapr_iommu: Fix crash when removing the "spapr-tce-table" device
QEMU currently aborts unexpectedly when the user tries to add and
remove a "spapr-tce-table" device:

$ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) device_add spapr-tce-table,id=x
(qemu) device_del x
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)

The device should not be accessable for the users at all, it's just
used internally, so mark it with user_creatable = false.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1f98e55385)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Thomas Huth
980e8260e6 hw/ppc/spapr_rtc: Mark the RTC device with user_creatable = false
QEMU currently aborts unexpectedly when a user tries to do something
like this:

$ qemu-system-ppc64 -nographic -S -nodefaults -monitor stdio
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) device_add spapr-rtc,id=spapr-rtc
(qemu) device_del spapr-rtc
**
ERROR:qemu/qdev-monitor.c:872:qdev_unplug: assertion failed: (hotplug_ctrl)
Aborted (core dumped)

The RTC device is not meant to be hot-pluggable - it's an internal
device only and it even should not be possible to create it a
second time with the "-device" parameter, so let's mark this
with "user_creatable = false".

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 8ccccff9dd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Eduardo Habkost
aab00230aa qdev: Replace cannot_instantiate_with_device_add_yet with !user_creatable
cannot_instantiate_with_device_add_yet was introduced by commit
efec3dd631 to replace no_user. It was
supposed to be a temporary measure.

When it was introduced, we had 54
cannot_instantiate_with_device_add_yet=true lines in the code.
Today (3 years later) this number has not shrunk: we now have
57 cannot_instantiate_with_device_add_yet=true lines. I think it
is safe to say it is not a temporary measure, and we won't see
the flag go away soon.

Instead of a long field name that misleads people to believe it
is temporary, replace it a shorter and less misleading field:
user_creatable.

Except for code comments, changes were generated using the
following Coccinelle patch:

  @@
  expression DC;
  @@
  (
  -DC->cannot_instantiate_with_device_add_yet = false;
  +DC->user_creatable = true;
  |
  -DC->cannot_instantiate_with_device_add_yet = true;
  +DC->user_creatable = false;
  )

  @@
  typedef ObjectClass;
  expression dc;
  identifier class, data;
  @@
   static void device_class_init(ObjectClass *class, void *data)
   {
   ...
   dc->hotpluggable = true;
  +dc->user_creatable = true;
   ...
   }

  @@
  @@
   struct DeviceClass {
   ...
  -bool cannot_instantiate_with_device_add_yet;
  +bool user_creatable;
   ...
  }

  @@
  expression DC;
  @@
  (
  -!DC->cannot_instantiate_with_device_add_yet
  +DC->user_creatable
  |
  -DC->cannot_instantiate_with_device_add_yet
  +!DC->user_creatable
  )

Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Thomas Huth <thuth@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20170503203604.31462-2-ehabkost@redhat.com>
[ehabkost: kept "TODO remove once we're there" comment]
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit e90f2a8c3e)
 Conflicts:
	include/hw/qdev-core.h
* remove context dep on 08f00df
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:51:16 -05:00
Eduardo Otubo
cfc65be6fb fix qemu-system-unicore32 crashing when calling without -kernel
Starting qemu-system-unicore32 without the -kernel parameter results in
an assert() returns false and aborts qemu. This patch replaces it with a
proper error message followed by exit(1).

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 36bed541ca)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:24:20 -05:00
Thomas Huth
ac0038f182 hw/s390x/ipl: Fix crash with virtio-scsi-pci device
qemu-system-s390x currently crashes when it is started with a
virtio-scsi-pci device, e.g.:

 qemu-system-s390x -nographic -enable-kvm -device virtio-scsi-pci \
                   -drive file=/tmp/disk.dat,if=none,id=d1,format=raw \
                   -device scsi-cd,drive=d1,bootindex=1

The problem is that the code in s390_gen_initial_iplb() currently assumes
that all SCSI devices are also CCW devices, which is not the case for
virtio-scsi-pci of course. Fix it by adding an appropriate check for
TYPE_CCW_DEVICE here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <1493126327-13162-1-git-send-email-thuth@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 99efaa2696)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:24:04 -05:00
Samuel Thibault
62708c7c12 slirp: fix clearing ifq_so from pending packets
The if_fastq and if_batchq contain not only packets, but queues of packets
for the same socket. When sofree frees a socket, it thus has to clear ifq_so
from all the packets from the queues, not only the first.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 1201d30851)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:22:38 -05:00
Marc-André Lureau
746e1fd000 slirp: tftp, copy sockaddr_size
ASAN detects an "unknown-crash" when running pxe-test:

/ppc64/pxe/spapr-vlan: =================================================================
==7143==ERROR: AddressSanitizer: unknown-crash on address 0x7f6dcd298d30 at pc 0x55e22218830d bp 0x7f6dcd2989e0 sp 0x7f6dcd2989d0
READ of size 128 at 0x7f6dcd298d30 thread T2
    #0 0x55e22218830c in tftp_session_allocate /home/elmarco/src/qq/slirp/tftp.c:73
    #1 0x55e22218a1f8 in tftp_handle_rrq /home/elmarco/src/qq/slirp/tftp.c:289
    #2 0x55e22218b54c in tftp_input /home/elmarco/src/qq/slirp/tftp.c:446
    #3 0x55e2221833fe in udp6_input /home/elmarco/src/qq/slirp/udp6.c:82
    #4 0x55e222137b17 in ip6_input /home/elmarco/src/qq/slirp/ip6_input.c:67

Address 0x7f6dcd298d30 is located in stack of thread T2 at offset 96 in frame
    #0 0x55e222182420 in udp6_input /home/elmarco/src/qq/slirp/udp6.c:13

  This frame has 3 object(s):
    [32, 48) '<unknown>'
    [96, 124) 'lhost' <== Memory access at offset 96 partially overflows this variable
    [160, 200) 'save_ip' <== Memory access at offset 96 partially underflows this variable

The sockaddr_storage pointer is the sockaddr_in6 lhost on the
stack. Copy only the source addr size.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 17eb587aeb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:22:14 -05:00
Thomas Huth
e8679f5b45 monitor: Check whether TCG is enabled before running the "info jit" code
The "info jit" command currently aborts on Mac OS X with the message
"qemu_mutex_lock: Invalid argument" when running with "-M accel=qtest".
We should only call into the TCG code here if TCG has really been
enabled and initialized.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1493179907-22516-1-git-send-email-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
(cherry picked from commit b7da97eef7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:21:45 -05:00
Philipp Kern
c152efc943 target-s390x: Mask the SIGP order_code to 8bit.
According to "CPU Signaling and Response", "Signal-Processor Orders",
the order field is bit position 56-63. Without this, the Linux
guest kernel is sometimes unable to stop emulation and enters
an infinite loop of "XXX unknown sigp: 0xffffffff00000005".

Signed-off-by: Philipp Kern <phil@philkern.de>
Reviewed-by: Thomas Huth <thuth@tuxfamily.org>
[agraf: add comment according to email]
Signed-off-by: Alexander Graf <agraf@suse.de>

(cherry picked from commit 601b9a9008)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-31 11:21:20 -05:00
Greg Kurz
077a67e929 9pfs: local: fix fchmodat_nofollow() limitations
This function has to ensure it doesn't follow a symlink that could be used
to escape the virtfs directory. This could be easily achieved if fchmodat()
on linux honored the AT_SYMLINK_NOFOLLOW flag as described in POSIX, but
it doesn't. There was a tentative to implement a new fchmodat2() syscall
with the correct semantics:

https://patchwork.kernel.org/patch/9596301/

but it didn't gain much momentum. Also it was suggested to look at an O_PATH
based solution in the first place.

The current implementation covers most use-cases, but it notably fails if:
- the target path has access rights equal to 0000 (openat() returns EPERM),
  => once you've done chmod(0000) on a file, you can never chmod() again
- the target path is UNIX domain socket (openat() returns ENXIO)
  => bind() of UNIX domain sockets fails if the file is on 9pfs

The solution is to use O_PATH: openat() now succeeds in both cases, and we
can ensure the path isn't a symlink with fstat(). The associated entry in
"/proc/self/fd" can hence be safely passed to the regular chmod() syscall.

The previous behavior is kept for older systems that don't have O_PATH.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Zhi Yong Wu <zhiyong.wu@ucloud.cn>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
(cherry picked from commit 4751fd5328)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:42:15 -05:00
Jeff Cody
f4f3529cfe block/nfs: fix mutex assertion in nfs_file_close()
Commit c096358e74 introduced assertion
checks for when qemu_mutex() functions are called without the
corresponding qemu_mutex_init() having initialized the mutex.

This uncovered a latent bug in qemu's nfs driver - in
nfs_client_close(), the NFSClient structure is overwritten with zeros,
prior to the mutex being destroyed.

Go ahead and destroy the mutex in nfs_client_close(), and change where
we call qemu_mutex_init() so that it is correctly balanced.

There are also a couple of memory leaks obscured by the memset, so this
fixes those as well.

Finally, we should be able to get rid of the memset(), as it isn't
necessary.

Cc: qemu-stable@nongnu.org
Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 113fe792fd)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:41:21 -05:00
Aleksandr Bezzubikov
5f7f7e4f05 hw/i386: allow SHPC for Q35 machine
Unmask previously masked SHPC feature in _OSC method.

Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit a41c78c135)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:40:47 -05:00
Laurent Vivier
de9b6728ff cpu: don't allow negative core id
With pseries machine type a negative core-id is not managed properly:
-1 gives an inaccurate error message ("core -1 already populated"),
-2 crashes QEMU (core dump)

As it seems a negative value is invalid for any architecture,
instead of checking this in spapr_core_pre_plug() I think it's better
to check this in the generic part, core_prop_set_core_id()

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20170802103259.25940-1-lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit be2960baae)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:40:13 -05:00
Kevin Wolf
a0ddbcfb68 block: Skip implicit nodes in query-block/blockstats
Commits 0db832f and 6cdbceb introduced the automatic insertion of filter
nodes above the top layer of mirror and commit block jobs. The
assumption made there was that since libvirt doesn't do node-level
management of the block layer yet, it shouldn't be affected by added
nodes.

This is true as far as commands issued by libvirt are concerned. It only
uses BlockBackend names to address nodes, so any operations it performs
still operate on the root of the tree as intended.

However, the assumption breaks down when you consider query commands,
which return data for the wrong node now. These commands also return
information on some child nodes (bs->file and/or bs->backing), which
libvirt does make use of, and which refer to the wrong nodes, too.

One of the consequences is that oVirt gets wrong information about the
image size and stops the VM in response as long as a mirror or commit
job is running:

https://bugzilla.redhat.com/show_bug.cgi?id=1470634

This patch fixes the problem by hiding the implicit nodes created
automatically by the mirror and commit block jobs in the output of
query-block and BlockBackend-based query-blockstats as long as the user
doesn't indicate that they are aware of those nodes by providing a node
name for them in the QMP command to start the block job.

The node-based commands query-named-block-nodes and query-blockstats
with query-nodes=true still show all nodes, including implicit ones.
This ensures that users that are capable of node-level management can
still access the full information; users that only know BlockBackends
won't use these commands.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit d3c8c67469)
 Conflicts:
	block/qapi.c
	include/block/block_int.h
* fix context deps on 46eade7b and 5a9347c6
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:21:56 -05:00
Kevin Wolf
d445e0a022 qemu-iotests: Test automatic commit job cancel on hot unplug
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
(cherry picked from commit c3971b883a)
*prereq for d3c8c674
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:18:34 -05:00
Alexander Graf
ad480ab20b input: Decrement queue count on kbd delay
Delays in the input layer are special cased input events. Every input
event is accounted for in a global intput queue count. The special cased
delays however did not get removed from the queue, leading to queue overruns
and thus silent key drops after typing quite a few characters.

Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1498117318-162102-1-git-send-email-agraf@suse.de
Fixes: be1a7176 ("input: add support for kbd delays")
Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 77b0359bf4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:42 -05:00
Gerd Hoffmann
f8d050a20a input: limit kbd queue depth
Apply a limit to the number of items we accept into the keyboard queue.

Impact: Without this limit vnc clients can exhaust host memory by
sending keyboard events faster than qemu feeds them to the guest.

Fixes: CVE-2017-8379
Cc: P J P <ppandit@redhat.com>
Cc: Huawei PSIRT <PSIRT@huawei.com>
Reported-by: jiangxin1@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170428084237.23960-1-kraxel@redhat.com
(cherry picked from commit fa18f36a46)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:25 -05:00
Jason Wang
9527514ceb virtio-net: fix offload ctrl endian
Spec said offloads should be le64, so use virtio_ldq_p() to guarantee
valid endian.

Fixes: 644c98587d ("virtio-net: dynamic network offloads configuration")
Cc: qemu-stable@nongnu.org
Cc: Dmitry Fleytman <dfleytma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 189ae6bb5c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:16 -05:00
Greg Kurz
2a7526b0ce spapr: fix memory leak in spapr_core_pre_plug()
In case of error, we must ensure the dynamically allocated base_core_type
is freed, like it is done everywhere else in this function.

This is a regression introduced in QEMU 2.9 by commit 8149e2992f.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit df8658de43)
 Conflicts:
	hw/ppc/spapr.c
* fix context dep on 459264ef2
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:16 -05:00
Kevin Wolf
2e40aad231 commit: Add NULL check for overlay_bs
I can't see how overlay_bs could become NULL with the current code, but
other code in this function already checks it and we can make Coverity
happy with this check, so let's add it.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit b1e1fa0c3a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:16 -05:00
Jason Wang
70da03f10c virtio-scsi: finalize IOMMU support
After converting to use DMA api for virtio devices, we should use
dma_as instead of address_space_memory. Otherwise it won't work if
IOMMU is enabled.

Fixes: commit 8607f5c307 ("virtio: convert to use DMA api")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1499170866-9068-1-git-send-email-jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 025bdeab3c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:16 -05:00
Laurent Vivier
19284a0141 spapr: fix migration to pseries machine < 2.8
since commit 5c4537bd ("spapr: Fix 2.7<->2.8 migration of PCI host bridge"),
some migration fields are forged from the new ones in spapr_pci_pre_save().

It works well, except when the number of MSI devices is 0,
because in this case the function exits immediately.

This fix moves the migration code before the exit code.

The problem can be reproduced with these commands:

source qemu-2.9:

    qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults -S

destination qemu-2.6:

    qemu-system-ppc64 -monitor stdio -M pseries-2.6 -nodefaults \
                      -incoming tcp:0:4444

on the source:

    migrate tcp:localhost:4444

Destination fails with the following error:

    qemu-system-ppc64: error while loading state for
                       instance 0x0 of device 'spapr_pci'
    qemu-system-ppc64: load of migration failed: Invalid argument

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit e806b4db14)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 18:07:16 -05:00
Alexander Graf
0060a3e936 hid: Reset kbd modifiers on reset
When resetting the keyboard, we need to reset not just the pending keystrokes,
but also any pending modifiers. Otherwise there's a race when we're getting
reset while running an escape sequence (modifier 0x100).

Cc: qemu-stable@nongnu.org
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1498117295-162030-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 51dbea77a2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:59:22 -05:00
Bruce Rogers
e0398ccdf4 9pfs: local: remove: use correct path component
Commit a0e640a8 introduced a path processing error.
Pass fstatat the dirpath based path component instead
of the entire path.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 790db7efdb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:58:51 -05:00
Max Reitz
438cd1e6aa block: Do not strcmp() with NULL uri->scheme
uri_parse(...)->scheme may be NULL. In fact, probably every field may be
NULL, and the callers do test this for all of the other fields but not
for scheme (except for block/gluster.c; block/vxhs.c does not access
that field at all).

We can easily fix this by using g_strcmp0() instead of strcmp().

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613205726.13544-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit f69165a8fe)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:58:11 -05:00
Paolo Bonzini
40ed5cdf72 nbd: fix NBD over TLS
When attaching the NBD QIOChannel to an AioContext, the TLS channel should
be used, not the underlying socket channel.  This is because, trivially,
the TLS channel will be the one that we read/write to and thus the one
that will get the qio_channel_yield() call.

Fixes: ff82911cd3
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 96d06835dc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:57:32 -05:00
Max Reitz
2182791734 blkverify: Catch bs->exact_filename overflow
The bs->exact_filename field may not be sufficient to store the full
blkverify node filename. In this case, we should not generate a filename
at all instead of an unusable one.

Cc: qemu-stable@nongnu.org
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613172006.19685-3-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 05cc758a3d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:57:05 -05:00
Max Reitz
1828d47845 blkdebug: Catch bs->exact_filename overflow
The bs->exact_filename field may not be sufficient to store the full
blkdebug node filename. In this case, we should not generate a filename
at all instead of an unusable one.

Cc: qemu-stable@nongnu.org
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613172006.19685-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit de81d72d3d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:57:00 -05:00
Kevin Wolf
1dd3ba38f6 commit: Fix completion with extra reference
commit_complete() can't assume that after its block_job_completed() the
job is actually immediately freed; someone else may still be holding
references. In this case, the op blockers on the intermediate nodes make
the graph reconfiguration in the completion code fail.

Call block_job_remove_all_bdrv() manually so that we know for sure that
any blockers on intermediate nodes are given up.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 4f78a16fee)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:53:45 -05:00
Eric Blake
ecc7a24c11 nbd: Fix regression on resiliency to port scan
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated).  But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation.  We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation").  But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.

Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new().  So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.

Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
  qemu-io -c 'r 0 512' -f raw nbd://localhost:30001

Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends).  Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0c9390d978)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:52:49 -05:00
Eric Blake
ec49c8ac57 nbd: Fully initialize client in case of failed negotiation
If a non-NBD client connects to qemu-nbd, we would end up with
a SIGSEGV in nbd_client_put() because we were trying to
unregister the client's association to the export, even though
we skipped inserting the client into that list.  Easy trigger
in two terminals:

$ qemu-nbd -p 30001 --format=raw file
$ nmap 127.0.0.1 -p 30001

nmap claims that it thinks it connected to a pago-services1
server (which probably means nmap could be updated to learn the
NBD protocol and give a more accurate diagnosis of the open
port - but that's not our problem), then terminates immediately,
so our call to nbd_negotiate() fails.  The fix is to reorder
nbd_co_client_start() to ensure that all initialization occurs
before we ever try talking to a client in nbd_negotiate(), so
that the teardown sequence on negotiation failure doesn't fault
while dereferencing a half-initialized object.

While debugging this, I also noticed that nbd_update_server_watch()
called by nbd_client_closed() was still adding a channel to accept
the next client, even when the state was no longer RUNNING.  That
is fixed by making nbd_can_accept() pay attention to the current
state.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170527030421.28366-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit df8ad9f128)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:52:10 -05:00
Kevin Wolf
f28b8906dd commit: Fix use after free in completion
The final bdrv_set_backing_hd() could be working on already freed nodes
because the commit job drops its references (through BlockBackends) to
both overlay_bs and top already a bit earlier.

One way to trigger the bug is hot unplugging a disk for which
blockdev_mark_auto_del() cancels the block job.

Fix this by taking BDS-level references while we're still using the
nodes.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 19ebd13ed4)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:51:19 -05:00
Max Filippov
bace1f90f9 target/xtensa: handle unknown registers in gdbstub
Xtensa cores may have registers of types/sizes not supported by the
gdbstub accessors. Ignore writes to such registers and return zero on
read, but always return correct register size, so that gdb on the other
side is able to access all registers in the packet holding unsupported
registers in the middle. This fixes gdb interaction with cores that have
vector/custom TIE registers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
(cherry picked from commit dd7b952b79)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:50:05 -05:00
Greg Kurz
3b2f3a4691 spapr: fix memory leak in spapr_memory_pre_plug()
The string returned by object_property_get_str() is dynamically allocated.

(Spotted by Coverity, CID 1375942)

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 8a9e0e7b89)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:47:06 -05:00
Laurent Vivier
7f4c9f5f3b spapr: add pre_plug function for memory
This allows to manage errors before the memory
has started to be hotplugged. We already have
the function for the CPU cores.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[dwg: Fixed a couple of style nits]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

(cherry picked from commit c871bc70bb)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:46:50 -05:00
Greg Kurz
592ee4007e target/ppc: fix memory leak in kvmppc_is_mem_backend_page_size_ok()
The string returned by object_property_get_str() is dynamically allocated.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2d3e302ec2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:39:31 -05:00
Greg Kurz
917a5b9f2f target/ppc: pass const string to kvmppc_is_mem_backend_page_size_ok()
This function has three implementations. Two are stubs that do nothing
and the third one only passes the obj_path argument to:

Object *object_resolve_path(const char *path, bool *ambiguous);

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit ec69355bef)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-24 16:39:25 -05:00
Eduardo Habkost
2401d8a490 pc: Use "min-[x]level" on compat_props
Since the automatic cpuid-level code was introduced in commit
c39c0edf9b ("target-i386: Automatically
set level/xlevel/xlevel2 when needed"), the CPU model tables just define
the default CPUID level code (set using "min-level").  Setting
"[x]level" forces CPUID level to a specific value and disable the
automatic-level logic.

But the PC compat code was not updated and the existing "[x]level"
compat properties broke compatibility for people using features that
triggered the auto-level code.  To keep previous behavior, we should set
"min-[x]level" instead of "[x]level" on compat_props.

This was not a problem for most cases, because old machine-types don't
have full-cpuid-auto-level enabled.  The only common use case it broke
was the CPUID[7] auto-level code, that was already enabled since the
first CPUID[7] feature was introduced (in QEMU 1.4.0).

This causes the regression reported at:
https://bugzilla.redhat.com/show_bug.cgi?id=1454641

Change the PC compat code to use "min-[x]level" instead of "[x]level" on
compat_props, and add new test cases to ensure we don't break this
again.

Reported-by: "Guo, Zhiyi" <zhguo@redhat.com>
Fixes: c39c0edf9b ("target-i386: Automatically set level/xlevel/xlevel2 when needed")
Cc: qemu-stable@nongnu.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 1f43571604)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:29 -05:00
Michael Roth
1775fe6148 monitor: fix object_del for command-line-created objects
Currently objects specified on the command-line are only partially
cleaned up when 'object_del' is issued in either HMP or QMP: the
object itself is fully finalized, but the QemuOpts are not removed.
This results in the following behavior:

  x86_64-softmmu/qemu-system-x86_64 -monitor stdio \
    -object memory-backend-ram,id=ram1,size=256M

  QEMU 2.7.91 monitor - type 'help' for more information
  (qemu) object_del ram1
  (qemu) object_del ram1
  object 'ram1' not found
  (qemu) object_add memory-backend-ram,id=ram1,size=256M
  Duplicate ID 'ram1' for object
  Try "help object_add" for more information

which can be an issue for use-cases like memory hotplug.

This happens on the HMP side because hmp_object_add() attempts to
create a temporary QemuOpts entry with ID 'ram1', which ends up
conflicting with the command-line-created entry, since it was never
cleaned up during the previous hmp_object_del() call.

We address this by adding a check in user_creatable_del(), which
is called by both qmp_object_del() and hmp_object_del() to handle
the actual object cleanup, to determine whether an option group entry
matching the object's ID is present and removing it if it is.

Note that qmp_object_add() never attempts to create a temporary
QemuOpts entry, so it does not encounter the duplicate ID error,
which is why this isn't generally visible in libvirt.

Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Daniel Berrange <berrange@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1496531612-22166-3-git-send-email-mdroth@linux.vnet.ibm.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit c645d5acee)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Michael Roth
b0a3eadd8c tests: check-qom-proplist: add checks for cmdline-created objects
check-qom-proplist originally added tests for verifying that
object-creation helpers object_new_with_{props,propv} behaved in
similar fashion to the "traditional" method involving setting each
individual property separately after object creation rather than
via a single call.

Another similar "helper" for creating Objects exists in the form of
objects specified via -object command-line parameters. By that
rationale, we extend check-qom-proplist to include similar checks
for command-line-created objects by employing the same
qemu_opts_parse()-based parsing the vl.c employs.

This parser has a side-effect of parsing the object's options into
a QemuOpt structure and registering this in the global QemuOptsList
using the Object's ID. This can conflict with future Object instances
that attempt to use the same ID if we don't ensure this is cleaned
up as part of Object finalization, so we include a FIXME stub to test
for this case, which will then be resolved in a subsequent patch.

Suggested-by: Daniel Berrange <berrange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Daniel Berrange <berrange@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1496531612-22166-2-git-send-email-mdroth@linux.vnet.ibm.com>
[Comment formatting tidied up]
Signed-off-by: Markus Armbruster <armbru@redhat.com>

(cherry picked from commit a1af255f06)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Paolo Bonzini
3b428e9512 linuxboot_dma: compile for i486
The ROM uses the cmovne instruction, which is new in Pentium Pro and does not
work when running QEMU with "-cpu 486".  Avoid producing that instruction.

Suggested-by: Richard W.M. Jones <rjones@redhat.com>
Suggested-by: Thomas Huth <thuth@redhat.com>
Reported-by: Rob Landley <rob@landley.net>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7e01838510)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Ladi Prosek
11bac2f941 virtio-serial-bus: Unset hotplug handler when unrealize
Virtio serial device controls the lifetime of virtio-serial-bus and
virtio-serial-bus links back to the device via its hotplug-handler
property. This extra ref-count prevents the device from getting
finalized, leaving the VirtIODevice memory listener registered and
leading to use-after-free later on.

This patch addresses the same issue as Fam Zheng's
"virtio-scsi: Unset hotplug handler when unrealize"
only for a different virtio device.

Cc: qemu-stable@nongnu.org
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
(cherry picked from commit f811f97040)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Kevin Wolf
0ebbef1fa1 mirror: Drop permissions on s->target on completion
This fixes an assertion failure that was triggered by qemu-iotests 129
on some CI host, while the same test case didn't seem to fail on other
hosts.

Essentially the problem is that the blk_unref(s->target) in
mirror_exit() doesn't necessarily mean that the BlockBackend goes away
immediately. It is possible that the job completion was triggered nested
in mirror_drain(), which looks like this:

    BlockBackend *target = s->target;
    blk_ref(target);
    blk_drain(target);
    blk_unref(target);

In this case, the write permissions for s->target are retained until
after blk_drain(), which makes removing mirror_top_bs fail for the
active commit case (can't have a writable backing file in the chain
without the filter driver).

Explicitly dropping the permissions first means that the additional
reference doesn't hurt and the job can complete successfully even if
called from the nested blk_drain().

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 63c8ef2890)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Eric Blake
64945cb5f3 block: Guarantee that *file is set on bdrv_get_block_status()
We document that *file is valid if the return is not an error and
includes BDRV_BLOCK_OFFSET_VALID, but forgot to obey this contract
when a driver (such as blkdebug) lacks a callback.  Messed up in
commit 67a0fd2 (v2.6), when we added the file parameter.

Enhance qemu-iotest 177 to cover this, using a sequence that would
print garbage or even SEGV, because it was dererefencing through
uninitialized memory.  [The resulting test output shows that we
have less-than-ideal block status from the blkdebug driver, but
that's a separate fix coming up soon.]

Setting *file on all paths that return BDRV_BLOCK_OFFSET_VALID is
enough to fix the crash, but we can go one step further: always
setting *file, even on error, means that a broken caller that
blindly dereferences file without checking for error is now more
likely to get a reliable SEGV instead of randomly acting on garbage,
making it easier to diagnose such buggy callers.  Adding an
assertion that file is set where expected doesn't hurt either.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 81c219ac6c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Eric Blake
6a3f9c5c6e block: Simplify BDRV_BLOCK_RAW recursion
Since we are already in coroutine context during the body of
bdrv_co_get_block_status(), we can shave off a few layers of
wrappers when recursing to query the protocol when a format driver
returned BDRV_BLOCK_RAW.

Note that we are already using the correct recursion later on in
the same function, when probing whether the protocol layer is sparse
in order to find out if we can add BDRV_BLOCK_ZERO to an existing
BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170504173745.27414-1-eblake@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ee29d6adef)
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:28 -05:00
Eric Blake
3f3fe284ef tests: Add coverage for recent block geometry fixes
Use blkdebug's new geometry constraints to emulate setups that
have needed past regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-10-eblake@redhat.com
[mreitz: For cooperation with image locking, add -r to the qemu-io
         invocation which verifies the image content]
Signed-off-by: Max Reitz <mreitz@redhat.com>

(cherry picked from commit 40812d9373)
 Conflicts:
	tests/qemu-iotests/group
* dropped context dependency on other test groups
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:24 -05:00
Eric Blake
48f2dc0657 blkdebug: Add ability to override unmap geometries
Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-9-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 430b26a82d)
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:19 -05:00
Eric Blake
3ae74003b5 blkdebug: Simplify override logic
Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Likewise, setting
a sane errno value in ret prior to the sequence of parsing and
jumping to out: on error makes it easier for the next patch to
add a chain of similar checks.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-id: 20170429191419.30051-8-eblake@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 3dc834f879)
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:15 -05:00
Eric Blake
577cf9e6bb blkdebug: Add pass-through write_zero and discard support
In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-7-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 63188c2450)
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:11 -05:00
Eric Blake
138cf638ba blkdebug: Refactor error injection
Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-6-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit d157ed5f72)
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:01:08 -05:00
Eric Blake
a1a3d603c4 blkdebug: Sanity check block layer guarantees
Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170429191419.30051-5-eblake@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit e0ef439588)
* prereq for 81c219a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 16:00:51 -05:00
Yunjian Wang
0b185544c9 virtio-net: fix wild pointer when remove virtio-net queues
The tx_bh or tx_timer will free in virtio_net_del_queue() function, when
removing virtio-net queues if the guest doesn't support multiqueue. But
it might be still referenced by virtio_net_set_status(), which needs to
be set NULL. And also the tx_waiting needs to be set zero to prevent
virtio_net_set_status() accessing tx_bh or tx_timer.

Cc: qemu-stable@nongnu.org
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f989c30cf8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:52:02 -05:00
Halil Pasic
f3676379cc s390x/css: catch section mismatch on load
Prior to the virtio-ccw-2.7 machine (and commit 2a79eb1a), our virtio
devices residing under the virtual-css bus do not have qdev_path based
migration stream identifiers (because their qdev_path is NULL). The ids
are instead generated when the device is registered as a composition of
the so called idstr, which takes the vmsd name as its value, and an
instance_id, which is which is calculated as a maximal instance_id
registered with the same idstr plus one, or zero (if none was registered
previously).

That means, under certain circumstances, one device might try, and even
succeed, to load the state of a different device. This can lead to
trouble.

Let us fail the migration if the above problem is detected during load.

How to reproduce the problem:
1) start qemu-system-s390x making sure you have the following devices
   defined on your command line:
     -device virtio-rng-ccw,id=rng1,devno=fe.0.0001
     -device virtio-rng-ccw,id=rng2,devno=fe.0.0002
2) detach the devices and reattach in reverse order using the monitor:
     (qemu) device_del rng1
     (qemu) device_del rng2
     (qemu) device_add virtio-rng-ccw,id=rng2,devno=fe.0.0002
     (qemu) device_add virtio-rng-ccw,id=rng1,devno=fe.0.0001
3) save the state of the vm into a temporary file and quit QEMU:
     (qemu) migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
     (qemu) q
4) use your command line from step 1 with
     -incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
   appended to reproduce the problem (while trying to to load the saved vm)

CC: qemu-stable@nongnu.org
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-Id: <20170518111405.56947-1-pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 8ed179c937)
* removed context dep on d8d98db5
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:50:49 -05:00
Sameeh Jubran
4921c573c8 e1000e: Fix ICR "Other" causes clear logic
This commit fixes a bug which causes the guest to hang. The bug was
observed upon a "receive overrun" (bit #6 of the ICR register)
interrupt which could be triggered post migration in a heavy traffic
environment. Even though the "receive overrun" bit (#6) is masked out
by the IMS register (refer to the log below) the driver still receives
an interrupt as the "receive overrun" bit (#6) causes the "Other" -
bit #24 of the ICR register - bit to be set as documented below. The
driver handles the interrupt and clears the "Other" bit (#24) but
doesn't clear the "receive overrun" bit (#6) which leads to an
infinite loop. Apparently the Windows driver expects that the "receive
overrun" bit and other ones - documented below - to be cleared when
the "Other" bit (#24) is cleared.

So to sum that up:
1. Bit #6 of the ICR register is set by heavy traffic
2. As a results of setting bit #6, bit #24 is set
3. The driver receives an interrupt for bit 24 (it doesn't receieve an
   interrupt for bit #6 as it is masked out by IMS)
4. The driver handles and clears the interrupt of bit #24
5. Bit #6 is still set.
6. 2 happens all over again

The Interrupt Cause Read - ICR register:

The ICR has the "Other" bit - bit #24 - that is set when one or more
of the following ICR register's bits are set:

LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit
#17, MNG - bit #18

This bug can occur with any of these bits depending on the driver's
behaviour and the way it configures the device. However, trying to
reproduce it with any bit other than RX0 is challenging and came to
failure as the drivers don't implement most of these bits, trying to
reproduce it with LSC (Link Status Change - bit #2) bit didn't succeed
too as it seems that Windows handles this bit differently.

Log sample of the storm:

27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)
27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)

* This bug behaviour wasn't observed with the Linux driver.

This commit solves:
https://bugzilla.redhat.com/show_bug.cgi?id=1447935
https://bugzilla.redhat.com/show_bug.cgi?id=1449490

Cc: qemu-stable@nongnu.org
Signed-off-by: Sameeh Jubran <sjubran@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 82342e91b6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:44:08 -05:00
Fam Zheng
952cc38204 virtio-scsi: Unset hotplug handler when unrealize
This matches the qbus_set_hotplug_handler in realize, and it releases
the final reference to the embedded VirtIODevice so that it is
properly finalized.

A use-after-free is fixed with this patch, indirectly:
virtio_device_instance_finalize wasn't called at hot-unplug, and the
vdev->listener would be a dangling pointer in the global and the per
address space listener list. See also RHBZ 1449031.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20170518102808.30046-1-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2cbe2de545)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:42:49 -05:00
Greg Kurz
c6b510d1e5 virtio: allow broken device to notify guest
According to section 2.1.2 of the virtio-1 specification:

"The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state that
a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET,
the device MUST send a device configuration change notification to the
driver."

Commit "f5ed36635d8f virtio: stop virtqueue processing if device is broken"
introduced a virtio_error() call that just does that:

- internally mark the device as broken
- set the DEVICE_NEEDS_RESET bit in the status
- send a configuration change notification

Unfortunately, virtio_notify_vector(), called by virtio_notify_config(),
returns right away when the device is marked as broken and the notification
isn't sent in this case.

The spec doesn't say whether a broken device can send notifications
in other situations or not. But since the driver isn't supposed to do
anything but to reset the device, it makes sense to keep the check in
virtio_notify_config().

Marking the device as broken AFTER the configuration change notification was
sent is enough to fix the issue.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 66453cff9e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:42:05 -05:00
Hervé Poussineau
636eacb641 vvfat: fix qemu-img map and qemu-img convert
- bs->total_sectors is the number of sectors of the whole disk
- s->sector_count is the number of sectors of the FAT partition

This fixes the following assert in qemu-img map:
qemu-img.c:2641: get_block_status: Assertion `nb_sectors' failed.

This also fixes an infinite loop in qemu-img convert.

Fixes: 4480e0f924
Fixes: https://bugs.launchpad.net/qemu/+bug/1599539
Cc: qemu-stable@nongnu.org
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 139921aaa7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:35:15 -05:00
Alberto Garcia
c60a8ed89b stream: fix crash in stream_start() when block_job_create() fails
The code that tries to reopen a BlockDriverState in stream_start()
when the creation of a new block job fails crashes because it attempts
to dereference a pointer that is known to be NULL.

This is a regression introduced in a170a91fd3,
likely because the code was copied from stream_complete().

Cc: qemu-stable@nongnu.org
Reported-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Alberto Garcia <berto@igalia.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 525989a50a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:33:26 -05:00
Paolo Bonzini
c79bef68c4 curl: avoid recursive locking of BDRVCURLState mutex
The curl driver has a ugly hack where, if it cannot find an empty CURLState,
it just uses aio_poll to wait for one to be empty.  This is probably
buggy when used together with dataplane, and the simplest way to fix it
is to use coroutines instead.

A more immediate effect of the bug however is that it can cause a
recursive call to curl_readv_bh_cb and recursively taking the
BDRVCURLState mutex.  This causes a deadlock.

The fix is to unlock the mutex around aio_poll, but for cleanliness we
should also take the mutex around all calls to curl_init_state, even if
reaching the unlock/lock pair is impossible.  The same is true for
curl_clean_state.

Reported-by: Kun Wei <kuwei@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20170515100059.15795-4-pbonzini@redhat.com
Cc: qemu-stable@nongnu.org
Cc: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit 456af34629)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:32:58 -05:00
Paolo Bonzini
4b519b9fd7 curl: never invoke callbacks with s->mutex held
All curl callbacks go through curl_multi_do, and hence are called with
s->mutex held.  Note that with comments, and make curl_read_cb drop the
lock before invoking the callback.

Likewise for curl_find_buf, where the callback can be invoked by the
caller.

Cc: qemu-stable@nongnu.org
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-3-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit 34db05e7ff)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:32:52 -05:00
Paolo Bonzini
f00c08cbac curl: strengthen assertion in curl_clean_state
curl_clean_state should only be called after all AIOCBs have been
completed.  This is not so obvious for the call from curl_detach_aio_context,
so assert that.

Cc: qemu-stable@nongnu.org
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170515100059.15795-2-pbonzini@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit 675a775633)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:32:46 -05:00
Max Filippov
d81db0beca target/xtensa: fix return value of read/write simcalls
Return value of read/write simcalls is not calculated correctly in case
of operations crossing page boundary and in case of short reads/writes.
Read and write simcalls should return the size of data actually
read/written or -1 in case of error.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
(cherry picked from commit 347ec03093)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:32:06 -05:00
Max Filippov
e442253479 target/xtensa: fix mapping direction in read/write simcalls
Read and write simcalls map physical memory to access I/O buffers, but
'read' simcall need to map it for writing and 'write' simcall need to
map it for reading, i.e. the opposite of what they do now. Fix that.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
(cherry picked from commit 30c2afd151)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:31:52 -05:00
John Snow
af8ca55a6b blockdev: use drained_begin/end for qmp_block_resize
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1447551

If one tries to issue a block_resize while a guest is busy
accessing the disk, it is possible that qemu may deadlock
when invoking aio_poll from both the main loop and the iothread.

Replace another instance of bdrv_drain_all that doesn't
quite belong.

Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 698bdfa07d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:29:55 -05:00
Max Reitz
5797a36abd block: Add errp to b{lk,drv}_truncate()
For one thing, this allows us to drop the error message generation from
qemu-img.c and blockdev.c and instead have it unified in
bdrv_truncate().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170328205129.15138-3-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit ed3d2ec98a)
* prereq for 698bdfa
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:29:55 -05:00
Max Reitz
73aa7ad7d2 block/vhdx: Make vhdx_create() always set errp
This patch makes vhdx_create() always set errp in case of an error. It
also adds errp parameters to vhdx_create_bat() and
vhdx_create_new_region_table() so we can pass on the error object
generated by blk_truncate() as of a future commit.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20170328205129.15138-2-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
(cherry picked from commit 55b9392b98)
* prereq for 698bdfa
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-08-03 14:29:10 -05:00
Anton Nefedov
d8cddcc2b9 qemu-img: wait for convert coroutines to complete
On error path (like i/o error in one of the coroutines), it's required to
  - wait for coroutines completion before cleaning the common structures
  - reenter dependent coroutines so they ever finish

Introduced in 2d9187bc65.

Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b91127edd0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:23:52 -05:00
Stefan Hajnoczi
ce119247a1 aio: add missing aio_notify() to aio_enable_external()
The main loop uses aio_disable_external()/aio_enable_external() to
temporarily disable processing of external AioContext clients like
device emulation.

This allows monitor commands to quiesce I/O and prevent the guest from
submitting new requests while a monitor command is in progress.

The aio_enable_external() API is currently broken when an IOThread is in
aio_poll() waiting for fd activity when the main loop re-enables
external clients.  Incrementing ctx->external_disable_cnt does not wake
the IOThread from ppoll(2) so fd processing remains suspended and leads
to unresponsive emulated devices.

This patch adds an aio_notify() call to aio_enable_external() so the
IOThread is kicked out of ppoll(2) and will re-arm the file descriptors.

The bug can be reproduced as follows:

  $ qemu -M accel=kvm -m 1024 \
         -object iothread,id=iothread0 \
         -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \
         -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \
         -device scsi-hd,id=scsi-hd0,drive=drive0 \
         -qmp tcp::5555,server,nowait

  $ scripts/qmp/qmp-shell localhost:5555
  (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2
         mode=absolute-paths format=qcow2

After blockdev-snapshot-sync completes the SCSI disk will be
unresponsive.  This leads to request timeouts inside the guest.

Reported-by: Qianqian Zhu <qizhu@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170508180705.20609-1-stefanha@redhat.com
Suggested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 321d1dba8b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:21:41 -05:00
Zhiyong Yang
0e727a207e hw/virtio: fix vhost user fails to startup when MQ
Qemu2.7~2.9 and vhost user for dpdk 17.02 release work together
to cause failures of new connection when negotiating to set MQ.
(one queue pair works well).
   Because there exist some bugs in qemu code when introducing
VHOST_USER_PROTOCOL_F_REPLY_ACK to qemu. When vhost_user_set_mem_table
is invoked to deal with the vhost message VHOST_USER_SET_MEM_TABLE
for the second time, qemu indeed doesn't send the messge (The message
needs to be sent only once)but still will be waiting for dpdk's reply
ack, then, qemu is always freezing, while DPDK is always waiting for
next vhost message from qemu.
  The patch aims to fix the bug, MQ can work well.
  The same bug is found in function vhost_user_net_set_mtu, it is fixed
at the same time.
  DPDK related patch is as following:
  http://www.dpdk.org/dev/patchwork/patch/23955/

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
Cc: qemu-stable@nongnu.org
Fixes: ca525ce561 ("vhost-user: Introduce a new protocol feature REPLY_ACK.")
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jens Freimann <jfreiman@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit 60cd11024f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:19:32 -05:00
Fam Zheng
d2fcb92b18 block: Reuse bs as backing hd for drive-backup sync=none
Opening the backing image for the second time is bad, especially here
when it is also in use as the active image as the source. The
drive-backup job itself doesn't read from target->backing for COW,
instead it gets data from the write notifier, so it's not a big problem.
However, exporting the target to NBD etc. won't work, because of the
likely stale metadata cache.

Use BDRV_O_NO_BACKING in this case and manually set up the backing
BdrvChild.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit fc0932fdcf)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:11:50 -05:00
Eric Blake
e59084b5b2 qobject: Use simpler QDict/QList scalar insertion macros
We now have macros in place to make it less verbose to add a scalar
to QDict and QList, so use them.

Patch created mechanically via:
  spatch --sp-file scripts/coccinelle/qobject.cocci \
    --macro-file scripts/cocci-macro-file.h --dir . --in-place
then touched up manually to fix a couple of '?:' back to original
spacing, as well as avoiding a long line in monitor.c.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-7-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit 46f5ac205a)
* prereq for fc0932f
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:11:11 -05:00
Eric Blake
1eaf431077 s390x: Drop useless casts
An upcoming Coccinelle cleanup script wanted to reformat the casts
present in this file - but on closer look, we don't need the casts
at all because C automatically converts void* to any other pointer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170405194741.18956-4-eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit cb55c19a26)
* prereq for 46f5ac2
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:09:19 -05:00
Eric Blake
396474a18c qobject: Add helper macros for common scalar insertions
Rather than making lots of callers wrap a scalar in a QInt, QString,
or QBool, provide helper macros that do the wrapping automatically.

Update the Coccinelle script to make mass conversions easy, although
the conversion itself will be done as a separate patches to ease
review and backport efforts.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-6-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit a92c21591b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:07:14 -05:00
Eric Blake
3f308bf3ac qobject: Drop useless QObject casts
We have macros in place to make it less verbose to add a subtype
of QObject to both QDict and QList. While we have made cleanups
like this in the past (see commit fcfcd8ffc, for example), having
it be automated by Coccinelle makes it easier to maintain.

Patch created mechanically via:
  spatch --sp-file scripts/coccinelle/qobject.cocci \
    --macro-file scripts/cocci-macro-file.h --dir . --in-place
then I verified that no manual touchups were required.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-5-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit de6e7951fe)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:07:06 -05:00
Eric Blake
21047240d0 coccinelle: Add script to remove useless QObject casts
We have macros in place to make it less verbose to add a subtype
of QObject to both QDict and QList. While we have made cleanups
like this in the past (see commit fcfcd8ffc, for example), having
it be automated by Coccinelle makes it easier to maintain.

The script is separate from the cleanups, for ease of review and
backporting.  A later patch will then add further possible cleanups.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20170427215821.19397-4-eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit a2f3453ebc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 17:06:58 -05:00
Greg Kurz
785d9ab216 9pfs: local: fix unlink of alien files in mapped-file mode
When trying to remove a file from a directory, both created in non-mapped
mode, the file remains and EBADF is returned to the guest.

This is a regression introduced by commit "df4938a6651b 9pfs: local:
unlinkat: don't follow symlinks" when fixing CVE-2016-9602. It changed the
way we unlink the metadata file from

    ret = remove("$dir/.virtfs_metadata/$name");
    if (ret < 0 && errno != ENOENT) {
         /* Error out */
    }
    /* Ignore absence of metadata */

to

    fd = openat("$dir/.virtfs_metadata")
    unlinkat(fd, "$name")
    if (ret < 0 && errno != ENOENT) {
         /* Error out */
    }
    /* Ignore absence of metadata */

If $dir was created in non-mapped mode, openat() fails with ENOENT and
we pass -1 to unlinkat(), which fails in turn with EBADF.

We just need to check the return of openat() and ignore ENOENT, in order
to restore the behaviour we had with remove().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
[groug: rewrote the comments as suggested by Eric]

(cherry picked from commit 6a87e7929f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:56:59 -05:00
Markus Armbruster
45b3eac752 replication: Make --disable-replication compile again
Broken in commit daa33c5.

Cc: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Message-id: 1493298053-17140-1-git-send-email-armbru@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 38bb54f323)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:55:42 -05:00
Bruce Rogers
c64d184584 ACPI: don't call acpi_pcihp_device_plug_cb on xen
Commit f0c9d64a exposed the issue that with a xenfv machine using
pci passthrough, acpi pci hotplug code was being executed by mistake.
Guard calls to acpi_pcihp_device_plug_cb (and corresponding
acpi_pcihp_device_unplug_cb) with a check for xen_enabled(). Without
this check I am seeing an error that the bus doesn't have the
acpi-pcihp-bsel property set.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 153eba4726)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:55:12 -05:00
Max Reitz
c1059a3a3c block: Do not unref bs->file on error in BD's open
The block layer takes care of removing the bs->file child if the block
driver's bdrv_open()/bdrv_file_open() implementation fails. The block
driver therefore does not need to do so, and indeed should not unless it
sets bs->file to NULL afterwards -- because if this is not done, the
bdrv_unref_child() in bdrv_open_inherit() will dereference the freed
memory block at bs->file afterwards, which is not good.

We can now decide whether to add a "bs->file = NULL;" after each of the
offending bdrv_unref_child() invocations, or just drop them altogether.
The latter is simpler, so let's do that.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit de234897b6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:44:54 -05:00
Herongguang (Stephen)
0b906e4d0a pci: deassert intx when pci device unrealize
If a pci device is not reset by VM (by writing into config space)
and unplugged by VM, after that when VM reboots, qemu may assert:
pcibus_reset: Assertion `bus->irq_count[i] == 0' failed

Cc: qemu-stable@nongnu.org
Signed-off-by: herongguang <herongguang.he@huawei.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 3936161f1f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:43:34 -05:00
Daniel P. Berrange
181e005f14 migration: setup bi-directional I/O channel for exec: protocol
Historically the migration data channel has only needed to be
unidirectional. Thus the 'exec:' protocol was requesting an
I/O channel with O_RDONLY on incoming side, and O_WRONLY on
the outgoing side.

This is fine for classic migration, but if you then try to run
TLS over it, this fails because the TLS handshake requires a
bi-directional channel.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit 062d81f0e9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:42:57 -05:00
Max Reitz
b8420f7102 iotests/051: Add test for empty filename
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 42dc10f17a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:23:33 -05:00
Max Reitz
bd1039b463 block: An empty filename counts as no filename
Reproducer:
    $ ./qemu-img info ''
    qemu-img: ./block.c:1008: bdrv_open_driver: Assertion
        `!drv->bdrv_needs_filename || bs->filename[0]' failed.
    [1]    26105 abort (core dumped)  ./qemu-img info ''

This patch fixes this to be:
    $ ./qemu-img info ''
    qemu-img: Could not open '': The 'file' block driver requires a file
    name

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4a0082401a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:23:04 -05:00
Max Reitz
bc70597a48 qemu-img/convert: Move bs_n > 1 && -B check down
It does not make much sense to use a backing image for the target when
you concatenate multiple images (because then there is no correspondence
between the source images' backing files and the target's); but it was
still possible to give one by using -o backing_file=X instead of -B X.

Fix this by moving the check.

(Also, change the error message because -B is not the only way to
 specify the backing file, evidently.)

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
* applied patch from v1 of series as suggested by author
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:22:00 -05:00
Max Reitz
a1c850fd9d qemu-img/convert: Use @opts for one thing only
After storing the creation options for the new image into @opts, we
fetch some things for our own information, like the backing file name,
or whether to use encryption or preallocation.

With the -n parameter, there will not be any creation options; this is
not too bad because this just means that querying a NULL @opts will
always return the default value.

However, we also use @opts for the --object options. Therefore, @opts is
not necessarily NULL if -n was specified; instead, it may contain those
options. In practice, this probably does not cause any problems because
there most likely is no object that supports any of the parameters we
query here, but this is neither something we should rely on nor does
this variable reuse make the code very nice to read.

Therefore, just use an own variable for the --object options.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
* applied patch from v1 of series as suggested by author
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 16:20:06 -05:00
Max Reitz
c37a62b751 qemu-img/convert: Always set ret < 0 on error
Otherwise the qemu-img process will exit with EXIT_SUCCESS instead of
EXIT_FAILURE.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
* applied directly to stable, upstream code has issue fixed via a
  refactoring introduced by 9fd77f9, which isn't targetted for stable
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 15:55:25 -05:00
Eric Blake
4aa16db9cf dirty-bitmap: Report BlockDirtyInfo.count in bytes, as documented
We've been documenting the value in bytes since its introduction
in commit b9a9b3a4 (v1.3), where it was actually reported in bytes.

Commit e4654d2 (v2.0) then removed things from block/qapi.c, in
preparation for a rewrite to a list of dirty sectors in the next
commit 21b5683 in block.c, but the new code mistakenly started
reporting in sectors.

Fixes: https://bugzilla.redhat.com/1441460

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6c98c57af3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 15:33:09 -05:00
Sameeh Jubran
27dd31f164 qga-win: Enable 'can-offline' field in 'guest-get-vcpus' reply
The QGA schema states:

@can-offline: Whether offlining the VCPU is possible. This member
               is always filled in by the guest agent when the structure
               is returned, and always ignored on input (hence it can be
               omitted then).

Currently 'can-offline' is missing entirely from the reply. This causes
errors in libvirt which is expecting the reply to be compliant with the
schema docs.

BZ#1438735: https://bugzilla.redhat.com/show_bug.cgi?id=1438735

Signed-off-by: Sameeh Jubran <sameeh@daynix.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit 54858553de)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-07-31 15:25:56 -05:00
369 changed files with 11438 additions and 2835 deletions

View File

@@ -957,6 +957,7 @@ M: Paolo Bonzini <pbonzini@redhat.com>
S: Supported
F: include/hw/scsi/*
F: hw/scsi/*
F: tests/scsi-disk-test.c
F: tests/virtio-scsi-test.c
T: git git://github.com/bonzini/qemu.git scsi-next
@@ -1395,6 +1396,7 @@ S: Supported
F: qobject/
F: include/qapi/qmp/
X: include/qapi/qmp/dispatch.h
F: scripts/coccinelle/qobject.cocci
F: tests/check-qdict.c
F: tests/check-qfloat.c
F: tests/check-qint.c
@@ -1419,6 +1421,7 @@ F: qom/
X: qom/cpu.c
F: tests/check-qom-interface.c
F: tests/check-qom-proplist.c
F: tests/check-qom-props.c
F: tests/qom-test.c
QMP

View File

@@ -36,6 +36,10 @@ endif
PROGS=$(QEMU_PROG) $(QEMU_PROGW)
STPFILES=
ifdef CONFIG_LINUX_USER
PROGS+=$(QEMU_PROG)-binfmt
endif
config-target.h: config-target.h-timestamp
config-target.h-timestamp: config-target.mak
@@ -121,6 +125,8 @@ QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) \
obj-y += linux-user/
obj-y += gdbstub.o thunk.o user-exec.o user-exec-stub.o
obj-binfmt-y += linux-user/
endif #CONFIG_LINUX_USER
#########################################################
@@ -169,7 +175,11 @@ endif # CONFIG_SOFTMMU
# Workaround for http://gcc.gnu.org/PR55489, see configure.
%/translate.o: QEMU_CFLAGS += $(TRANSLATE_OPT_CFLAGS)
ifdef CONFIG_LINUX_USER
dummy := $(call unnest-vars,,obj-y obj-binfmt-y)
else
dummy := $(call unnest-vars,,obj-y)
endif
all-obj-y := $(obj-y)
target-obj-y :=
@@ -211,6 +221,9 @@ ifdef CONFIG_DARWIN
$(call quiet-command,SetFile -a C $@,"SETFILE","$(TARGET_DIR)$@")
endif
$(QEMU_PROG)-binfmt: $(obj-binfmt-y)
$(call LINK,$^)
gdbstub-xml.c: $(TARGET_XML_FILES) $(SRC_PATH)/scripts/feature_to_c.sh
$(call quiet-command,rm -f $@ && $(SHELL) $(SRC_PATH)/scripts/feature_to_c.sh $@ $(TARGET_XML_FILES),"GEN","$(TARGET_DIR)$@")

View File

@@ -1 +1 @@
2.9.0
2.9.1

View File

@@ -2028,6 +2028,8 @@ void AUD_del_capture (CaptureVoiceOut *cap, void *cb_opaque)
sw = sw1;
}
QLIST_REMOVE (cap, entries);
g_free (cap->hw.mix_buf);
g_free (cap->buf);
g_free (cap);
}
return;

82
block.c
View File

@@ -937,16 +937,14 @@ static void update_flags_from_options(int *flags, QemuOpts *opts)
static void update_options_from_flags(QDict *options, int flags)
{
if (!qdict_haskey(options, BDRV_OPT_CACHE_DIRECT)) {
qdict_put(options, BDRV_OPT_CACHE_DIRECT,
qbool_from_bool(flags & BDRV_O_NOCACHE));
qdict_put_bool(options, BDRV_OPT_CACHE_DIRECT, flags & BDRV_O_NOCACHE);
}
if (!qdict_haskey(options, BDRV_OPT_CACHE_NO_FLUSH)) {
qdict_put(options, BDRV_OPT_CACHE_NO_FLUSH,
qbool_from_bool(flags & BDRV_O_NO_FLUSH));
qdict_put_bool(options, BDRV_OPT_CACHE_NO_FLUSH,
flags & BDRV_O_NO_FLUSH);
}
if (!qdict_haskey(options, BDRV_OPT_READ_ONLY)) {
qdict_put(options, BDRV_OPT_READ_ONLY,
qbool_from_bool(!(flags & BDRV_O_RDWR)));
qdict_put_bool(options, BDRV_OPT_READ_ONLY, !(flags & BDRV_O_RDWR));
}
}
@@ -1167,7 +1165,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockBackend *file,
filename = qdict_get_try_str(options, "filename");
}
if (drv->bdrv_needs_filename && !filename) {
if (drv->bdrv_needs_filename && (!filename || !filename[0])) {
error_setg(errp, "The '%s' block driver requires a file name",
drv->format_name);
ret = -EINVAL;
@@ -1362,7 +1360,7 @@ static int bdrv_fill_options(QDict **options, const char *filename,
/* Fetch the file name from the options QDict if necessary */
if (protocol && filename) {
if (!qdict_haskey(*options, "filename")) {
qdict_put(*options, "filename", qstring_from_str(filename));
qdict_put_str(*options, "filename", filename);
parse_filename = true;
} else {
error_setg(errp, "Can't specify 'file' and 'filename' options at "
@@ -1383,7 +1381,7 @@ static int bdrv_fill_options(QDict **options, const char *filename,
}
drvname = drv->format_name;
qdict_put(*options, "driver", qstring_from_str(drvname));
qdict_put_str(*options, "driver", drvname);
} else {
error_setg(errp, "Must specify either driver or file");
return -EINVAL;
@@ -2038,7 +2036,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
}
if (bs->backing_format[0] != '\0' && !qdict_haskey(options, "driver")) {
qdict_put(options, "driver", qstring_from_str(bs->backing_format));
qdict_put_str(options, "driver", bs->backing_format);
}
backing_hd = bdrv_open_inherit(*backing_filename ? backing_filename : NULL,
@@ -2193,12 +2191,9 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
}
/* Prepare options QDict for the temporary file */
qdict_put(snapshot_options, "file.driver",
qstring_from_str("file"));
qdict_put(snapshot_options, "file.filename",
qstring_from_str(tmp_filename));
qdict_put(snapshot_options, "driver",
qstring_from_str("qcow2"));
qdict_put_str(snapshot_options, "file.driver", "file");
qdict_put_str(snapshot_options, "file.filename", tmp_filename);
qdict_put_str(snapshot_options, "driver", "qcow2");
bs_snapshot = bdrv_open(NULL, NULL, snapshot_options, flags, errp);
snapshot_options = NULL;
@@ -2373,8 +2368,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
goto fail;
}
qdict_put(options, "file",
qstring_from_str(bdrv_get_node_name(file_bs)));
qdict_put_str(options, "file", bdrv_get_node_name(file_bs));
}
}
@@ -2396,8 +2390,8 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
* sure to update both bs->options (which has the full effective
* options for bs) and options (which has file.* already removed).
*/
qdict_put(bs->options, "driver", qstring_from_str(drv->format_name));
qdict_put(options, "driver", qstring_from_str(drv->format_name));
qdict_put_str(bs->options, "driver", drv->format_name);
qdict_put_str(options, "driver", drv->format_name);
} else if (!drv) {
error_setg(errp, "Must specify either driver or file");
goto fail;
@@ -2772,12 +2766,12 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
* that they are checked at the end of this function. */
value = qemu_opt_get(opts, "node-name");
if (value) {
qdict_put(reopen_state->options, "node-name", qstring_from_str(value));
qdict_put_str(reopen_state->options, "node-name", value);
}
value = qemu_opt_get(opts, "driver");
if (value) {
qdict_put(reopen_state->options, "driver", qstring_from_str(value));
qdict_put_str(reopen_state->options, "driver", value);
}
/* if we are to stay read-only, do not allow permission change
@@ -3268,7 +3262,7 @@ exit:
/**
* Truncate file to 'offset' bytes (needed only for file protocols)
*/
int bdrv_truncate(BdrvChild *child, int64_t offset)
int bdrv_truncate(BdrvChild *child, int64_t offset, Error **errp)
{
BlockDriverState *bs = child->bs;
BlockDriver *drv = bs->drv;
@@ -3280,12 +3274,18 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
* cannot assert this permission in that case. */
// assert(child->perm & BLK_PERM_RESIZE);
if (!drv)
if (!drv) {
error_setg(errp, "No medium inserted");
return -ENOMEDIUM;
if (!drv->bdrv_truncate)
}
if (!drv->bdrv_truncate) {
error_setg(errp, "Image format driver does not support resize");
return -ENOTSUP;
if (bs->read_only)
}
if (bs->read_only) {
error_setg(errp, "Image is read-only");
return -EACCES;
}
ret = drv->bdrv_truncate(bs, offset);
if (ret == 0) {
@@ -3293,6 +3293,8 @@ int bdrv_truncate(BdrvChild *child, int64_t offset)
bdrv_dirty_bitmap_truncate(bs);
bdrv_parent_cb_resize(bs);
++bs->write_gen;
} else {
error_setg_errno(errp, -ret, "Failed to resize image");
}
return ret;
}
@@ -3861,19 +3863,6 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
return retval;
}
int bdrv_get_backing_file_depth(BlockDriverState *bs)
{
if (!bs->drv) {
return 0;
}
if (!bs->backing) {
return 0;
}
return 1 + bdrv_get_backing_file_depth(bs->backing->bs);
}
void bdrv_init(void)
{
module_call_init(MODULE_INIT_BLOCK);
@@ -4268,8 +4257,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
if (backing_fmt) {
backing_options = qdict_new();
qdict_put(backing_options, "driver",
qstring_from_str(backing_fmt));
qdict_put_str(backing_options, "driver", backing_fmt);
}
bs = bdrv_open(full_backing, NULL, backing_options, back_flags,
@@ -4674,11 +4662,9 @@ void bdrv_refresh_filename(BlockDriverState *bs)
* contain a representation of the filename, therefore the following
* suffices without querying the (exact_)filename of this BDS. */
if (bs->file->bs->full_open_options) {
qdict_put_obj(opts, "driver",
QOBJECT(qstring_from_str(drv->format_name)));
qdict_put_str(opts, "driver", drv->format_name);
QINCREF(bs->file->bs->full_open_options);
qdict_put_obj(opts, "file",
QOBJECT(bs->file->bs->full_open_options));
qdict_put(opts, "file", bs->file->bs->full_open_options);
bs->full_open_options = opts;
} else {
@@ -4694,8 +4680,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
opts = qdict_new();
append_open_options(opts, bs);
qdict_put_obj(opts, "driver",
QOBJECT(qstring_from_str(drv->format_name)));
qdict_put_str(opts, "driver", drv->format_name);
if (bs->exact_filename[0]) {
/* This may not work for all block protocol drivers (some may
@@ -4705,8 +4690,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
* needs some special format of the options QDict, it needs to
* implement the driver-specific bdrv_refresh_filename() function.
*/
qdict_put_obj(opts, "filename",
QOBJECT(qstring_from_str(bs->exact_filename)));
qdict_put_str(opts, "filename", bs->exact_filename);
}
bs->full_open_options = opts;

View File

@@ -21,6 +21,8 @@ block-obj-$(CONFIG_RBD) += rbd.o
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
block-obj-$(CONFIG_LIBSSH2) += ssh.o
block-obj-y += accounting.o dirty-bitmap.o
block-obj-y += dictzip.o
block-obj-y += tar.o
block-obj-y += write-threshold.o
block-obj-y += backup.o
block-obj-$(CONFIG_REPLICATION) += replication.o

View File

@@ -1,6 +1,7 @@
/*
* Block protocol for I/O error injection
*
* Copyright (C) 2016-2017 Red Hat, Inc.
* Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -37,7 +38,12 @@
typedef struct BDRVBlkdebugState {
int state;
int new_state;
int align;
uint64_t align;
uint64_t max_transfer;
uint64_t opt_write_zero;
uint64_t max_write_zero;
uint64_t opt_discard;
uint64_t max_discard;
/* For blkdebug_refresh_filename() */
char *config_file;
@@ -301,7 +307,7 @@ static void blkdebug_parse_filename(const char *filename, QDict *options,
if (!strstart(filename, "blkdebug:", &filename)) {
/* There was no prefix; therefore, all options have to be already
present in the QDict (except for the filename) */
qdict_put(options, "x-image", qstring_from_str(filename));
qdict_put_str(options, "x-image", filename);
return;
}
@@ -320,7 +326,7 @@ static void blkdebug_parse_filename(const char *filename, QDict *options,
/* TODO Allow multi-level nesting and set file.filename here */
filename = c + 1;
qdict_put(options, "x-image", qstring_from_str(filename));
qdict_put_str(options, "x-image", filename);
}
static QemuOptsList runtime_opts = {
@@ -342,6 +348,31 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_SIZE,
.help = "Required alignment in bytes",
},
{
.name = "max-transfer",
.type = QEMU_OPT_SIZE,
.help = "Maximum transfer size in bytes",
},
{
.name = "opt-write-zero",
.type = QEMU_OPT_SIZE,
.help = "Optimum write zero alignment in bytes",
},
{
.name = "max-write-zero",
.type = QEMU_OPT_SIZE,
.help = "Maximum write zero size in bytes",
},
{
.name = "opt-discard",
.type = QEMU_OPT_SIZE,
.help = "Optimum discard alignment in bytes",
},
{
.name = "max-discard",
.type = QEMU_OPT_SIZE,
.help = "Maximum discard size in bytes",
},
{ /* end of list */ }
},
};
@@ -352,8 +383,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
BDRVBlkdebugState *s = bs->opaque;
QemuOpts *opts;
Error *local_err = NULL;
uint64_t align;
int ret;
uint64_t align;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -382,21 +413,69 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
goto out;
}
/* Set request alignment */
align = qemu_opt_get_size(opts, "align", 0);
if (align < INT_MAX && is_power_of_2(align)) {
s->align = align;
} else if (align) {
error_setg(errp, "Invalid alignment");
ret = -EINVAL;
goto fail_unref;
bs->supported_write_flags = BDRV_REQ_FUA &
bs->file->bs->supported_write_flags;
bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
bs->file->bs->supported_zero_flags;
ret = -EINVAL;
/* Set alignment overrides */
s->align = qemu_opt_get_size(opts, "align", 0);
if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
error_setg(errp, "Cannot meet constraints with align %" PRIu64,
s->align);
goto out;
}
align = MAX(s->align, bs->file->bs->bl.request_alignment);
s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
if (s->max_transfer &&
(s->max_transfer >= INT_MAX ||
!QEMU_IS_ALIGNED(s->max_transfer, align))) {
error_setg(errp, "Cannot meet constraints with max-transfer %" PRIu64,
s->max_transfer);
goto out;
}
s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
if (s->opt_write_zero &&
(s->opt_write_zero >= INT_MAX ||
!QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
error_setg(errp, "Cannot meet constraints with opt-write-zero %" PRIu64,
s->opt_write_zero);
goto out;
}
s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
if (s->max_write_zero &&
(s->max_write_zero >= INT_MAX ||
!QEMU_IS_ALIGNED(s->max_write_zero,
MAX(s->opt_write_zero, align)))) {
error_setg(errp, "Cannot meet constraints with max-write-zero %" PRIu64,
s->max_write_zero);
goto out;
}
s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
if (s->opt_discard &&
(s->opt_discard >= INT_MAX ||
!QEMU_IS_ALIGNED(s->opt_discard, align))) {
error_setg(errp, "Cannot meet constraints with opt-discard %" PRIu64,
s->opt_discard);
goto out;
}
s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
if (s->max_discard &&
(s->max_discard >= INT_MAX ||
!QEMU_IS_ALIGNED(s->max_discard,
MAX(s->opt_discard, align)))) {
error_setg(errp, "Cannot meet constraints with max-discard %" PRIu64,
s->max_discard);
goto out;
}
ret = 0;
goto out;
fail_unref:
bdrv_unref_child(bs, bs->file);
out:
if (ret < 0) {
g_free(s->config_file);
@@ -405,11 +484,30 @@ out:
return ret;
}
static int inject_error(BlockDriverState *bs, BlkdebugRule *rule)
static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
{
BDRVBlkdebugState *s = bs->opaque;
int error = rule->options.inject.error;
bool immediately = rule->options.inject.immediately;
BlkdebugRule *rule = NULL;
int error;
bool immediately;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
uint64_t inject_offset = rule->options.inject.offset;
if (inject_offset == -1 ||
(bytes && inject_offset >= offset &&
inject_offset < offset + bytes))
{
break;
}
}
if (!rule || !rule->options.inject.error) {
return 0;
}
immediately = rule->options.inject.immediately;
error = rule->options.inject.error;
if (rule->options.inject.once) {
QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
@@ -428,21 +526,18 @@ static int coroutine_fn
blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
int err;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
uint64_t inject_offset = rule->options.inject.offset;
if (inject_offset == -1 ||
(inject_offset >= offset && inject_offset < offset + bytes))
{
break;
}
/* Sanity check block layer guarantees */
assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
if (bs->bl.max_transfer) {
assert(bytes <= bs->bl.max_transfer);
}
if (rule && rule->options.inject.error) {
return inject_error(bs, rule);
err = rule_check(bs, offset, bytes);
if (err) {
return err;
}
return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -452,21 +547,18 @@ static int coroutine_fn
blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
QEMUIOVector *qiov, int flags)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
int err;
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
uint64_t inject_offset = rule->options.inject.offset;
if (inject_offset == -1 ||
(inject_offset >= offset && inject_offset < offset + bytes))
{
break;
}
/* Sanity check block layer guarantees */
assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
if (bs->bl.max_transfer) {
assert(bytes <= bs->bl.max_transfer);
}
if (rule && rule->options.inject.error) {
return inject_error(bs, rule);
err = rule_check(bs, offset, bytes);
if (err) {
return err;
}
return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -474,22 +566,81 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
static int blkdebug_co_flush(BlockDriverState *bs)
{
BDRVBlkdebugState *s = bs->opaque;
BlkdebugRule *rule = NULL;
int err = rule_check(bs, 0, 0);
QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
if (rule->options.inject.offset == -1) {
break;
}
}
if (rule && rule->options.inject.error) {
return inject_error(bs, rule);
if (err) {
return err;
}
return bdrv_co_flush(bs->file->bs);
}
static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int count,
BdrvRequestFlags flags)
{
uint32_t align = MAX(bs->bl.request_alignment,
bs->bl.pwrite_zeroes_alignment);
int err;
/* Only pass through requests that are larger than requested
* preferred alignment (so that we test the fallback to writes on
* unaligned portions), and check that the block layer never hands
* us anything unaligned that crosses an alignment boundary. */
if (count < align) {
assert(QEMU_IS_ALIGNED(offset, align) ||
QEMU_IS_ALIGNED(offset + count, align) ||
DIV_ROUND_UP(offset, align) ==
DIV_ROUND_UP(offset + count, align));
return -ENOTSUP;
}
assert(QEMU_IS_ALIGNED(offset, align));
assert(QEMU_IS_ALIGNED(count, align));
if (bs->bl.max_pwrite_zeroes) {
assert(count <= bs->bl.max_pwrite_zeroes);
}
err = rule_check(bs, offset, count);
if (err) {
return err;
}
return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
}
static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
int64_t offset, int count)
{
uint32_t align = bs->bl.pdiscard_alignment;
int err;
/* Only pass through requests that are larger than requested
* minimum alignment, and ensure that unaligned requests do not
* cross optimum discard boundaries. */
if (count < bs->bl.request_alignment) {
assert(QEMU_IS_ALIGNED(offset, align) ||
QEMU_IS_ALIGNED(offset + count, align) ||
DIV_ROUND_UP(offset, align) ==
DIV_ROUND_UP(offset + count, align));
return -ENOTSUP;
}
assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
if (align && count >= align) {
assert(QEMU_IS_ALIGNED(offset, align));
assert(QEMU_IS_ALIGNED(count, align));
}
if (bs->bl.max_pdiscard) {
assert(count <= bs->bl.max_pdiscard);
}
err = rule_check(bs, offset, count);
if (err) {
return err;
}
return bdrv_co_pdiscard(bs->file->bs, offset, count);
}
static void blkdebug_close(BlockDriverState *bs)
{
@@ -663,7 +814,7 @@ static int64_t blkdebug_getlength(BlockDriverState *bs)
static int blkdebug_truncate(BlockDriverState *bs, int64_t offset)
{
return bdrv_truncate(bs->file, offset);
return bdrv_truncate(bs->file, offset, NULL);
}
static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
@@ -689,16 +840,20 @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
}
if (!force_json && bs->file->bs->exact_filename[0]) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"blkdebug:%s:%s", s->config_file ?: "",
bs->file->bs->exact_filename);
int ret = snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"blkdebug:%s:%s", s->config_file ?: "",
bs->file->bs->exact_filename);
if (ret >= sizeof(bs->exact_filename)) {
/* An overflow makes the filename unusable, so do not report any */
bs->exact_filename[0] = 0;
}
}
opts = qdict_new();
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkdebug")));
qdict_put_str(opts, "driver", "blkdebug");
QINCREF(bs->file->bs->full_open_options);
qdict_put_obj(opts, "image", QOBJECT(bs->file->bs->full_open_options));
qdict_put(opts, "image", bs->file->bs->full_open_options);
for (e = qdict_first(options); e; e = qdict_next(options, e)) {
if (strcmp(qdict_entry_key(e), "x-image")) {
@@ -717,6 +872,21 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
if (s->align) {
bs->bl.request_alignment = s->align;
}
if (s->max_transfer) {
bs->bl.max_transfer = s->max_transfer;
}
if (s->opt_write_zero) {
bs->bl.pwrite_zeroes_alignment = s->opt_write_zero;
}
if (s->max_write_zero) {
bs->bl.max_pwrite_zeroes = s->max_write_zero;
}
if (s->opt_discard) {
bs->bl.pdiscard_alignment = s->opt_discard;
}
if (s->max_discard) {
bs->bl.max_pdiscard = s->max_discard;
}
}
static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
@@ -744,6 +914,8 @@ static BlockDriver bdrv_blkdebug = {
.bdrv_co_preadv = blkdebug_co_preadv,
.bdrv_co_pwritev = blkdebug_co_pwritev,
.bdrv_co_flush_to_disk = blkdebug_co_flush,
.bdrv_co_pwrite_zeroes = blkdebug_co_pwrite_zeroes,
.bdrv_co_pdiscard = blkdebug_co_pdiscard,
.bdrv_debug_event = blkdebug_debug_event,
.bdrv_debug_breakpoint = blkdebug_debug_breakpoint,

View File

@@ -37,9 +37,6 @@ static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail:
if (ret < 0) {
bdrv_unref_child(bs, bs->file);
}
return ret;
}

View File

@@ -67,7 +67,7 @@ static void blkverify_parse_filename(const char *filename, QDict *options,
if (!strstart(filename, "blkverify:", &filename)) {
/* There was no prefix; therefore, all options have to be already
present in the QDict (except for the filename) */
qdict_put(options, "x-image", qstring_from_str(filename));
qdict_put_str(options, "x-image", filename);
return;
}
@@ -84,7 +84,7 @@ static void blkverify_parse_filename(const char *filename, QDict *options,
/* TODO Allow multi-level nesting and set file.filename here */
filename = c + 1;
qdict_put(options, "x-image", qstring_from_str(filename));
qdict_put_str(options, "x-image", filename);
}
static QemuOptsList runtime_opts = {
@@ -142,9 +142,6 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail:
if (ret < 0) {
bdrv_unref_child(bs, bs->file);
}
qemu_opts_del(opts);
return ret;
}
@@ -291,13 +288,12 @@ static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options)
&& s->test_file->bs->full_open_options)
{
QDict *opts = qdict_new();
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkverify")));
qdict_put_str(opts, "driver", "blkverify");
QINCREF(bs->file->bs->full_open_options);
qdict_put_obj(opts, "raw", QOBJECT(bs->file->bs->full_open_options));
qdict_put(opts, "raw", bs->file->bs->full_open_options);
QINCREF(s->test_file->bs->full_open_options);
qdict_put_obj(opts, "test",
QOBJECT(s->test_file->bs->full_open_options));
qdict_put(opts, "test", s->test_file->bs->full_open_options);
bs->full_open_options = opts;
}
@@ -305,10 +301,14 @@ static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options)
if (bs->file->bs->exact_filename[0]
&& s->test_file->bs->exact_filename[0])
{
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"blkverify:%s:%s",
bs->file->bs->exact_filename,
s->test_file->bs->exact_filename);
int ret = snprintf(bs->exact_filename, sizeof(bs->exact_filename),
"blkverify:%s:%s",
bs->file->bs->exact_filename,
s->test_file->bs->exact_filename);
if (ret >= sizeof(bs->exact_filename)) {
/* An overflow makes the filename unusable, so do not report any */
bs->exact_filename[0] = 0;
}
}
}

View File

@@ -1746,13 +1746,21 @@ int blk_pwrite_compressed(BlockBackend *blk, int64_t offset, const void *buf,
BDRV_REQ_WRITE_COMPRESSED);
}
int blk_truncate(BlockBackend *blk, int64_t offset)
int blk_truncate(BlockBackend *blk, int64_t offset, Error **errp)
{
if (!blk_is_available(blk)) {
error_setg(errp, "No medium inserted");
return -ENOMEDIUM;
}
return bdrv_truncate(blk->root, offset);
return bdrv_truncate(blk->root, offset, errp);
}
void blk_legacy_resize_cb(BlockBackend *blk)
{
if (blk->legacy_dev) {
xen_blk_resize_cb(blk->dev);
}
}
static void blk_pdiscard_entry(void *opaque)

View File

@@ -89,6 +89,12 @@ static void commit_complete(BlockJob *job, void *opaque)
int ret = data->ret;
bool remove_commit_top_bs = false;
/* Make sure overlay_bs and top stay around until bdrv_set_backing_hd() */
bdrv_ref(top);
if (overlay_bs) {
bdrv_ref(overlay_bs);
}
/* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
* the normal backing chain can be restored. */
blk_unref(s->base);
@@ -115,6 +121,13 @@ static void commit_complete(BlockJob *job, void *opaque)
}
g_free(s->backing_file_str);
blk_unref(s->top);
/* If there is more than one reference to the job (e.g. if called from
* block_job_finish_sync()), block_job_completed() won't free it and
* therefore the blockers on the intermediate nodes remain. This would
* cause bdrv_set_backing_hd() to fail. */
block_job_remove_all_bdrv(job);
block_job_completed(&s->common, ret);
g_free(data);
@@ -124,6 +137,9 @@ static void commit_complete(BlockJob *job, void *opaque)
if (remove_commit_top_bs) {
bdrv_set_backing_hd(overlay_bs, top, &error_abort);
}
bdrv_unref(overlay_bs);
bdrv_unref(top);
}
static void coroutine_fn commit_run(void *opaque)
@@ -151,7 +167,7 @@ static void coroutine_fn commit_run(void *opaque)
}
if (base_len < s->common.len) {
ret = blk_truncate(s->base, s->common.len);
ret = blk_truncate(s->base, s->common.len, NULL);
if (ret) {
goto out;
}
@@ -335,6 +351,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
if (commit_top_bs == NULL) {
goto fail;
}
if (!filter_node_name) {
commit_top_bs->implicit = true;
}
commit_top_bs->total_sectors = top->total_sectors;
bdrv_set_aio_context(commit_top_bs, bdrv_get_aio_context(top));
@@ -511,8 +530,9 @@ int bdrv_commit(BlockDriverState *bs)
* grow the backing file image if possible. If not possible,
* we must return an error */
if (length > backing_length) {
ret = blk_truncate(backing, length);
ret = blk_truncate(backing, length, &local_err);
if (ret < 0) {
error_report_err(local_err);
goto ro_cleanup;
}
}

View File

@@ -389,7 +389,7 @@ static int block_crypto_truncate(BlockDriverState *bs, int64_t offset)
offset += payload_offset;
return bdrv_truncate(bs->file, offset);
return bdrv_truncate(bs->file, offset, NULL);
}
static void block_crypto_close(BlockDriverState *bs)

View File

@@ -147,6 +147,7 @@ static void curl_multi_do(void *arg);
static void curl_multi_read(void *arg);
#ifdef NEED_CURL_TIMER_CALLBACK
/* Called from curl_multi_do_locked, with s->mutex held. */
static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
{
BDRVCURLState *s = opaque;
@@ -163,6 +164,7 @@ static int curl_timer_cb(CURLM *multi, long timeout_ms, void *opaque)
}
#endif
/* Called from curl_multi_do_locked, with s->mutex held. */
static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
void *userp, void *sp)
{
@@ -212,6 +214,7 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
return 0;
}
/* Called from curl_multi_do_locked, with s->mutex held. */
static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
{
BDRVCURLState *s = opaque;
@@ -226,6 +229,7 @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
return realsize;
}
/* Called from curl_multi_do_locked, with s->mutex held. */
static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
{
CURLState *s = ((CURLState*)opaque);
@@ -264,7 +268,9 @@ static size_t curl_read_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
request_length - offset);
}
qemu_mutex_unlock(&s->s->mutex);
acb->common.cb(acb->common.opaque, 0);
qemu_mutex_lock(&s->s->mutex);
qemu_aio_unref(acb);
s->acb[i] = NULL;
}
@@ -275,6 +281,7 @@ read_end:
return size * nmemb;
}
/* Called with s->mutex held. */
static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
CURLAIOCB *acb)
{
@@ -305,8 +312,6 @@ static int curl_find_buf(BDRVCURLState *s, size_t start, size_t len,
if (clamped_len < len) {
qemu_iovec_memset(acb->qiov, clamped_len, 0, len - clamped_len);
}
acb->common.cb(acb->common.opaque, 0);
return FIND_RET_OK;
}
@@ -449,6 +454,7 @@ static void curl_multi_timeout_do(void *arg)
#endif
}
/* Called with s->mutex held. */
static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
{
CURLState *state = NULL;
@@ -467,7 +473,9 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
break;
}
if (!state) {
qemu_mutex_unlock(&s->mutex);
aio_poll(bdrv_get_aio_context(bs), true);
qemu_mutex_lock(&s->mutex);
}
} while(!state);
@@ -530,8 +538,14 @@ static CURLState *curl_init_state(BlockDriverState *bs, BDRVCURLState *s)
return state;
}
/* Called with s->mutex held. */
static void curl_clean_state(CURLState *s)
{
int j;
for (j = 0; j < CURL_NUM_ACB; j++) {
assert(!s->acb[j]);
}
if (s->s->multi)
curl_multi_remove_handle(s->s->multi, s->curl);
@@ -548,7 +562,7 @@ static void curl_clean_state(CURLState *s)
static void curl_parse_filename(const char *filename, QDict *options,
Error **errp)
{
qdict_put(options, CURL_BLOCK_OPT_URL, qstring_from_str(filename));
qdict_put_str(options, CURL_BLOCK_OPT_URL, filename);
}
static void curl_detach_aio_context(BlockDriverState *bs)
@@ -556,6 +570,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
BDRVCURLState *s = bs->opaque;
int i;
qemu_mutex_lock(&s->mutex);
for (i = 0; i < CURL_NUM_STATES; i++) {
if (s->states[i].in_use) {
curl_clean_state(&s->states[i]);
@@ -571,6 +586,7 @@ static void curl_detach_aio_context(BlockDriverState *bs)
curl_multi_cleanup(s->multi);
s->multi = NULL;
}
qemu_mutex_unlock(&s->mutex);
timer_del(&s->timer);
}
@@ -668,6 +684,7 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
return -EROFS;
}
qemu_mutex_init(&s->mutex);
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
@@ -738,7 +755,9 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
DPRINTF("CURL: Opening %s\n", file);
s->aio_context = bdrv_get_aio_context(bs);
s->url = g_strdup(file);
qemu_mutex_lock(&s->mutex);
state = curl_init_state(bs, s);
qemu_mutex_unlock(&s->mutex);
if (!state)
goto out_noclean;
@@ -782,11 +801,12 @@ static int curl_open(BlockDriverState *bs, QDict *options, int flags,
}
DPRINTF("CURL: Size = %zd\n", s->len);
qemu_mutex_lock(&s->mutex);
curl_clean_state(state);
qemu_mutex_unlock(&s->mutex);
curl_easy_cleanup(state->curl);
state->curl = NULL;
qemu_mutex_init(&s->mutex);
curl_attach_aio_context(bs, bdrv_get_aio_context(bs));
qemu_opts_del(opts);
@@ -797,6 +817,7 @@ out:
curl_easy_cleanup(state->curl);
state->curl = NULL;
out_noclean:
qemu_mutex_destroy(&s->mutex);
g_free(s->cookie);
g_free(s->url);
qemu_opts_del(opts);
@@ -827,8 +848,8 @@ static void curl_readv_bh_cb(void *p)
// we can just call the callback and be done.
switch (curl_find_buf(s, start, acb->nb_sectors * BDRV_SECTOR_SIZE, acb)) {
case FIND_RET_OK:
qemu_aio_unref(acb);
// fall through
ret = 0;
goto out;
case FIND_RET_WAIT:
goto out;
default:

586
block/dictzip.c Normal file
View File

@@ -0,0 +1,586 @@
/*
* DictZip Block driver for dictzip enabled gzip files
*
* Use the "dictzip" tool from the "dictd" package to create gzip files that
* contain the extra DictZip headers.
*
* dictzip(1) is a compression program which creates compressed files in the
* gzip format (see RFC 1952). However, unlike gzip(1), dictzip(1) compresses
* the file in pieces and stores an index to the pieces in the gzip header.
* This allows random access to the file at the granularity of the compressed
* pieces (currently about 64kB) while maintaining good compression ratios
* (within 5% of the expected ratio for dictionary data).
* dictd(8) uses files stored in this format.
*
* For details on DictZip see http://dict.org/.
*
* Copyright (c) 2009 Alexander Graf <agraf@suse.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
#include <zlib.h>
// #define DEBUG
#ifdef DEBUG
#define dprintf(fmt, ...) do { printf("dzip: " fmt, ## __VA_ARGS__); } while (0)
#else
#define dprintf(fmt, ...) do { } while (0)
#endif
#define SECTOR_SIZE 512
#define Z_STREAM_COUNT 4
#define CACHE_COUNT 20
/* magic values */
#define GZ_MAGIC1 0x1f
#define GZ_MAGIC2 0x8b
#define DZ_MAGIC1 'R'
#define DZ_MAGIC2 'A'
#define GZ_FEXTRA 0x04 /* Optional field (random access index) */
#define GZ_FNAME 0x08 /* Original name */
#define GZ_COMMENT 0x10 /* Zero-terminated, human-readable comment */
#define GZ_FHCRC 0x02 /* Header CRC16 */
/* offsets */
#define GZ_ID 0 /* GZ_MAGIC (16bit) */
#define GZ_FLG 3 /* FLaGs (see above) */
#define GZ_XLEN 10 /* eXtra LENgth (16bit) */
#define GZ_SI 12 /* Subfield ID (16bit) */
#define GZ_VERSION 16 /* Version for subfield format */
#define GZ_CHUNKSIZE 18 /* Chunk size (16bit) */
#define GZ_CHUNKCNT 20 /* Number of chunks (16bit) */
#define GZ_RNDDATA 22 /* Random access data (16bit) */
#define GZ_99_CHUNKSIZE 18 /* Chunk size (32bit) */
#define GZ_99_CHUNKCNT 22 /* Number of chunks (32bit) */
#define GZ_99_FILESIZE 26 /* Size of unpacked file (64bit) */
#define GZ_99_RNDDATA 34 /* Random access data (32bit) */
struct BDRVDictZipState;
typedef struct DictZipAIOCB {
BlockAIOCB common;
struct BDRVDictZipState *s;
QEMUIOVector *qiov; /* QIOV of the original request */
QEMUIOVector *qiov_gz; /* QIOV of the gz subrequest */
QEMUBH *bh; /* BH for cache */
z_stream *zStream; /* stream to use for decoding */
int zStream_id; /* stream id of the above pointer */
size_t start; /* offset into the uncompressed file */
size_t len; /* uncompressed bytes to read */
uint8_t *gzipped; /* the gzipped data */
uint8_t *buf; /* cached result */
size_t gz_len; /* amount of gzip data */
size_t gz_start; /* uncompressed starting point of gzip data */
uint64_t offset; /* offset for "start" into the uncompressed chunk */
int chunks_len; /* amount of uncompressed data in all gzip data */
} DictZipAIOCB;
typedef struct dict_cache {
size_t start;
size_t len;
uint8_t *buf;
} DictCache;
typedef struct BDRVDictZipState {
BlockDriverState *hd;
z_stream zStream[Z_STREAM_COUNT];
DictCache cache[CACHE_COUNT];
int cache_index;
uint8_t stream_in_use;
uint64_t chunk_len;
uint32_t chunk_cnt;
uint16_t *chunks;
uint32_t *chunks32;
uint64_t *offsets;
int64_t file_len;
} BDRVDictZipState;
static int start_zStream(z_stream *zStream)
{
zStream->zalloc = NULL;
zStream->zfree = NULL;
zStream->opaque = NULL;
zStream->next_in = 0;
zStream->avail_in = 0;
zStream->next_out = NULL;
zStream->avail_out = 0;
return inflateInit2( zStream, -15 );
}
static QemuOptsList runtime_opts = {
.name = "dzip",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "filename",
.type = QEMU_OPT_STRING,
.help = "URL to the dictzip file",
},
{ /* end of list */ }
},
};
static int dictzip_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
{
BDRVDictZipState *s = bs->opaque;
const char *err = "Unknown (read error?)";
uint8_t magic[2];
char buf[100];
uint8_t header_flags;
uint16_t chunk_len16;
uint16_t chunk_cnt16;
uint32_t chunk_len32;
uint16_t header_ver;
uint16_t tmp_short;
uint64_t offset;
int chunks_len;
int headerLength = GZ_XLEN - 1;
int rnd_offs;
int ret;
int i;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
filename = qemu_opt_get(opts, "filename");
if (!strncmp(filename, "dzip://", 7))
filename += 7;
else if (!strncmp(filename, "dzip:", 5))
filename += 5;
s->hd = bdrv_open(filename, NULL, NULL, flags | BDRV_O_PROTOCOL, errp);
if (!s->hd) {
ret = -EINVAL;
qemu_opts_del(opts);
return ret;
}
/* initialize zlib streams */
for (i = 0; i < Z_STREAM_COUNT; i++) {
if (start_zStream( &s->zStream[i] ) != Z_OK) {
err = s->zStream[i].msg;
goto fail;
}
}
/* gzip header */
if (bdrv_pread(s->hd->file, GZ_ID, &magic, sizeof(magic)) != sizeof(magic))
goto fail;
if (!((magic[0] == GZ_MAGIC1) && (magic[1] == GZ_MAGIC2))) {
err = "No gzip file";
goto fail;
}
/* dzip header */
if (bdrv_pread(s->hd->file, GZ_FLG, &header_flags, 1) != 1)
goto fail;
if (!(header_flags & GZ_FEXTRA)) {
err = "Not a dictzip file (wrong flags)";
goto fail;
}
/* extra length */
if (bdrv_pread(s->hd->file, GZ_XLEN, &tmp_short, 2) != 2)
goto fail;
headerLength += le16_to_cpu(tmp_short) + 2;
/* DictZip magic */
if (bdrv_pread(s->hd->file, GZ_SI, &magic, 2) != 2)
goto fail;
if (magic[0] != DZ_MAGIC1 || magic[1] != DZ_MAGIC2) {
err = "Not a dictzip file (missing extra magic)";
goto fail;
}
/* DictZip version */
if (bdrv_pread(s->hd->file, GZ_VERSION, &header_ver, 2) != 2)
goto fail;
header_ver = le16_to_cpu(header_ver);
switch (header_ver) {
case 1: /* Normal DictZip */
/* number of chunks */
if (bdrv_pread(s->hd->file, GZ_CHUNKSIZE, &chunk_len16, 2) != 2)
goto fail;
s->chunk_len = le16_to_cpu(chunk_len16);
/* chunk count */
if (bdrv_pread(s->hd->file, GZ_CHUNKCNT, &chunk_cnt16, 2) != 2)
goto fail;
s->chunk_cnt = le16_to_cpu(chunk_cnt16);
chunks_len = sizeof(short) * s->chunk_cnt;
rnd_offs = GZ_RNDDATA;
break;
case 99: /* Special Alex pigz version */
/* number of chunks */
if (bdrv_pread(s->hd->file, GZ_99_CHUNKSIZE, &chunk_len32, 4) != 4)
goto fail;
dprintf("chunk len [%#x] = %d\n", GZ_99_CHUNKSIZE, chunk_len32);
s->chunk_len = le32_to_cpu(chunk_len32);
/* chunk count */
if (bdrv_pread(s->hd->file, GZ_99_CHUNKCNT, &s->chunk_cnt, 4) != 4)
goto fail;
s->chunk_cnt = le32_to_cpu(s->chunk_cnt);
dprintf("chunk len | count = %"PRId64" | %d\n", s->chunk_len, s->chunk_cnt);
/* file size */
if (bdrv_pread(s->hd->file, GZ_99_FILESIZE, &s->file_len, 8) != 8)
goto fail;
s->file_len = le64_to_cpu(s->file_len);
chunks_len = sizeof(int) * s->chunk_cnt;
rnd_offs = GZ_99_RNDDATA;
break;
default:
err = "Invalid DictZip version";
goto fail;
}
/* random access data */
s->chunks = g_malloc(chunks_len);
if (header_ver == 99)
s->chunks32 = (uint32_t *)s->chunks;
if (bdrv_pread(s->hd->file, rnd_offs, s->chunks, chunks_len) != chunks_len)
goto fail;
/* orig filename */
if (header_flags & GZ_FNAME) {
if (bdrv_pread(s->hd->file, headerLength + 1, buf, sizeof(buf)) != sizeof(buf))
goto fail;
buf[sizeof(buf) - 1] = '\0';
headerLength += strlen(buf) + 1;
if (strlen(buf) == sizeof(buf))
goto fail;
dprintf("filename: %s\n", buf);
}
/* comment field */
if (header_flags & GZ_COMMENT) {
if (bdrv_pread(s->hd->file, headerLength, buf, sizeof(buf)) != sizeof(buf))
goto fail;
buf[sizeof(buf) - 1] = '\0';
headerLength += strlen(buf) + 1;
if (strlen(buf) == sizeof(buf))
goto fail;
dprintf("comment: %s\n", buf);
}
if (header_flags & GZ_FHCRC)
headerLength += 2;
/* uncompressed file length*/
if (!s->file_len) {
uint32_t file_len;
if (bdrv_pread(s->hd->file, bdrv_getlength(s->hd) - 4, &file_len, 4) != 4)
goto fail;
s->file_len = le32_to_cpu(file_len);
}
/* compute offsets */
s->offsets = g_malloc(sizeof( *s->offsets ) * s->chunk_cnt);
for (offset = headerLength + 1, i = 0; i < s->chunk_cnt; i++) {
s->offsets[i] = offset;
switch (header_ver) {
case 1:
offset += le16_to_cpu(s->chunks[i]);
break;
case 99:
offset += le32_to_cpu(s->chunks32[i]);
break;
}
dprintf("chunk %#"PRIx64" - %#"PRIx64" = offset %#"PRIx64" -> %#"PRIx64"\n", i * s->chunk_len, (i+1) * s->chunk_len, s->offsets[i], offset);
}
qemu_opts_del(opts);
return 0;
fail:
fprintf(stderr, "DictZip: Error opening file: %s\n", err);
bdrv_unref(s->hd);
if (s->chunks)
g_free(s->chunks);
qemu_opts_del(opts);
return -EINVAL;
}
/* This callback gets invoked when we have the result in cache already */
static void dictzip_cache_cb(void *opaque)
{
DictZipAIOCB *acb = (DictZipAIOCB *)opaque;
qemu_iovec_from_buf(acb->qiov, 0, acb->buf, acb->len);
acb->common.cb(acb->common.opaque, 0);
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb);
}
/* This callback gets invoked by the underlying block reader when we have
* all compressed data. We uncompress in here. */
static void dictzip_read_cb(void *opaque, int ret)
{
DictZipAIOCB *acb = (DictZipAIOCB *)opaque;
struct BDRVDictZipState *s = acb->s;
uint8_t *buf;
DictCache *cache;
int r, i;
buf = g_malloc(acb->chunks_len);
/* try to find zlib stream for decoding */
do {
for (i = 0; i < Z_STREAM_COUNT; i++) {
if (!(s->stream_in_use & (1 << i))) {
s->stream_in_use |= (1 << i);
acb->zStream_id = i;
acb->zStream = &s->zStream[i];
break;
}
}
} while(!acb->zStream);
/* sure, we could handle more streams, but this callback should be single
threaded and when it's not, we really want to know! */
assert(i == 0);
/* uncompress the chunk */
acb->zStream->next_in = acb->gzipped;
acb->zStream->avail_in = acb->gz_len;
acb->zStream->next_out = buf;
acb->zStream->avail_out = acb->chunks_len;
r = inflate( acb->zStream, Z_PARTIAL_FLUSH );
if ( (r != Z_OK) && (r != Z_STREAM_END) )
fprintf(stderr, "Error inflating: [%d] %s\n", r, acb->zStream->msg);
if ( r == Z_STREAM_END )
inflateReset(acb->zStream);
dprintf("inflating [%d] left: %d | %d bytes\n", r, acb->zStream->avail_in, acb->zStream->avail_out);
s->stream_in_use &= ~(1 << acb->zStream_id);
/* nofity the caller */
qemu_iovec_from_buf(acb->qiov, 0, buf + acb->offset, acb->len);
acb->common.cb(acb->common.opaque, 0);
/* fill the cache */
cache = &s->cache[s->cache_index];
s->cache_index++;
if (s->cache_index == CACHE_COUNT)
s->cache_index = 0;
cache->len = 0;
if (cache->buf)
g_free(cache->buf);
cache->start = acb->gz_start;
cache->buf = buf;
cache->len = acb->chunks_len;
/* free occupied ressources */
g_free(acb->qiov_gz);
qemu_aio_unref(acb);
}
static const AIOCBInfo dictzip_aiocb_info = {
.aiocb_size = sizeof(DictZipAIOCB),
};
/* This is where we get a request from a caller to read something */
static BlockAIOCB *dictzip_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
{
BDRVDictZipState *s = bs->opaque;
DictZipAIOCB *acb;
QEMUIOVector *qiov_gz;
struct iovec *iov;
uint8_t *buf;
size_t start = sector_num * SECTOR_SIZE;
size_t len = nb_sectors * SECTOR_SIZE;
size_t end = start + len;
size_t gz_start;
size_t gz_len;
int64_t gz_sector_num;
int gz_nb_sectors;
int first_chunk, last_chunk;
int first_offset;
int i;
acb = qemu_aio_get(&dictzip_aiocb_info, bs, cb, opaque);
if (!acb)
return NULL;
/* Search Cache */
for (i = 0; i < CACHE_COUNT; i++) {
if (!s->cache[i].len)
continue;
if ((start >= s->cache[i].start) &&
(end <= (s->cache[i].start + s->cache[i].len))) {
acb->buf = s->cache[i].buf + (start - s->cache[i].start);
acb->len = len;
acb->qiov = qiov;
acb->bh = qemu_bh_new(dictzip_cache_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
}
/* No cache, so let's decode */
/* We need to read these chunks */
first_chunk = start / s->chunk_len;
first_offset = start - first_chunk * s->chunk_len;
last_chunk = end / s->chunk_len;
gz_start = s->offsets[first_chunk];
gz_len = 0;
for (i = first_chunk; i <= last_chunk; i++) {
if (s->chunks32)
gz_len += le32_to_cpu(s->chunks32[i]);
else
gz_len += le16_to_cpu(s->chunks[i]);
}
gz_sector_num = gz_start / SECTOR_SIZE;
gz_nb_sectors = (gz_len / SECTOR_SIZE);
/* account for tail and heads */
while ((gz_start + gz_len) > ((gz_sector_num + gz_nb_sectors) * SECTOR_SIZE))
gz_nb_sectors++;
/* Allocate qiov, iov and buf in one chunk so we only need to free qiov */
qiov_gz = g_malloc0(sizeof(QEMUIOVector) + sizeof(struct iovec) +
(gz_nb_sectors * SECTOR_SIZE));
iov = (struct iovec *)(((char *)qiov_gz) + sizeof(QEMUIOVector));
buf = ((uint8_t *)iov) + sizeof(struct iovec *);
/* Kick off the read by the backing file, so we can start decompressing */
iov->iov_base = (void *)buf;
iov->iov_len = gz_nb_sectors * 512;
qemu_iovec_init_external(qiov_gz, iov, 1);
dprintf("read %zd - %zd => %zd - %zd\n", start, end, gz_start, gz_start + gz_len);
acb->s = s;
acb->qiov = qiov;
acb->qiov_gz = qiov_gz;
acb->start = start;
acb->len = len;
acb->gzipped = buf + (gz_start % SECTOR_SIZE);
acb->gz_len = gz_len;
acb->gz_start = first_chunk * s->chunk_len;
acb->offset = first_offset;
acb->chunks_len = (last_chunk - first_chunk + 1) * s->chunk_len;
return bdrv_aio_readv(s->hd->file, gz_sector_num, qiov_gz, gz_nb_sectors,
dictzip_read_cb, acb);
}
static void dictzip_close(BlockDriverState *bs)
{
BDRVDictZipState *s = bs->opaque;
int i;
for (i = 0; i < CACHE_COUNT; i++) {
if (!s->cache[i].len)
continue;
g_free(s->cache[i].buf);
}
for (i = 0; i < Z_STREAM_COUNT; i++) {
inflateEnd(&s->zStream[i]);
}
if (s->chunks)
g_free(s->chunks);
if (s->offsets)
g_free(s->offsets);
dprintf("Close\n");
}
static int64_t dictzip_getlength(BlockDriverState *bs)
{
BDRVDictZipState *s = bs->opaque;
dprintf("getlength -> %ld\n", s->file_len);
return s->file_len;
}
static BlockDriver bdrv_dictzip = {
.format_name = "dzip",
.protocol_name = "dzip",
.instance_size = sizeof(BDRVDictZipState),
.bdrv_file_open = dictzip_open,
.bdrv_close = dictzip_close,
.bdrv_getlength = dictzip_getlength,
.bdrv_aio_readv = dictzip_aio_readv,
};
static void dictzip_block_init(void)
{
bdrv_register(&bdrv_dictzip);
}
block_init(dictzip_block_init);

View File

@@ -345,7 +345,7 @@ BlockDirtyInfoList *bdrv_query_dirty_bitmaps(BlockDriverState *bs)
QLIST_FOREACH(bm, &bs->dirty_bitmaps, list) {
BlockDirtyInfo *info = g_new0(BlockDirtyInfo, 1);
BlockDirtyInfoList *entry = g_new0(BlockDirtyInfoList, 1);
info->count = bdrv_get_dirty_count(bm);
info->count = bdrv_get_dirty_count(bm) << BDRV_SECTOR_BITS;
info->granularity = bdrv_dirty_bitmap_granularity(bm);
info->has_name = !!bm->name;
info->name = g_strdup(bm->name);

View File

@@ -377,7 +377,7 @@ static void raw_parse_filename(const char *filename, QDict *options,
* function call can be ignored. */
strstart(filename, "file:", &filename);
qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
qdict_put_str(options, "filename", filename);
}
static QemuOptsList raw_runtime_opts = {
@@ -2150,7 +2150,7 @@ static void hdev_parse_filename(const char *filename, QDict *options,
/* The prefix is optional, just as for "file". */
strstart(filename, "host_device:", &filename);
qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
qdict_put_str(options, "filename", filename);
}
static bool hdev_is_sg(BlockDriverState *bs)
@@ -2239,7 +2239,7 @@ static int hdev_open(BlockDriverState *bs, QDict *options, int flags,
goto hdev_open_Mac_error;
}
qdict_put(options, "filename", qstring_from_str(bsd_path));
qdict_put_str(options, "filename", bsd_path);
hdev_open_Mac_error:
g_free(mediaType);
@@ -2449,7 +2449,7 @@ static void cdrom_parse_filename(const char *filename, QDict *options,
/* The prefix is optional, just as for "file". */
strstart(filename, "host_cdrom:", &filename);
qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
qdict_put_str(options, "filename", filename);
}
#endif

View File

@@ -282,7 +282,7 @@ static void raw_parse_filename(const char *filename, QDict *options,
* function call can be ignored. */
strstart(filename, "file:", &filename);
qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
qdict_put_str(options, "filename", filename);
}
static QemuOptsList raw_runtime_opts = {
@@ -669,7 +669,7 @@ static void hdev_parse_filename(const char *filename, QDict *options,
/* The prefix is optional, just as for "file". */
strstart(filename, "host_device:", &filename);
qdict_put_obj(options, "filename", QOBJECT(qstring_from_str(filename)));
qdict_put_str(options, "filename", filename);
}
static int hdev_open(BlockDriverState *bs, QDict *options, int flags,

View File

@@ -1757,6 +1757,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
int64_t n;
int64_t ret, ret2;
*file = NULL;
total_sectors = bdrv_nb_sectors(bs);
if (total_sectors < 0) {
return total_sectors;
@@ -1777,11 +1778,11 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
ret = BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED;
if (bs->drv->protocol_name) {
ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
*file = bs;
}
return ret;
}
*file = NULL;
bdrv_inc_in_flight(bs);
ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum,
file);
@@ -1791,9 +1792,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
}
if (ret & BDRV_BLOCK_RAW) {
assert(ret & BDRV_BLOCK_OFFSET_VALID);
ret = bdrv_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
*pnum, pnum, file);
assert(ret & BDRV_BLOCK_OFFSET_VALID && *file);
ret = bdrv_co_get_block_status(*file, ret >> BDRV_SECTOR_BITS,
*pnum, pnum, file);
goto out;
}

View File

@@ -684,7 +684,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
struct scsi_get_lba_status *lbas = NULL;
struct scsi_lba_status_descriptor *lbasd = NULL;
struct IscsiTask iTask;
int64_t ret;
int64_t ret, max_sector;
iscsi_co_init_iscsitask(iscsilun, &iTask);
@@ -703,6 +703,7 @@ static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
goto out;
}
max_sector = iscsilun->num_blocks - sector_num;
qemu_mutex_lock(&iscsilun->mutex);
retry:
if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
@@ -750,7 +751,7 @@ retry:
goto out_unlock;
}
*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
*pnum = MIN(sector_lun2qemu(lbasd->num_blocks, iscsilun), max_sector);
if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
@@ -965,8 +966,7 @@ iscsi_aio_ioctl_cb(struct iscsi_context *iscsi, int status,
acb->ioh->driver_status |= SG_ERR_DRIVER_SENSE;
acb->ioh->sb_len_wr = acb->task->datain.size - 2;
ss = (acb->ioh->mx_sb_len >= acb->ioh->sb_len_wr) ?
acb->ioh->mx_sb_len : acb->ioh->sb_len_wr;
ss = MIN(acb->ioh->mx_sb_len, acb->ioh->sb_len_wr);
memcpy(acb->ioh->sbp, &acb->task->datain.data[2], ss);
}

View File

@@ -514,7 +514,12 @@ static void mirror_exit(BlockJob *job, void *opaque)
/* Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
* inserting target_bs at s->to_replace, where we might not be able to get
* these permissions. */
* these permissions.
*
* Note that blk_unref() alone doesn't necessarily drop permissions because
* we might be running nested inside mirror_drain(), which takes an extra
* reference, so use an explicit blk_set_perm() first. */
blk_set_perm(s->target, 0, BLK_PERM_ALL, &error_abort);
blk_unref(s->target);
s->target = NULL;
@@ -724,7 +729,7 @@ static void coroutine_fn mirror_run(void *opaque)
}
if (s->bdev_length > base_length) {
ret = blk_truncate(s->target, s->bdev_length);
ret = blk_truncate(s->target, s->bdev_length, NULL);
if (ret < 0) {
goto immediate_exit;
}
@@ -1147,6 +1152,9 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
if (mirror_top_bs == NULL) {
return;
}
if (!filter_node_name) {
mirror_top_bs->implicit = true;
}
mirror_top_bs->total_sectors = bs->total_sectors;
bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));

View File

@@ -352,14 +352,14 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
void nbd_client_detach_aio_context(BlockDriverState *bs)
{
NBDClientSession *client = nbd_get_client_session(bs);
qio_channel_detach_aio_context(QIO_CHANNEL(client->sioc));
qio_channel_detach_aio_context(QIO_CHANNEL(client->ioc));
}
void nbd_client_attach_aio_context(BlockDriverState *bs,
AioContext *new_context)
{
NBDClientSession *client = nbd_get_client_session(bs);
qio_channel_attach_aio_context(QIO_CHANNEL(client->sioc), new_context);
qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
aio_co_schedule(new_context, client->read_reply_co);
}

View File

@@ -65,11 +65,11 @@ static int nbd_parse_uri(const char *filename, QDict *options)
}
/* transport */
if (!strcmp(uri->scheme, "nbd")) {
if (!g_strcmp0(uri->scheme, "nbd")) {
is_unix = false;
} else if (!strcmp(uri->scheme, "nbd+tcp")) {
} else if (!g_strcmp0(uri->scheme, "nbd+tcp")) {
is_unix = false;
} else if (!strcmp(uri->scheme, "nbd+unix")) {
} else if (!g_strcmp0(uri->scheme, "nbd+unix")) {
is_unix = true;
} else {
ret = -EINVAL;
@@ -79,7 +79,7 @@ static int nbd_parse_uri(const char *filename, QDict *options)
p = uri->path ? uri->path : "/";
p += strspn(p, "/");
if (p[0]) {
qdict_put(options, "export", qstring_from_str(p));
qdict_put_str(options, "export", p);
}
qp = query_params_parse(uri->query);
@@ -94,9 +94,8 @@ static int nbd_parse_uri(const char *filename, QDict *options)
ret = -EINVAL;
goto out;
}
qdict_put(options, "server.type", qstring_from_str("unix"));
qdict_put(options, "server.path",
qstring_from_str(qp->p[0].value));
qdict_put_str(options, "server.type", "unix");
qdict_put_str(options, "server.path", qp->p[0].value);
} else {
QString *host;
char *port_str;
@@ -115,11 +114,11 @@ static int nbd_parse_uri(const char *filename, QDict *options)
host = qstring_from_str(uri->server);
}
qdict_put(options, "server.type", qstring_from_str("inet"));
qdict_put_str(options, "server.type", "inet");
qdict_put(options, "server.host", host);
port_str = g_strdup_printf("%d", uri->port ?: NBD_DEFAULT_PORT);
qdict_put(options, "server.port", qstring_from_str(port_str));
qdict_put_str(options, "server.port", port_str);
g_free(port_str);
}
@@ -181,7 +180,7 @@ static void nbd_parse_filename(const char *filename, QDict *options,
export_name[0] = 0; /* truncate 'file' */
export_name += strlen(EN_OPTSTR);
qdict_put(options, "export", qstring_from_str(export_name));
qdict_put_str(options, "export", export_name);
}
/* extract the host_spec - fail if it's not nbd:... */
@@ -196,8 +195,8 @@ static void nbd_parse_filename(const char *filename, QDict *options,
/* are we a UNIX or TCP socket? */
if (strstart(host_spec, "unix:", &unixpath)) {
qdict_put(options, "server.type", qstring_from_str("unix"));
qdict_put(options, "server.path", qstring_from_str(unixpath));
qdict_put_str(options, "server.type", "unix");
qdict_put_str(options, "server.path", unixpath);
} else {
InetSocketAddress *addr = NULL;
@@ -206,9 +205,9 @@ static void nbd_parse_filename(const char *filename, QDict *options,
goto out;
}
qdict_put(options, "server.type", qstring_from_str("inet"));
qdict_put(options, "server.host", qstring_from_str(addr->host));
qdict_put(options, "server.port", qstring_from_str(addr->port));
qdict_put_str(options, "server.type", "inet");
qdict_put_str(options, "server.host", addr->host);
qdict_put_str(options, "server.port", addr->port);
qapi_free_InetSocketAddress(addr);
}
@@ -247,13 +246,13 @@ static bool nbd_process_legacy_socket_options(QDict *output_options,
return false;
}
qdict_put(output_options, "server.type", qstring_from_str("unix"));
qdict_put(output_options, "server.path", qstring_from_str(path));
qdict_put_str(output_options, "server.type", "unix");
qdict_put_str(output_options, "server.path", path);
} else if (host) {
qdict_put(output_options, "server.type", qstring_from_str("inet"));
qdict_put(output_options, "server.host", qstring_from_str(host));
qdict_put(output_options, "server.port",
qstring_from_str(port ?: stringify(NBD_DEFAULT_PORT)));
qdict_put_str(output_options, "server.type", "inet");
qdict_put_str(output_options, "server.host", host);
qdict_put_str(output_options, "server.port",
port ?: stringify(NBD_DEFAULT_PORT));
}
return true;
@@ -528,7 +527,7 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
path = s->saddr->u.q_unix.path;
} /* else can't represent as pseudo-filename */
qdict_put(opts, "driver", qstring_from_str("nbd"));
qdict_put_str(opts, "driver", "nbd");
if (path && s->export) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
@@ -551,10 +550,10 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
qdict_put_obj(opts, "server", saddr_qdict);
if (s->export) {
qdict_put(opts, "export", qstring_from_str(s->export));
qdict_put_str(opts, "export", s->export);
}
if (s->tlscredsid) {
qdict_put(opts, "tls-creds", qstring_from_str(s->tlscredsid));
qdict_put_str(opts, "tls-creds", s->tlscredsid);
}
qdict_flatten(opts);

View File

@@ -83,7 +83,7 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
error_setg(errp, "Invalid URI specified");
goto out;
}
if (strcmp(uri->scheme, "nfs") != 0) {
if (g_strcmp0(uri->scheme, "nfs") != 0) {
error_setg(errp, "URI scheme must be 'nfs'");
goto out;
}
@@ -104,9 +104,9 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
goto out;
}
qdict_put(options, "server.host", qstring_from_str(uri->server));
qdict_put(options, "server.type", qstring_from_str("inet"));
qdict_put(options, "path", qstring_from_str(uri->path));
qdict_put_str(options, "server.host", uri->server);
qdict_put_str(options, "server.type", "inet");
qdict_put_str(options, "path", uri->path);
for (i = 0; i < qp->n; i++) {
unsigned long long val;
@@ -121,23 +121,17 @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
goto out;
}
if (!strcmp(qp->p[i].name, "uid")) {
qdict_put(options, "user",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "user", qp->p[i].value);
} else if (!strcmp(qp->p[i].name, "gid")) {
qdict_put(options, "group",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "group", qp->p[i].value);
} else if (!strcmp(qp->p[i].name, "tcp-syncnt")) {
qdict_put(options, "tcp-syn-count",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "tcp-syn-count", qp->p[i].value);
} else if (!strcmp(qp->p[i].name, "readahead")) {
qdict_put(options, "readahead-size",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "readahead-size", qp->p[i].value);
} else if (!strcmp(qp->p[i].name, "pagecache")) {
qdict_put(options, "page-cache-size",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "page-cache-size", qp->p[i].value);
} else if (!strcmp(qp->p[i].name, "debug")) {
qdict_put(options, "debug",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "debug", qp->p[i].value);
} else {
error_setg(errp, "Unknown NFS parameter name: %s",
qp->p[i].name);
@@ -440,19 +434,23 @@ static void nfs_client_close(NFSClient *client)
if (client->context) {
if (client->fh) {
nfs_close(client->context, client->fh);
client->fh = NULL;
}
aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context),
false, NULL, NULL, NULL, NULL);
nfs_destroy_context(client->context);
client->context = NULL;
}
memset(client, 0, sizeof(NFSClient));
g_free(client->path);
qemu_mutex_destroy(&client->mutex);
qapi_free_NFSServer(client->server);
client->server = NULL;
}
static void nfs_file_close(BlockDriverState *bs)
{
NFSClient *client = bs->opaque;
nfs_client_close(client);
qemu_mutex_destroy(&client->mutex);
}
static NFSServer *nfs_config(QDict *options, Error **errp)
@@ -505,6 +503,7 @@ static int64_t nfs_client_open(NFSClient *client, QDict *options,
struct stat st;
char *file = NULL, *strp = NULL;
qemu_mutex_init(&client->mutex);
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err) {
@@ -667,7 +666,7 @@ static int nfs_file_open(BlockDriverState *bs, QDict *options, int flags,
if (ret < 0) {
return ret;
}
qemu_mutex_init(&client->mutex);
bs->total_sectors = ret;
ret = 0;
return ret;
@@ -811,7 +810,7 @@ static void nfs_refresh_filename(BlockDriverState *bs, QDict *options)
QObject *server_qdict;
Visitor *ov;
qdict_put(opts, "driver", qstring_from_str("nfs"));
qdict_put_str(opts, "driver", "nfs");
if (client->uid && !client->gid) {
snprintf(bs->exact_filename, sizeof(bs->exact_filename),
@@ -834,28 +833,25 @@ static void nfs_refresh_filename(BlockDriverState *bs, QDict *options)
visit_type_NFSServer(ov, NULL, &client->server, &error_abort);
visit_complete(ov, &server_qdict);
qdict_put_obj(opts, "server", server_qdict);
qdict_put(opts, "path", qstring_from_str(client->path));
qdict_put_str(opts, "path", client->path);
if (client->uid) {
qdict_put(opts, "user", qint_from_int(client->uid));
qdict_put_int(opts, "user", client->uid);
}
if (client->gid) {
qdict_put(opts, "group", qint_from_int(client->gid));
qdict_put_int(opts, "group", client->gid);
}
if (client->tcp_syncnt) {
qdict_put(opts, "tcp-syn-cnt",
qint_from_int(client->tcp_syncnt));
qdict_put_int(opts, "tcp-syn-cnt", client->tcp_syncnt);
}
if (client->readahead) {
qdict_put(opts, "readahead-size",
qint_from_int(client->readahead));
qdict_put_int(opts, "readahead-size", client->readahead);
}
if (client->pagecache) {
qdict_put(opts, "page-cache-size",
qint_from_int(client->pagecache));
qdict_put_int(opts, "page-cache-size", client->pagecache);
}
if (client->debug) {
qdict_put(opts, "debug", qint_from_int(client->debug));
qdict_put_int(opts, "debug", client->debug);
}
visit_free(ov);

View File

@@ -232,7 +232,7 @@ static void null_refresh_filename(BlockDriverState *bs, QDict *opts)
bs->drv->format_name);
}
qdict_put(opts, "driver", qstring_from_str(bs->drv->format_name));
qdict_put_str(opts, "driver", bs->drv->format_name);
bs->full_open_options = opts;
}

View File

@@ -223,7 +223,8 @@ static int64_t allocate_clusters(BlockDriverState *bs, int64_t sector_num,
space << BDRV_SECTOR_BITS, 0);
} else {
ret = bdrv_truncate(bs->file,
(s->data_end + space) << BDRV_SECTOR_BITS);
(s->data_end + space) << BDRV_SECTOR_BITS,
NULL);
}
if (ret < 0) {
return ret;
@@ -456,8 +457,10 @@ static int parallels_check(BlockDriverState *bs, BdrvCheckResult *res,
size - res->image_end_offset);
res->leaks += count;
if (fix & BDRV_FIX_LEAKS) {
ret = bdrv_truncate(bs->file, res->image_end_offset);
Error *local_err = NULL;
ret = bdrv_truncate(bs->file, res->image_end_offset, &local_err);
if (ret < 0) {
error_report_err(local_err);
res->check_errors++;
return ret;
}
@@ -504,7 +507,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
blk_set_allow_write_beyond_eof(file, true);
ret = blk_truncate(file, 0);
ret = blk_truncate(file, 0, errp);
if (ret < 0) {
goto exit;
}
@@ -696,7 +699,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
}
if (!(flags & BDRV_O_RESIZE) || !bdrv_has_zero_init(bs->file->bs) ||
bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs)) != 0) {
bdrv_truncate(bs->file, bdrv_getlength(bs->file->bs), NULL) != 0) {
s->prealloc_mode = PRL_PREALLOC_MODE_FALLOCATE;
}
@@ -739,7 +742,7 @@ static void parallels_close(BlockDriverState *bs)
}
if (bs->open_flags & BDRV_O_RDWR) {
bdrv_truncate(bs->file, s->data_end << BDRV_SECTOR_BITS);
bdrv_truncate(bs->file, s->data_end << BDRV_SECTOR_BITS, NULL);
}
g_free(s->bat_dirty_bmap);

View File

@@ -64,7 +64,6 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
info->backing_file = g_strdup(bs->backing_file);
}
info->backing_file_depth = bdrv_get_backing_file_depth(bs);
info->detect_zeroes = bs->detect_zeroes;
if (blk && blk_get_public(blk)->throttle_state) {
@@ -125,6 +124,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
bs0 = bs;
p_image_info = &info->image;
info->backing_file_depth = 0;
while (1) {
Error *local_err = NULL;
bdrv_query_image_info(bs0, p_image_info, &local_err);
@@ -133,13 +133,21 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
qapi_free_BlockDeviceInfo(info);
return NULL;
}
if (bs0->drv && bs0->backing) {
info->backing_file_depth++;
bs0 = bs0->backing->bs;
(*p_image_info)->has_backing_image = true;
p_image_info = &((*p_image_info)->backing_image);
} else {
break;
}
/* Skip automatically inserted nodes that the user isn't aware of for
* query-block (blk != NULL), but not for query-named-block-nodes */
while (blk && bs0 && bs0->drv && bs0->implicit) {
bs0 = backing_bs(bs0);
}
}
return info;
@@ -322,6 +330,12 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo **p_info,
{
BlockInfo *info = g_malloc0(sizeof(*info));
BlockDriverState *bs = blk_bs(blk);
/* Skip automatically inserted nodes that the user isn't aware of */
while (bs && bs->drv && bs->implicit) {
bs = backing_bs(bs);
}
info->device = g_strdup(blk_name(blk));
info->type = g_strdup("unknown");
info->locked = blk_dev_is_medium_locked(blk);
@@ -424,8 +438,8 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
}
}
static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
bool query_backing)
static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
bool blk_level)
{
BlockStats *s = NULL;
@@ -436,6 +450,14 @@ static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
return s;
}
/* Skip automatically inserted nodes that the user isn't aware of in
* a BlockBackend-level command. Stay at the exact node for a node-level
* command. */
while (blk_level && bs->drv && bs->implicit) {
bs = backing_bs(bs);
assert(bs);
}
if (bdrv_get_node_name(bs)[0]) {
s->has_node_name = true;
s->node_name = g_strdup(bdrv_get_node_name(bs));
@@ -445,12 +467,12 @@ static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
if (bs->file) {
s->has_parent = true;
s->parent = bdrv_query_bds_stats(bs->file->bs, query_backing);
s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
}
if (query_backing && bs->backing) {
if (blk_level && bs->backing) {
s->has_backing = true;
s->backing = bdrv_query_bds_stats(bs->backing->bs, query_backing);
s->backing = bdrv_query_bds_stats(bs->backing->bs, blk_level);
}
return s;

View File

@@ -473,7 +473,7 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
/* round to cluster size */
cluster_offset = (cluster_offset + s->cluster_size - 1) &
~(s->cluster_size - 1);
bdrv_truncate(bs->file, cluster_offset + s->cluster_size);
bdrv_truncate(bs->file, cluster_offset + s->cluster_size, NULL);
/* if encrypted, we must initialize the cluster
content which won't be written */
if (bs->encrypted &&
@@ -833,7 +833,7 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
blk_set_allow_write_beyond_eof(qcow_blk, true);
ret = blk_truncate(qcow_blk, 0);
ret = blk_truncate(qcow_blk, 0, errp);
if (ret < 0) {
goto exit;
}
@@ -916,7 +916,7 @@ static int qcow_make_empty(BlockDriverState *bs)
if (bdrv_pwrite_sync(bs->file, s->l1_table_offset, s->l1_table,
l1_length) < 0)
return -1;
ret = bdrv_truncate(bs->file, s->l1_table_offset + l1_length);
ret = bdrv_truncate(bs->file, s->l1_table_offset + l1_length, NULL);
if (ret < 0)
return ret;

View File

@@ -1728,14 +1728,17 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
if (fix & BDRV_FIX_ERRORS) {
int64_t new_nb_clusters;
Error *local_err = NULL;
if (offset > INT64_MAX - s->cluster_size) {
ret = -EINVAL;
goto resize_fail;
}
ret = bdrv_truncate(bs->file, offset + s->cluster_size);
ret = bdrv_truncate(bs->file, offset + s->cluster_size,
&local_err);
if (ret < 0) {
error_report_err(local_err);
goto resize_fail;
}
size = bdrv_getlength(bs->file->bs);

View File

@@ -524,25 +524,32 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
uint64_t *refcount_cache_size, Error **errp)
{
BDRVQcow2State *s = bs->opaque;
uint64_t combined_cache_size;
uint64_t combined_cache_size, l2_cache_max_setting;
bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
refcount_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE);
combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0);
l2_cache_max_setting = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE,
DEFAULT_L2_CACHE_MAX_SIZE);
*refcount_cache_size = qemu_opt_get_size(opts,
QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0);
*l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
if (combined_cache_size_set) {
if (l2_cache_size_set && refcount_cache_size_set) {
error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
" and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
"the same time");
return;
} else if (*l2_cache_size > combined_cache_size) {
} else if (l2_cache_size_set &&
(l2_cache_max_setting > combined_cache_size)) {
error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
QCOW2_OPT_CACHE_SIZE);
return;
@@ -557,25 +564,20 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
} else if (refcount_cache_size_set) {
*l2_cache_size = combined_cache_size - *refcount_cache_size;
} else {
*refcount_cache_size = combined_cache_size
/ (DEFAULT_L2_REFCOUNT_SIZE_RATIO + 1);
*l2_cache_size = combined_cache_size - *refcount_cache_size;
}
} else {
if (!l2_cache_size_set && !refcount_cache_size_set) {
*l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
(uint64_t)DEFAULT_L2_CACHE_CLUSTERS
* s->cluster_size);
*refcount_cache_size = *l2_cache_size
/ DEFAULT_L2_REFCOUNT_SIZE_RATIO;
} else if (!l2_cache_size_set) {
*l2_cache_size = *refcount_cache_size
* DEFAULT_L2_REFCOUNT_SIZE_RATIO;
} else if (!refcount_cache_size_set) {
*refcount_cache_size = *l2_cache_size
/ DEFAULT_L2_REFCOUNT_SIZE_RATIO;
/* Assign as much memory as possible to the L2 cache, and
* use the remainder for the refcount cache */
if (combined_cache_size >= max_l2_cache + min_refcount_cache) {
*l2_cache_size = max_l2_cache;
*refcount_cache_size = combined_cache_size - *l2_cache_size;
} else {
*refcount_cache_size =
MIN(combined_cache_size, min_refcount_cache);
*l2_cache_size = combined_cache_size - *refcount_cache_size;
}
}
}
/* l2_cache_size and refcount_cache_size are ensured to have at least
* their minimum values in qcow2_update_options_prepare() */
}
typedef struct Qcow2ReopenState {
@@ -667,7 +669,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
/* New interval for cache cleanup timer */
r->cache_clean_interval =
qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL,
s->cache_clean_interval);
DEFAULT_CACHE_CLEAN_INTERVAL);
#ifndef CONFIG_LINUX
if (r->cache_clean_interval != 0) {
error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL
@@ -997,7 +999,7 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
/* 2^(s->refcount_order - 3) is the refcount width in bytes */
s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
s->refcount_block_size = 1 << s->refcount_block_bits;
bs->total_sectors = header.size / 512;
bs->total_sectors = header.size / BDRV_SECTOR_SIZE;
s->csize_shift = (62 - (s->cluster_bits - 8));
s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
s->cluster_offset_mask = (1LL << s->csize_shift) - 1;
@@ -2265,7 +2267,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
* table)
*/
options = qdict_new();
qdict_put(options, "driver", qstring_from_str("qcow2"));
qdict_put_str(options, "driver", "qcow2");
blk = blk_new_open(filename, NULL, options,
BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH,
&local_err);
@@ -2294,9 +2296,9 @@ static int qcow2_create2(const char *filename, int64_t total_size,
}
/* Okay, now that we have a valid image, let's give it the right size */
ret = blk_truncate(blk, total_size);
ret = blk_truncate(blk, total_size, errp);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not resize image");
error_prepend(errp, "Could not resize image: ");
goto out;
}
@@ -2327,7 +2329,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
/* Reopen the image without BDRV_O_NO_FLUSH to flush it before returning */
options = qdict_new();
qdict_put(options, "driver", qstring_from_str("qcow2"));
qdict_put_str(options, "driver", "qcow2");
blk = blk_new_open(filename, NULL, options,
BDRV_O_RDWR | BDRV_O_NO_BACKING, &local_err);
if (blk == NULL) {
@@ -2584,7 +2586,7 @@ qcow2_co_pwritev_compressed(BlockDriverState *bs, uint64_t offset,
/* align end of file to a sector boundary to ease reading with
sector based I/Os */
cluster_offset = bdrv_getlength(bs->file->bs);
return bdrv_truncate(bs->file, cluster_offset);
return bdrv_truncate(bs->file, cluster_offset, NULL);
}
buf = qemu_blockalign(bs, s->cluster_size);
@@ -2674,6 +2676,7 @@ fail:
static int make_completely_empty(BlockDriverState *bs)
{
BDRVQcow2State *s = bs->opaque;
Error *local_err = NULL;
int ret, l1_clusters;
int64_t offset;
uint64_t *new_reftable = NULL;
@@ -2798,8 +2801,10 @@ static int make_completely_empty(BlockDriverState *bs)
goto fail;
}
ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size);
ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
&local_err);
if (ret < 0) {
error_report_err(local_err);
goto fail;
}
@@ -3273,9 +3278,10 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
return ret;
}
ret = blk_truncate(blk, new_size);
ret = blk_truncate(blk, new_size, &local_err);
blk_unref(blk);
if (ret < 0) {
error_report_err(local_err);
return ret;
}
}

View File

@@ -27,6 +27,7 @@
#include "crypto/cipher.h"
#include "qemu/coroutine.h"
#include "qemu/units.h"
//#define DEBUG_ALLOC
//#define DEBUG_ALLOC2
@@ -42,11 +43,11 @@
/* 8 MB refcount table is enough for 2 PB images at 64k cluster size
* (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
#define QCOW_MAX_REFTABLE_SIZE 0x800000
#define QCOW_MAX_REFTABLE_SIZE S_8MiB
/* 32 MB L1 table is enough for 2 PB images at 64k cluster size
* (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
#define QCOW_MAX_L1_SIZE 0x2000000
#define QCOW_MAX_L1_SIZE S_32MiB
/* Allow for an average of 1k per snapshot table entry, should be plenty of
* space for snapshot names and IDs */
@@ -68,16 +69,16 @@
/* Must be at least 4 to cover all cases of refcount table growth */
#define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
/* Whichever is more */
#define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
/* The refblock cache needs only a fourth of the L2 cache size to cover as many
* clusters */
#define DEFAULT_L2_REFCOUNT_SIZE_RATIO 4
#define DEFAULT_CLUSTER_SIZE 65536
#ifdef CONFIG_LINUX
#define DEFAULT_L2_CACHE_MAX_SIZE S_32MiB
#define DEFAULT_CACHE_CLEAN_INTERVAL 600 /* seconds */
#else
#define DEFAULT_L2_CACHE_MAX_SIZE S_8MiB
/* Cache clean interval is currently available only on Linux, so must be 0 */
#define DEFAULT_CACHE_CLEAN_INTERVAL 0
#endif
#define DEFAULT_CLUSTER_SIZE S_64KiB
#define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts"
#define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request"

View File

@@ -635,7 +635,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
blk_set_allow_write_beyond_eof(blk, true);
/* File must start empty and grow, check truncate is supported */
ret = blk_truncate(blk, 0);
ret = blk_truncate(blk, 0, errp);
if (ret < 0) {
goto out;
}

View File

@@ -1096,19 +1096,15 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
children = qlist_new();
for (i = 0; i < s->num_children; i++) {
QINCREF(s->children[i]->bs->full_open_options);
qlist_append_obj(children,
QOBJECT(s->children[i]->bs->full_open_options));
qlist_append(children, s->children[i]->bs->full_open_options);
}
opts = qdict_new();
qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("quorum")));
qdict_put_obj(opts, QUORUM_OPT_VOTE_THRESHOLD,
QOBJECT(qint_from_int(s->threshold)));
qdict_put_obj(opts, QUORUM_OPT_BLKVERIFY,
QOBJECT(qbool_from_bool(s->is_blkverify)));
qdict_put_obj(opts, QUORUM_OPT_REWRITE,
QOBJECT(qbool_from_bool(s->rewrite_corrupted)));
qdict_put_obj(opts, "children", QOBJECT(children));
qdict_put_str(opts, "driver", "quorum");
qdict_put_int(opts, QUORUM_OPT_VOTE_THRESHOLD, s->threshold);
qdict_put_bool(opts, QUORUM_OPT_BLKVERIFY, s->is_blkverify);
qdict_put_bool(opts, QUORUM_OPT_REWRITE, s->rewrite_corrupted);
qdict_put(opts, "children", children);
bs->full_open_options = opts;
}

View File

@@ -341,7 +341,7 @@ static int raw_truncate(BlockDriverState *bs, int64_t offset)
s->size = offset;
offset += s->offset;
return bdrv_truncate(bs->file, offset);
return bdrv_truncate(bs->file, offset, NULL);
}
static int raw_media_changed(BlockDriverState *bs)

View File

@@ -154,20 +154,20 @@ static void qemu_rbd_parse_filename(const char *filename, QDict *options,
goto done;
}
qemu_rbd_unescape(found_str);
qdict_put(options, "pool", qstring_from_str(found_str));
qdict_put_str(options, "pool", found_str);
if (strchr(p, '@')) {
found_str = qemu_rbd_next_tok(p, '@', &p);
qemu_rbd_unescape(found_str);
qdict_put(options, "image", qstring_from_str(found_str));
qdict_put_str(options, "image", found_str);
found_str = qemu_rbd_next_tok(p, ':', &p);
qemu_rbd_unescape(found_str);
qdict_put(options, "snapshot", qstring_from_str(found_str));
qdict_put_str(options, "snapshot", found_str);
} else {
found_str = qemu_rbd_next_tok(p, ':', &p);
qemu_rbd_unescape(found_str);
qdict_put(options, "image", qstring_from_str(found_str));
qdict_put_str(options, "image", found_str);
}
if (!p) {
goto done;
@@ -189,9 +189,9 @@ static void qemu_rbd_parse_filename(const char *filename, QDict *options,
qemu_rbd_unescape(value);
if (!strcmp(name, "conf")) {
qdict_put(options, "conf", qstring_from_str(value));
qdict_put_str(options, "conf", value);
} else if (!strcmp(name, "id")) {
qdict_put(options, "user" , qstring_from_str(value));
qdict_put_str(options, "user", value);
} else {
/*
* We pass these internally to qemu_rbd_set_keypairs(), so
@@ -204,8 +204,8 @@ static void qemu_rbd_parse_filename(const char *filename, QDict *options,
if (!keypairs) {
keypairs = qlist_new();
}
qlist_append(keypairs, qstring_from_str(name));
qlist_append(keypairs, qstring_from_str(value));
qlist_append_str(keypairs, name);
qlist_append_str(keypairs, value);
}
}

View File

@@ -1052,11 +1052,11 @@ static void sd_parse_uri(SheepdogConfig *cfg, const char *filename,
}
/* transport */
if (!strcmp(uri->scheme, "sheepdog")) {
if (!g_strcmp0(uri->scheme, "sheepdog")) {
is_unix = false;
} else if (!strcmp(uri->scheme, "sheepdog+tcp")) {
} else if (!g_strcmp0(uri->scheme, "sheepdog+tcp")) {
is_unix = false;
} else if (!strcmp(uri->scheme, "sheepdog+unix")) {
} else if (!g_strcmp0(uri->scheme, "sheepdog+unix")) {
is_unix = true;
} else {
error_setg(&err, "URI scheme must be 'sheepdog', 'sheepdog+tcp',"

View File

@@ -200,7 +200,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
qdict_extract_subqdict(options, &file_options, "file.");
QDECREF(file_options);
qdict_put(options, "file", qstring_from_str(bdrv_get_node_name(file)));
qdict_put_str(options, "file", bdrv_get_node_name(file));
drv->bdrv_close(bs);
bdrv_unref_child(bs, bs->file);

View File

@@ -205,7 +205,7 @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
return -EINVAL;
}
if (strcmp(uri->scheme, "ssh") != 0) {
if (g_strcmp0(uri->scheme, "ssh") != 0) {
error_setg(errp, "URI scheme must be 'ssh'");
goto err;
}
@@ -227,24 +227,23 @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
}
if(uri->user && strcmp(uri->user, "") != 0) {
qdict_put(options, "user", qstring_from_str(uri->user));
qdict_put_str(options, "user", uri->user);
}
qdict_put(options, "server.host", qstring_from_str(uri->server));
qdict_put_str(options, "server.host", uri->server);
port_str = g_strdup_printf("%d", uri->port ?: 22);
qdict_put(options, "server.port", qstring_from_str(port_str));
qdict_put_str(options, "server.port", port_str);
g_free(port_str);
qdict_put(options, "path", qstring_from_str(uri->path));
qdict_put_str(options, "path", uri->path);
/* Pick out any query parameters that we understand, and ignore
* the rest.
*/
for (i = 0; i < qp->n; ++i) {
if (strcmp(qp->p[i].name, "host_key_check") == 0) {
qdict_put(options, "host_key_check",
qstring_from_str(qp->p[i].value));
qdict_put_str(options, "host_key_check", qp->p[i].value);
}
}
@@ -574,9 +573,8 @@ static bool ssh_process_legacy_socket_options(QDict *output_opts,
}
if (host) {
qdict_put(output_opts, "server.host", qstring_from_str(host));
qdict_put(output_opts, "server.port",
qstring_from_str(port ?: stringify(22)));
qdict_put_str(output_opts, "server.host", host);
qdict_put_str(output_opts, "server.port", port ?: stringify(22));
}
return true;

View File

@@ -280,6 +280,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
fail:
if (orig_bs_flags != bdrv_get_flags(bs)) {
bdrv_reopen(bs, s->bs_flags, NULL);
bdrv_reopen(bs, orig_bs_flags, NULL);
}
}

379
block/tar.c Normal file
View File

@@ -0,0 +1,379 @@
/*
* Tar block driver
*
* Copyright (c) 2009 Alexander Graf <agraf@suse.de>
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include "qemu/osdep.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "block/block_int.h"
// #define DEBUG
#ifdef DEBUG
#define dprintf(fmt, ...) do { printf("tar: " fmt, ## __VA_ARGS__); } while (0)
#else
#define dprintf(fmt, ...) do { } while (0)
#endif
#define SECTOR_SIZE 512
#define POSIX_TAR_MAGIC "ustar"
#define OFFS_LENGTH 0x7c
#define OFFS_TYPE 0x9c
#define OFFS_MAGIC 0x101
#define OFFS_S_SP 0x182
#define OFFS_S_EXT 0x1e2
#define OFFS_S_LENGTH 0x1e3
#define OFFS_SX_EXT 0x1f8
typedef struct SparseCache {
uint64_t start;
uint64_t end;
} SparseCache;
typedef struct BDRVTarState {
BlockDriverState *hd;
size_t file_sec;
uint64_t file_len;
SparseCache *sparse;
int sparse_num;
uint64_t last_end;
char longfile[2048];
} BDRVTarState;
static int str_ends(char *str, const char *end)
{
int end_len = strlen(end);
int str_len = strlen(str);
if (str_len < end_len)
return 0;
return !strncmp(str + str_len - end_len, end, end_len);
}
static int is_target_file(BlockDriverState *bs, char *filename,
char *header)
{
int retval = 0;
if (str_ends(filename, ".raw"))
retval = 1;
if (str_ends(filename, ".qcow"))
retval = 1;
if (str_ends(filename, ".qcow2"))
retval = 1;
if (str_ends(filename, ".vmdk"))
retval = 1;
if (retval &&
(header[OFFS_TYPE] != '0') &&
(header[OFFS_TYPE] != 'S')) {
retval = 0;
}
dprintf("does filename %s match? %s\n", filename, retval ? "yes" : "no");
/* make sure we're not using this name again */
filename[0] = '\0';
return retval;
}
static uint64_t tar2u64(char *ptr)
{
uint64_t retval;
char oldend = ptr[12];
ptr[12] = '\0';
if (*ptr & 0x80) {
/* XXX we only support files up to 64 bit length */
retval = be64_to_cpu(*(uint64_t *)(ptr+4));
dprintf("Convert %lx -> %#lx\n", *(uint64_t*)(ptr+4), retval);
} else {
retval = strtol(ptr, NULL, 8);
dprintf("Convert %s -> %#lx\n", ptr, retval);
}
ptr[12] = oldend;
return retval;
}
static void tar_sparse(BDRVTarState *s, uint64_t offs, uint64_t len)
{
SparseCache *sparse;
if (!len)
return;
if (!(offs - s->last_end)) {
s->last_end += len;
return;
}
if (s->last_end > offs)
return;
dprintf("Last chunk until %lx new chunk at %lx\n", s->last_end, offs);
s->sparse = g_realloc(s->sparse, (s->sparse_num + 1) * sizeof(SparseCache));
sparse = &s->sparse[s->sparse_num];
sparse->start = s->last_end;
sparse->end = offs;
s->last_end = offs + len;
s->sparse_num++;
dprintf("Sparse at %lx end=%lx\n", sparse->start,
sparse->end);
}
static QemuOptsList runtime_opts = {
.name = "tar",
.head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
.desc = {
{
.name = "filename",
.type = QEMU_OPT_STRING,
.help = "URL to the tar file",
},
{ /* end of list */ }
},
};
static int tar_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
{
BDRVTarState *s = bs->opaque;
char header[SECTOR_SIZE];
char *real_file = header;
char *magic;
size_t header_offs = 0;
int ret;
QemuOpts *opts;
Error *local_err = NULL;
const char *filename;
opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
qemu_opts_absorb_qdict(opts, options, &local_err);
if (local_err != NULL) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto fail;
}
filename = qemu_opt_get(opts, "filename");
if (!strncmp(filename, "tar://", 6))
filename += 6;
else if (!strncmp(filename, "tar:", 4))
filename += 4;
s->hd = bdrv_open(filename, NULL, NULL, flags | BDRV_O_PROTOCOL, errp);
if (!s->hd) {
ret = -EINVAL;
qemu_opts_del(opts);
return ret;
}
/* Search the file for an image */
do {
/* tar header */
if (bdrv_pread(s->hd->file, header_offs, header, SECTOR_SIZE) != SECTOR_SIZE)
goto fail;
if ((header_offs > 1) && !header[0]) {
fprintf(stderr, "Tar: No image file found in archive\n");
goto fail;
}
magic = &header[OFFS_MAGIC];
if (strncmp(magic, POSIX_TAR_MAGIC, 5)) {
fprintf(stderr, "Tar: Invalid magic: %s\n", magic);
goto fail;
}
dprintf("file type: %c\n", header[OFFS_TYPE]);
/* file length*/
s->file_len = (tar2u64(&header[OFFS_LENGTH]) + (SECTOR_SIZE - 1)) &
~(SECTOR_SIZE - 1);
s->file_sec = (header_offs / SECTOR_SIZE) + 1;
header_offs += s->file_len + SECTOR_SIZE;
if (header[OFFS_TYPE] == 'L') {
bdrv_pread(s->hd->file, header_offs - s->file_len, s->longfile,
sizeof(s->longfile));
s->longfile[sizeof(s->longfile)-1] = '\0';
real_file = header;
} else if (s->longfile[0]) {
real_file = s->longfile;
} else {
real_file = header;
}
} while(!is_target_file(bs, real_file, header));
/* We found an image! */
if (header[OFFS_TYPE] == 'S') {
uint8_t isextended;
int i;
for (i = OFFS_S_SP; i < (OFFS_S_SP + (4 * 24)); i += 24)
tar_sparse(s, tar2u64(&header[i]), tar2u64(&header[i+12]));
s->file_len = tar2u64(&header[OFFS_S_LENGTH]);
isextended = header[OFFS_S_EXT];
while (isextended) {
if (bdrv_pread(s->hd->file, s->file_sec * SECTOR_SIZE, header,
SECTOR_SIZE) != SECTOR_SIZE)
goto fail;
for (i = 0; i < (21 * 24); i += 24)
tar_sparse(s, tar2u64(&header[i]), tar2u64(&header[i+12]));
isextended = header[OFFS_SX_EXT];
s->file_sec++;
}
tar_sparse(s, s->file_len, 1);
}
qemu_opts_del(opts);
return 0;
fail:
fprintf(stderr, "Tar: Error opening file\n");
bdrv_unref(s->hd);
qemu_opts_del(opts);
return -EINVAL;
}
typedef struct TarAIOCB {
BlockAIOCB common;
QEMUBH *bh;
} TarAIOCB;
/* This callback gets invoked when we have pure sparseness */
static void tar_sparse_cb(void *opaque)
{
TarAIOCB *acb = (TarAIOCB *)opaque;
acb->common.cb(acb->common.opaque, 0);
qemu_bh_delete(acb->bh);
qemu_aio_unref(acb);
}
static AIOCBInfo tar_aiocb_info = {
.aiocb_size = sizeof(TarAIOCB),
};
/* This is where we get a request from a caller to read something */
static BlockAIOCB *tar_aio_readv(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
{
BDRVTarState *s = bs->opaque;
SparseCache *sparse;
int64_t sec_file = sector_num + s->file_sec;
int64_t start = sector_num * SECTOR_SIZE;
int64_t end = start + (nb_sectors * SECTOR_SIZE);
int i;
TarAIOCB *acb;
for (i = 0; i < s->sparse_num; i++) {
sparse = &s->sparse[i];
if (sparse->start > end) {
/* We expect the cache to be start increasing */
break;
} else if ((sparse->start < start) && (sparse->end <= start)) {
/* sparse before our offset */
sec_file -= (sparse->end - sparse->start) / SECTOR_SIZE;
} else if ((sparse->start <= start) && (sparse->end >= end)) {
/* all our sectors are sparse */
char *buf = g_malloc0(nb_sectors * SECTOR_SIZE);
acb = qemu_aio_get(&tar_aiocb_info, bs, cb, opaque);
qemu_iovec_from_buf(qiov, 0, buf, nb_sectors * SECTOR_SIZE);
g_free(buf);
acb->bh = qemu_bh_new(tar_sparse_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
} else if (((sparse->start >= start) && (sparse->start < end)) ||
((sparse->end >= start) && (sparse->end < end))) {
/* we're semi-sparse (worst case) */
/* let's go synchronous and read all sectors individually */
char *buf = g_malloc(nb_sectors * SECTOR_SIZE);
uint64_t offs;
for (offs = 0; offs < (nb_sectors * SECTOR_SIZE);
offs += SECTOR_SIZE) {
bdrv_pread(bs->file, (sector_num * SECTOR_SIZE) + offs,
buf + offs, SECTOR_SIZE);
}
qemu_iovec_from_buf(qiov, 0, buf, nb_sectors * SECTOR_SIZE);
acb = qemu_aio_get(&tar_aiocb_info, bs, cb, opaque);
acb->bh = qemu_bh_new(tar_sparse_cb, acb);
qemu_bh_schedule(acb->bh);
return &acb->common;
}
}
return bdrv_aio_readv(s->hd->file, sec_file, qiov, nb_sectors,
cb, opaque);
}
static void tar_close(BlockDriverState *bs)
{
dprintf("Close\n");
}
static int64_t tar_getlength(BlockDriverState *bs)
{
BDRVTarState *s = bs->opaque;
dprintf("getlength -> %ld\n", s->file_len);
return s->file_len;
}
static BlockDriver bdrv_tar = {
.format_name = "tar",
.protocol_name = "tar",
.instance_size = sizeof(BDRVTarState),
.bdrv_file_open = tar_open,
.bdrv_close = tar_close,
.bdrv_getlength = tar_getlength,
.bdrv_aio_readv = tar_aio_readv,
};
static void tar_block_init(void)
{
bdrv_register(&bdrv_tar);
}
block_init(tar_block_init);

View File

@@ -832,9 +832,9 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
}
if (image_type == VDI_TYPE_STATIC) {
ret = blk_truncate(blk, offset + blocks * block_size);
ret = blk_truncate(blk, offset + blocks * block_size, errp);
if (ret < 0) {
error_setg(errp, "Failed to statically allocate %s", filename);
error_prepend(errp, "Failed to statically allocate %s", filename);
goto exit;
}
}

View File

@@ -548,7 +548,7 @@ static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
if (new_file_size % (1024*1024)) {
/* round up to nearest 1MB boundary */
new_file_size = ((new_file_size >> 20) + 1) << 20;
bdrv_truncate(bs->file, new_file_size);
bdrv_truncate(bs->file, new_file_size, NULL);
}
}
qemu_vfree(desc_entries);

View File

@@ -1171,7 +1171,7 @@ static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
/* per the spec, the address for a block is in units of 1MB */
*new_offset = ROUND_UP(*new_offset, 1024 * 1024);
return bdrv_truncate(bs->file, *new_offset + s->block_size);
return bdrv_truncate(bs->file, *new_offset + s->block_size, NULL);
}
/*
@@ -1586,7 +1586,7 @@ exit:
static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
uint64_t image_size, VHDXImageType type,
bool use_zero_blocks, uint64_t file_offset,
uint32_t length)
uint32_t length, Error **errp)
{
int ret = 0;
uint64_t data_file_offset;
@@ -1607,16 +1607,21 @@ static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
if (type == VHDX_TYPE_DYNAMIC) {
/* All zeroes, so we can just extend the file - the end of the BAT
* is the furthest thing we have written yet */
ret = blk_truncate(blk, data_file_offset);
ret = blk_truncate(blk, data_file_offset, errp);
if (ret < 0) {
error_setg_errno(errp, -ret,
"Failed to resize the underlying file");
goto exit;
}
} else if (type == VHDX_TYPE_FIXED) {
ret = blk_truncate(blk, data_file_offset + image_size);
ret = blk_truncate(blk, data_file_offset + image_size, errp);
if (ret < 0) {
error_setg_errno(errp, -ret,
"Failed to resize the underlying file");
goto exit;
}
} else {
error_setg(errp, "Unsupported image type");
ret = -ENOTSUP;
goto exit;
}
@@ -1627,6 +1632,7 @@ static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
/* for a fixed file, the default BAT entry is not zero */
s->bat = g_try_malloc0(length);
if (length && s->bat == NULL) {
error_setg(errp, "Failed to allocate memory for the BAT");
ret = -ENOMEM;
goto exit;
}
@@ -1646,6 +1652,7 @@ static int vhdx_create_bat(BlockBackend *blk, BDRVVHDXState *s,
}
ret = blk_pwrite(blk, file_offset, s->bat, length, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to write the BAT");
goto exit;
}
}
@@ -1671,7 +1678,8 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
uint32_t log_size,
bool use_zero_blocks,
VHDXImageType type,
uint64_t *metadata_offset)
uint64_t *metadata_offset,
Error **errp)
{
int ret = 0;
uint32_t offset = 0;
@@ -1740,7 +1748,7 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
/* The region table gives us the data we need to create the BAT,
* so do that now */
ret = vhdx_create_bat(blk, s, image_size, type, use_zero_blocks,
bat_file_offset, bat_length);
bat_file_offset, bat_length, errp);
if (ret < 0) {
goto exit;
}
@@ -1749,12 +1757,14 @@ static int vhdx_create_new_region_table(BlockBackend *blk,
ret = blk_pwrite(blk, VHDX_REGION_TABLE_OFFSET, buffer,
VHDX_HEADER_BLOCK_SIZE, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to write first region table");
goto exit;
}
ret = blk_pwrite(blk, VHDX_REGION_TABLE2_OFFSET, buffer,
VHDX_HEADER_BLOCK_SIZE, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to write second region table");
goto exit;
}
@@ -1825,6 +1835,7 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
ret = -ENOTSUP;
goto exit;
} else {
error_setg(errp, "Invalid subformat '%s'", type);
ret = -EINVAL;
goto exit;
}
@@ -1879,12 +1890,14 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET, &signature, sizeof(signature),
0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to write file signature");
goto delete_and_exit;
}
if (creator) {
ret = blk_pwrite(blk, VHDX_FILE_ID_OFFSET + sizeof(signature),
creator, creator_items * sizeof(gunichar2), 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to write creator field");
goto delete_and_exit;
}
}
@@ -1893,13 +1906,14 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
/* Creates (B),(C) */
ret = vhdx_create_new_headers(blk, image_size, log_size);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to write image headers");
goto delete_and_exit;
}
/* Creates (D),(E),(G) explicitly. (F) created as by-product */
ret = vhdx_create_new_region_table(blk, image_size, block_size, 512,
log_size, use_zero_blocks, image_type,
&metadata_offset);
&metadata_offset, errp);
if (ret < 0) {
goto delete_and_exit;
}
@@ -1908,6 +1922,7 @@ static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
ret = vhdx_create_new_metadata(blk, image_size, block_size, 512,
metadata_offset, image_type);
if (ret < 0) {
error_setg_errno(errp, -ret, "Failed to initialize metadata");
goto delete_and_exit;
}

View File

@@ -1714,10 +1714,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
blk_set_allow_write_beyond_eof(blk, true);
if (flat) {
ret = blk_truncate(blk, filesize);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not truncate file");
}
ret = blk_truncate(blk, filesize, errp);
goto exit;
}
magic = cpu_to_be32(VMDK4_MAGIC);
@@ -1780,9 +1777,8 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
goto exit;
}
ret = blk_truncate(blk, le64_to_cpu(header.grain_offset) << 9);
ret = blk_truncate(blk, le64_to_cpu(header.grain_offset) << 9, errp);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not truncate file");
goto exit;
}
@@ -2090,10 +2086,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
/* bdrv_pwrite write padding zeros to align to sector, we don't need that
* for description file */
if (desc_offset == 0) {
ret = blk_truncate(new_blk, desc_len);
if (ret < 0) {
error_setg_errno(errp, -ret, "Could not truncate file");
}
ret = blk_truncate(new_blk, desc_len, errp);
}
exit:
if (new_blk) {

View File

@@ -851,20 +851,21 @@ static int create_dynamic_disk(BlockBackend *blk, uint8_t *buf,
}
static int create_fixed_disk(BlockBackend *blk, uint8_t *buf,
int64_t total_size)
int64_t total_size, Error **errp)
{
int ret;
/* Add footer to total size */
total_size += HEADER_SIZE;
ret = blk_truncate(blk, total_size);
ret = blk_truncate(blk, total_size, errp);
if (ret < 0) {
return ret;
}
ret = blk_pwrite(blk, total_size - HEADER_SIZE, buf, HEADER_SIZE, 0);
if (ret < 0) {
error_setg_errno(errp, -ret, "Unable to write VHD header");
return ret;
}
@@ -996,11 +997,11 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
if (disk_type == VHD_DYNAMIC) {
ret = create_dynamic_disk(blk, buf, total_sectors);
if (ret < 0) {
error_setg(errp, "Unable to create or write VHD header");
}
} else {
ret = create_fixed_disk(blk, buf, total_size);
}
if (ret < 0) {
error_setg(errp, "Unable to create or write VHD header");
ret = create_fixed_disk(blk, buf, total_size, errp);
}
out:

View File

@@ -1057,10 +1057,10 @@ static void vvfat_parse_filename(const char *filename, QDict *options,
}
/* Fill in the options QDict */
qdict_put(options, "dir", qstring_from_str(filename));
qdict_put(options, "fat-type", qint_from_int(fat_type));
qdict_put(options, "floppy", qbool_from_bool(floppy));
qdict_put(options, "rw", qbool_from_bool(rw));
qdict_put_str(options, "dir", filename);
qdict_put_int(options, "fat-type", fat_type);
qdict_put_bool(options, "floppy", floppy);
qdict_put_bool(options, "rw", rw);
}
static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
@@ -2958,8 +2958,7 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
{
BDRVVVFATState* s = bs->opaque;
*n = s->sector_count - sector_num;
*n = bs->total_sectors - sector_num;
if (*n > nb_sectors) {
*n = nb_sectors;
} else if (*n < 0) {
@@ -3040,7 +3039,7 @@ static int enable_write_target(BlockDriverState *bs, Error **errp)
}
options = qdict_new();
qdict_put(options, "write-target.driver", qstring_from_str("qcow"));
qdict_put_str(options, "write-target.driver", "qcow");
s->qcow = bdrv_open_child(s->qcow_filename, options, "write-target", bs,
&child_vvfat_qcow, false, errp);
QDECREF(options);

View File

@@ -27,6 +27,10 @@ typedef struct NBDServerData {
static NBDServerData *nbd_server;
static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
{
nbd_client_put(client);
}
static gboolean nbd_accept(QIOChannel *ioc, GIOCondition condition,
gpointer opaque)
@@ -46,7 +50,7 @@ static gboolean nbd_accept(QIOChannel *ioc, GIOCondition condition,
qio_channel_set_name(QIO_CHANNEL(cioc), "nbd-server");
nbd_client_new(NULL, cioc,
nbd_server->tlscreds, NULL,
nbd_client_put);
nbd_blockdev_client_closed);
object_unref(OBJECT(cioc));
return TRUE;
}

View File

@@ -527,7 +527,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
error_setg(errp, "Cannot specify both 'driver' and 'format'");
goto early_err;
}
qdict_put(bs_opts, "driver", qstring_from_str(buf));
qdict_put_str(bs_opts, "driver", buf);
}
on_write_error = BLOCKDEV_ON_ERROR_ENOSPC;
@@ -903,10 +903,8 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
copy_on_read = false;
}
qdict_put(bs_opts, BDRV_OPT_READ_ONLY,
qstring_from_str(read_only ? "on" : "off"));
qdict_put(bs_opts, "copy-on-read",
qstring_from_str(copy_on_read ? "on" :"off"));
qdict_put_str(bs_opts, BDRV_OPT_READ_ONLY, read_only ? "on" : "off");
qdict_put_str(bs_opts, "copy-on-read", copy_on_read ? "on" : "off");
/* Controller type */
value = qemu_opt_get(legacy_opts, "if");
@@ -1030,7 +1028,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
new_id = g_strdup_printf("%s%s%i", if_name[type],
mediastr, unit_id);
}
qdict_put(bs_opts, "id", qstring_from_str(new_id));
qdict_put_str(bs_opts, "id", new_id);
g_free(new_id);
}
@@ -1067,7 +1065,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
error_report("werror is not supported by this bus type");
goto fail;
}
qdict_put(bs_opts, "werror", qstring_from_str(werror));
qdict_put_str(bs_opts, "werror", werror);
}
rerror = qemu_opt_get(legacy_opts, "rerror");
@@ -1077,7 +1075,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type)
error_report("rerror is not supported by this bus type");
goto fail;
}
qdict_put(bs_opts, "rerror", qstring_from_str(rerror));
qdict_put_str(bs_opts, "rerror", rerror);
}
/* Actual block device init: Functionality shared with blockdev-add */
@@ -1737,10 +1735,9 @@ static void external_snapshot_prepare(BlkActionState *common,
options = qdict_new();
if (s->has_snapshot_node_name) {
qdict_put(options, "node-name",
qstring_from_str(snapshot_node_name));
qdict_put_str(options, "node-name", snapshot_node_name);
}
qdict_put(options, "driver", qstring_from_str(format));
qdict_put_str(options, "driver", format);
flags |= BDRV_O_NO_BACKING;
}
@@ -2579,11 +2576,10 @@ void qmp_blockdev_change_medium(bool has_device, const char *device,
options = qdict_new();
detect_zeroes = blk_get_detect_zeroes_from_root_state(blk);
qdict_put(options, "detect-zeroes",
qstring_from_str(detect_zeroes ? "on" : "off"));
qdict_put_str(options, "detect-zeroes", detect_zeroes ? "on" : "off");
if (has_format) {
qdict_put(options, "driver", qstring_from_str(format));
qdict_put_str(options, "driver", format);
}
medium_bs = bdrv_open(filename, NULL, options, bdrv_flags, errp);
@@ -2892,6 +2888,7 @@ void qmp_block_resize(bool has_device, const char *device,
Error *local_err = NULL;
BlockBackend *blk = NULL;
BlockDriverState *bs;
BlockBackend *cb_blk = NULL;
AioContext *aio_context;
int ret;
@@ -2903,6 +2900,10 @@ void qmp_block_resize(bool has_device, const char *device,
return;
}
if (has_device) {
cb_blk = blk_by_name(device);
}
aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
@@ -2927,29 +2928,12 @@ void qmp_block_resize(bool has_device, const char *device,
goto out;
}
/* complete all in-flight operations before resizing the device */
bdrv_drain_all();
ret = blk_truncate(blk, size);
switch (ret) {
case 0:
break;
case -ENOMEDIUM:
error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, device);
break;
case -ENOTSUP:
error_setg(errp, QERR_UNSUPPORTED);
break;
case -EACCES:
error_setg(errp, "Device '%s' is read only", device);
break;
case -EBUSY:
error_setg(errp, QERR_DEVICE_IN_USE, device);
break;
default:
error_setg_errno(errp, -ret, "Could not resize");
break;
bdrv_drained_begin(bs);
ret = blk_truncate(blk, size, errp);
if (!ret && cb_blk) {
blk_legacy_resize_cb(cb_blk);
}
bdrv_drained_end(bs);
out:
blk_unref(blk);
@@ -3174,6 +3158,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
Error *local_err = NULL;
int flags;
int64_t size;
bool set_backing_hd = false;
if (!backup->has_speed) {
backup->speed = 0;
@@ -3224,6 +3209,8 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
}
if (backup->sync == MIRROR_SYNC_MODE_NONE) {
source = bs;
flags |= BDRV_O_NO_BACKING;
set_backing_hd = true;
}
size = bdrv_getlength(bs);
@@ -3250,8 +3237,10 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
}
if (backup->format) {
options = qdict_new();
qdict_put(options, "driver", qstring_from_str(backup->format));
if (!options) {
options = qdict_new();
}
qdict_put_str(options, "driver", backup->format);
}
target_bs = bdrv_open(backup->target, NULL, options, flags, errp);
@@ -3261,6 +3250,14 @@ static BlockJob *do_drive_backup(DriveBackup *backup, BlockJobTxn *txn,
bdrv_set_aio_context(target_bs, aio_context);
if (set_backing_hd) {
bdrv_set_backing_hd(target_bs, source, &local_err);
if (local_err) {
bdrv_unref(target_bs);
goto out;
}
}
if (backup->has_bitmap) {
bmap = bdrv_find_dirty_bitmap(bs, backup->bitmap);
if (!bmap) {
@@ -3555,10 +3552,10 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
options = qdict_new();
if (arg->has_node_name) {
qdict_put(options, "node-name", qstring_from_str(arg->node_name));
qdict_put_str(options, "node-name", arg->node_name);
}
if (format) {
qdict_put(options, "driver", qstring_from_str(format));
qdict_put_str(options, "driver", format);
}
/* Mirroring takes care of copy-on-write using the source's backing

View File

@@ -179,6 +179,15 @@ static void mux_chr_accept_input(Chardev *chr)
be->chr_read(be->opaque,
&d->buffer[m][d->cons[m]++ & MUX_BUFFER_MASK], 1);
}
#if defined(TARGET_S390X)
/* We're still not able to sync producer and consumer, so let's wait a bit
and try again by then. */
if (d->prod[m] != d->cons[m]) {
qemu_mod_timer(d->accept_timer, qemu_get_clock_ns(vm_clock)
+ (int64_t)100000);
}
#endif
}
static int mux_chr_can_read(void *opaque)
@@ -308,6 +317,10 @@ static void qemu_chr_open_mux(Chardev *chr,
}
d->focus = -1;
#if defined(TARGET_S390X)
d->accept_timer = qemu_new_timer_ns(vm_clock,
(QEMUTimerCB*)mux_chr_accept_input, chr);
#endif
/* only default to opened state if we've realized the initial
* set of muxes
*/

View File

@@ -35,6 +35,9 @@ typedef struct MuxChardev {
Chardev parent;
CharBackend *backends[MAX_MUX];
CharBackend chr;
#if defined(TARGET_S390X)
QEMUTimer *accept_timer;
#endif
int focus;
int mux_cnt;
int term_got_escape;

15
configure vendored
View File

@@ -1600,7 +1600,7 @@ fi
if test "$pie" = ""; then
case "$cpu-$targetos" in
i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
i386-Linux|x86_64-Linux|x32-Linux|ppc*-Linux|i386-OpenBSD|x86_64-OpenBSD)
;;
*)
pie="no"
@@ -1640,7 +1640,7 @@ EOF
if compile_prog "-Werror -fno-pie" "-nopie"; then
CFLAGS_NOPIE="-fno-pie"
LDFLAGS_NOPIE="-nopie"
LDFLAGS_NOPIE="-no-pie"
fi
fi
@@ -1947,13 +1947,10 @@ fi
##########################################
# libseccomp check
libseccomp_minver="2.2.0"
if test "$seccomp" != "no" ; then
case "$cpu" in
i386|x86_64)
libseccomp_minver="2.1.0"
;;
mips)
libseccomp_minver="2.2.0"
i386|x86_64|mips)
;;
arm|aarch64)
libseccomp_minver="2.2.3"
@@ -1961,6 +1958,9 @@ if test "$seccomp" != "no" ; then
ppc|ppc64)
libseccomp_minver="2.3.0"
;;
s390|s390x)
libseccomp_minver="2.2.0"
;;
*)
libseccomp_minver=""
;;
@@ -5843,7 +5843,6 @@ else
echo "AUTOCONF_HOST := " >> $config_host_mak
fi
echo "LDFLAGS=$LDFLAGS" >> $config_host_mak
echo "LDFLAGS_NOPIE=$LDFLAGS_NOPIE" >> $config_host_mak
echo "LD_REL_FLAGS=$LD_REL_FLAGS" >> $config_host_mak
echo "LD_I386_EMULATION=$ld_i386_emulation" >> $config_host_mak
echo "LIBS+=$LIBS" >> $config_host_mak

12
cpus.c
View File

@@ -1999,6 +1999,18 @@ exit:
fclose(f);
}
bool spec_ctrl_is_inconsistent(void)
{
#if defined(TARGET_I386)
X86CPU *x86_cpu = X86_CPU(current_cpu);
CPUX86State *env = x86_cpu != NULL ? &x86_cpu->env : NULL;
if (env && !(env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_SPEC_CTRL) &&
env->spec_ctrl)
return true;
#endif
return false;
}
void qmp_inject_nmi(Error **errp)
{
nmi_monitor_handle(monitor_get_cpu_index(), errp);

View File

@@ -91,7 +91,7 @@ void *load_device_tree(const char *filename_path, int *sizep)
/* First allocate space in qemu for device tree */
fdt = g_malloc0(dt_size);
dt_file_load_size = load_image(filename_path, fdt);
dt_file_load_size = load_image_size(filename_path, fdt, dt_size);
if (dt_file_load_size < 0) {
error_report("Unable to open device tree file '%s'",
filename_path);

View File

@@ -79,14 +79,14 @@ Choosing the right cache sizes
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.
The amount of virtual disk that can be mapped by the L2 and refcount
The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:
disk_size = l2_cache_size * cluster_size / 8
disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
With the default values for cluster_size (64KB) and refcount_bits
(16), that is
(16), this becomes:
disk_size = l2_cache_size * 8192
disk_size = refcount_cache_size * 32768
@@ -97,12 +97,16 @@ need:
l2_cache_size = disk_size_GB * 131072
refcount_cache_size = disk_size_GB * 32768
QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
cache of 256KB (262144 bytes), so using the formulas we've just seen
we have
For example, 1MB of L2 cache is needed to cover every 8 GB of the virtual
image size (given that the default cluster size is used):
1048576 / 131072 = 8 GB of virtual disk covered by that cache
262144 / 32768 = 8 GB
8 GB / 8192 = 1 MB
The refcount cache is 4 times the cluster size by default. With the default
cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for
8 GB of image size:
262144 * 32768 = 8 GB
How to configure the cache sizes
@@ -116,31 +120,39 @@ There are three options available, and all of them take bytes:
"refcount-cache-size": maximum size of the refcount block cache
"cache-size": maximum size of both caches combined
There are two things that need to be taken into account:
There are a few things that need to be taken into account:
- Both caches must have a size that is a multiple of the cluster
size.
- If you only set one of the options above, QEMU will automatically
adjust the others so that the L2 cache is 4 times bigger than the
refcount cache.
- The maximum L2 cache size is 32 MB by default on Linux platforms (enough
for full coverage of 256 GB images, with the default cluster size). This
value can be modified using the "l2-cache-size" option. QEMU will not use
more memory than needed to hold all of the image's L2 tables, regardless
of this max. value.
On non-Linux platforms the maximal value is smaller by default (8 MB) and
this difference stems from the fact that on Linux the cache can be cleared
periodically if needed, using the "cache-clean-interval" option (see below).
The minimal L2 cache size is 2 clusters (or 2 cache entries, see below).
This means that these options are equivalent:
- The default (and minimum) refcount cache size is 4 clusters.
-drive file=hd.qcow2,l2-cache-size=2097152
-drive file=hd.qcow2,refcount-cache-size=524288
-drive file=hd.qcow2,cache-size=2621440
- If only "cache-size" is specified then QEMU will assign as much
memory as possible to the L2 cache before increasing the refcount
cache size.
The reason for this 1/4 ratio is to ensure that both caches cover the
same amount of disk space. Note however that this is only valid with
the default value of refcount_bits (16). If you are using a different
value you might want to calculate both cache sizes yourself since QEMU
will always use the same 1/4 ratio.
- At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
can be set simultaneously.
It's also worth mentioning that there's no strict need for both caches
to cover the same amount of disk space. The refcount cache is used
much less often than the L2 cache, so it's perfectly reasonable to
keep it small.
Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots. In most cases they are
accessed sequentially (even during random guest I/O) so increasing the
refcount cache size won't have any measurable effect in performance
(this can change if you are using internal snapshots, so you may want
to think about increasing the cache size if you use them heavily).
Now the refcount cache is as small as possible unless overridden by the
user.
Reducing the memory usage
@@ -156,8 +168,8 @@ This example removes all unused cache entries every 15 minutes:
-drive file=hd.qcow2,cache-clean-interval=900
If unset, the default value for this parameter is 0 and it disables
this feature.
If unset, the default value for this parameter is 600. Setting it to 0
disables this feature.
Note that this functionality currently relies on the MADV_DONTNEED
argument for madvise() to actually free the memory. This is a

94
exec.c
View File

@@ -223,6 +223,12 @@ struct CPUAddressSpace {
MemoryListener tcg_as_listener;
};
struct DirtyBitmapSnapshot {
ram_addr_t start;
ram_addr_t end;
unsigned long dirty[];
};
#endif
#if !defined(CONFIG_USER_ONLY)
@@ -1061,6 +1067,75 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
return dirty;
}
DirtyBitmapSnapshot *cpu_physical_memory_snapshot_and_clear_dirty
(ram_addr_t start, ram_addr_t length, unsigned client)
{
DirtyMemoryBlocks *blocks;
unsigned long align = 1UL << (TARGET_PAGE_BITS + BITS_PER_LEVEL);
ram_addr_t first = QEMU_ALIGN_DOWN(start, align);
ram_addr_t last = QEMU_ALIGN_UP(start + length, align);
DirtyBitmapSnapshot *snap;
unsigned long page, end, dest;
snap = g_malloc0(sizeof(*snap) +
((last - first) >> (TARGET_PAGE_BITS + 3)));
snap->start = first;
snap->end = last;
page = first >> TARGET_PAGE_BITS;
end = last >> TARGET_PAGE_BITS;
dest = 0;
rcu_read_lock();
blocks = atomic_rcu_read(&ram_list.dirty_memory[client]);
while (page < end) {
unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
unsigned long offset = page % DIRTY_MEMORY_BLOCK_SIZE;
unsigned long num = MIN(end - page, DIRTY_MEMORY_BLOCK_SIZE - offset);
assert(QEMU_IS_ALIGNED(offset, (1 << BITS_PER_LEVEL)));
assert(QEMU_IS_ALIGNED(num, (1 << BITS_PER_LEVEL)));
offset >>= BITS_PER_LEVEL;
bitmap_copy_and_clear_atomic(snap->dirty + dest,
blocks->blocks[idx] + offset,
num);
page += num;
dest += num >> BITS_PER_LEVEL;
}
rcu_read_unlock();
if (tcg_enabled()) {
tlb_reset_dirty_range_all(start, length);
}
return snap;
}
bool cpu_physical_memory_snapshot_get_dirty(DirtyBitmapSnapshot *snap,
ram_addr_t start,
ram_addr_t length)
{
unsigned long page, end;
assert(start >= snap->start);
assert(start + length <= snap->end);
end = TARGET_PAGE_ALIGN(start + length - snap->start) >> TARGET_PAGE_BITS;
page = (start - snap->start) >> TARGET_PAGE_BITS;
while (page < end) {
if (test_bit(page, snap->dirty)) {
return true;
}
page++;
}
return false;
}
/* Called from RCU critical section */
hwaddr memory_region_section_get_iotlb(CPUState *cpu,
MemoryRegionSection *section,
@@ -1362,11 +1437,13 @@ static void *file_ram_alloc(RAMBlock *block,
int fd = -1;
int64_t file_size;
#ifndef TARGET_PPC
if (kvm_enabled() && !kvm_has_sync_mmu()) {
error_setg(errp,
"host lacks kvm mmu notifiers, -mem-path unsupported");
return NULL;
}
#endif
for (;;) {
fd = open(path, O_RDWR);
@@ -2010,10 +2087,10 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
* In that case just map until the end of the page.
*/
if (block->offset == 0) {
return xen_map_cache(addr, 0, 0);
return xen_map_cache(addr, 0, 0, false);
}
block->host = xen_map_cache(block->offset, block->max_length, 1);
block->host = xen_map_cache(block->offset, block->max_length, 1, false);
}
return ramblock_ptr(block, addr);
}
@@ -2024,7 +2101,7 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, ram_addr_t addr)
* Called within RCU critical section.
*/
static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
hwaddr *size)
hwaddr *size, bool lock)
{
RAMBlock *block = ram_block;
if (*size == 0) {
@@ -2043,10 +2120,10 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
* In that case just map the requested area.
*/
if (block->offset == 0) {
return xen_map_cache(addr, *size, 1);
return xen_map_cache(addr, *size, lock, lock);
}
block->host = xen_map_cache(block->offset, block->max_length, 1);
block->host = xen_map_cache(block->offset, block->max_length, 1, lock);
}
return ramblock_ptr(block, addr);
@@ -2765,7 +2842,7 @@ static MemTxResult address_space_write_continue(AddressSpace *as, hwaddr addr,
}
} else {
/* RAM case */
ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
memcpy(ptr, buf, l);
invalidate_and_set_dirty(mr, addr1, l);
}
@@ -2856,7 +2933,7 @@ MemTxResult address_space_read_continue(AddressSpace *as, hwaddr addr,
}
} else {
/* RAM case */
ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
memcpy(buf, ptr, l);
}
@@ -3144,6 +3221,7 @@ void *address_space_map(AddressSpace *as,
if (!memory_access_is_direct(mr, is_write)) {
if (atomic_xchg(&bounce.in_use, true)) {
rcu_read_unlock();
*plen = 0;
return NULL;
}
/* Avoid unbounded allocations */
@@ -3167,7 +3245,7 @@ void *address_space_map(AddressSpace *as,
memory_region_ref(mr);
*plen = address_space_extend_translation(as, addr, len, mr, xlat, l, is_write);
ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen);
ptr = qemu_ram_ptr_length(mr->ram_block, xlat, plen, true);
rcu_read_unlock();
return ptr;

View File

@@ -278,17 +278,27 @@ update_map_file:
static int fchmodat_nofollow(int dirfd, const char *name, mode_t mode)
{
struct stat stbuf;
int fd, ret;
/* FIXME: this should be handled with fchmodat(AT_SYMLINK_NOFOLLOW).
* Unfortunately, the linux kernel doesn't implement it yet. As an
* alternative, let's open the file and use fchmod() instead. This
* may fail depending on the permissions of the file, but it is the
* best we can do to avoid TOCTTOU. We first try to open read-only
* in case name points to a directory. If that fails, we try write-only
* in case name doesn't point to a directory.
* Unfortunately, the linux kernel doesn't implement it yet.
*/
fd = openat_file(dirfd, name, O_RDONLY, 0);
/* First, we clear non-racing symlinks out of the way. */
if (fstatat(dirfd, name, &stbuf, AT_SYMLINK_NOFOLLOW)) {
return -1;
}
if (S_ISLNK(stbuf.st_mode)) {
errno = ELOOP;
return -1;
}
/* Access modes are ignored when O_PATH is supported. We try O_RDONLY and
* O_WRONLY for old-systems that don't support O_PATH.
*/
fd = openat_file(dirfd, name, O_RDONLY | O_PATH_9P_UTIL, 0);
#if O_PATH_9P_UTIL == 0
if (fd == -1) {
/* In case the file is writable-only and isn't a directory. */
if (errno == EACCES) {
@@ -302,6 +312,24 @@ static int fchmodat_nofollow(int dirfd, const char *name, mode_t mode)
return -1;
}
ret = fchmod(fd, mode);
#else
if (fd == -1) {
return -1;
}
/* Now we handle racing symlinks. */
ret = fstat(fd, &stbuf);
if (!ret) {
if (S_ISLNK(stbuf.st_mode)) {
errno = ELOOP;
ret = -1;
} else {
char *proc_path = g_strdup_printf("/proc/self/fd/%d", fd);
ret = chmod(proc_path, mode);
g_free(proc_path);
}
}
#endif
close_preserve_errno(fd);
return ret;
}
@@ -452,6 +480,11 @@ static off_t local_telldir(FsContext *ctx, V9fsFidOpenState *fs)
return telldir(fs->dir.stream);
}
static bool local_is_mapped_file_metadata(FsContext *fs_ctx, const char *name)
{
return !strcmp(name, VIRTFS_META_DIR);
}
static struct dirent *local_readdir(FsContext *ctx, V9fsFidOpenState *fs)
{
struct dirent *entry;
@@ -465,8 +498,8 @@ again:
if (ctx->export_flags & V9FS_SM_MAPPED) {
entry->d_type = DT_UNKNOWN;
} else if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
if (!strcmp(entry->d_name, VIRTFS_META_DIR)) {
/* skp the meta data directory */
if (local_is_mapped_file_metadata(ctx, entry->d_name)) {
/* skip the meta data directory */
goto again;
}
entry->d_type = DT_UNKNOWN;
@@ -559,6 +592,12 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
if (dirfd == -1) {
return -1;
@@ -605,6 +644,12 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
if (dirfd == -1) {
return -1;
@@ -694,6 +739,12 @@ static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
/*
* Mark all the open to not follow symlinks
*/
@@ -752,6 +803,12 @@ static int local_symlink(FsContext *fs_ctx, const char *oldpath,
int err = -1;
int dirfd;
if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(fs_ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
if (dirfd == -1) {
return -1;
@@ -826,6 +883,12 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
int ret = -1;
int odirfd, ndirfd;
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(ctx, name)) {
errno = EINVAL;
return -1;
}
odirfd = local_opendir_nofollow(ctx, odirpath);
if (odirfd == -1) {
goto out;
@@ -957,6 +1020,14 @@ static int local_unlinkat_common(FsContext *ctx, int dirfd, const char *name,
if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
int map_dirfd;
/* We need to remove the metadata as well:
* - the metadata directory if we're removing a directory
* - the metadata file in the parent's metadata directory
*
* If any of these are missing (ie, ENOENT) then we're probably
* trying to remove something that wasn't created in mapped-file
* mode. We just ignore the error.
*/
if (flags == AT_REMOVEDIR) {
int fd;
@@ -964,32 +1035,20 @@ static int local_unlinkat_common(FsContext *ctx, int dirfd, const char *name,
if (fd == -1) {
goto err_out;
}
/*
* If directory remove .virtfs_metadata contained in the
* directory
*/
ret = unlinkat(fd, VIRTFS_META_DIR, AT_REMOVEDIR);
close_preserve_errno(fd);
if (ret < 0 && errno != ENOENT) {
/*
* We didn't had the .virtfs_metadata file. May be file created
* in non-mapped mode ?. Ignore ENOENT.
*/
goto err_out;
}
}
/*
* Now remove the name from parent directory
* .virtfs_metadata directory.
*/
map_dirfd = openat_dir(dirfd, VIRTFS_META_DIR);
ret = unlinkat(map_dirfd, name, 0);
close_preserve_errno(map_dirfd);
if (ret < 0 && errno != ENOENT) {
/*
* We didn't had the .virtfs_metadata file. May be file created
* in non-mapped mode ?. Ignore ENOENT.
*/
if (map_dirfd != -1) {
ret = unlinkat(map_dirfd, name, 0);
close_preserve_errno(map_dirfd);
if (ret < 0 && errno != ENOENT) {
goto err_out;
}
} else if (errno != ENOENT) {
goto err_out;
}
}
@@ -1013,7 +1072,7 @@ static int local_remove(FsContext *ctx, const char *path)
goto out;
}
if (fstatat(dirfd, path, &stbuf, AT_SYMLINK_NOFOLLOW) < 0) {
if (fstatat(dirfd, name, &stbuf, AT_SYMLINK_NOFOLLOW) < 0) {
goto err_out;
}
@@ -1096,6 +1155,12 @@ static int local_lremovexattr(FsContext *ctx, V9fsPath *fs_path,
static int local_name_to_path(FsContext *ctx, V9fsPath *dir_path,
const char *name, V9fsPath *target)
{
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(ctx, name)) {
errno = EINVAL;
return -1;
}
if (dir_path) {
v9fs_path_sprintf(target, "%s/%s", dir_path->data, name);
} else if (strcmp(name, "/")) {
@@ -1116,6 +1181,13 @@ static int local_renameat(FsContext *ctx, V9fsPath *olddir,
int ret;
int odirfd, ndirfd;
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
(local_is_mapped_file_metadata(ctx, old_name) ||
local_is_mapped_file_metadata(ctx, new_name))) {
errno = EINVAL;
return -1;
}
odirfd = local_opendir_nofollow(ctx, olddir->data);
if (odirfd == -1) {
return -1;
@@ -1206,6 +1278,12 @@ static int local_unlinkat(FsContext *ctx, V9fsPath *dir,
int ret;
int dirfd;
if (ctx->export_flags & V9FS_SM_MAPPED_FILE &&
local_is_mapped_file_metadata(ctx, name)) {
errno = EINVAL;
return -1;
}
dirfd = local_opendir_nofollow(ctx, dir->data);
if (dirfd == -1) {
return -1;

View File

@@ -13,6 +13,12 @@
#ifndef QEMU_9P_UTIL_H
#define QEMU_9P_UTIL_H
#ifdef O_PATH
#define O_PATH_9P_UTIL O_PATH
#else
#define O_PATH_9P_UTIL 0
#endif
static inline void close_preserve_errno(int fd)
{
int serrno = errno;
@@ -22,13 +28,8 @@ static inline void close_preserve_errno(int fd)
static inline int openat_dir(int dirfd, const char *name)
{
#ifdef O_PATH
#define OPENAT_DIR_O_PATH O_PATH
#else
#define OPENAT_DIR_O_PATH 0
#endif
return openat(dirfd, name,
O_DIRECTORY | O_RDONLY | O_NOFOLLOW | OPENAT_DIR_O_PATH);
O_DIRECTORY | O_RDONLY | O_NOFOLLOW | O_PATH_9P_UTIL);
}
static inline int openat_file(int dirfd, const char *name, int flags,
@@ -43,9 +44,14 @@ static inline int openat_file(int dirfd, const char *name, int flags,
}
serrno = errno;
/* O_NONBLOCK was only needed to open the file. Let's drop it. */
ret = fcntl(fd, F_SETFL, flags);
assert(!ret);
/* O_NONBLOCK was only needed to open the file. Let's drop it. We don't
* do that with O_PATH since fcntl(F_SETFL) isn't supported, and openat()
* ignored it anyway.
*/
if (!(flags & O_PATH_9P_UTIL)) {
ret = fcntl(fd, F_SETFL, flags);
assert(!ret);
}
errno = serrno;
return fd;
}

View File

@@ -503,9 +503,9 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
{
int err;
V9fsState *s = pdu->s;
V9fsFidState *fidp, head_fid;
V9fsFidState *fidp;
head_fid.next = s->fid_list;
again:
for (fidp = s->fid_list; fidp; fidp = fidp->next) {
if (fidp->path.size != path->size) {
continue;
@@ -525,7 +525,7 @@ static int coroutine_fn v9fs_mark_fids_unreclaim(V9fsPDU *pdu, V9fsPath *path)
* switched to the worker thread
*/
if (err == 0) {
fidp = &head_fid;
goto again;
}
}
}
@@ -1372,7 +1372,9 @@ static void coroutine_fn v9fs_walk(void *opaque)
err = -EINVAL;
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
} else {
newfidp = alloc_fid(s, newfid);
if (newfidp == NULL) {
@@ -2132,6 +2134,7 @@ static void coroutine_fn v9fs_create(void *opaque)
V9fsString extension;
int iounit;
V9fsPDU *pdu = opaque;
V9fsState *s = pdu->s;
v9fs_path_init(&path);
v9fs_string_init(&name);
@@ -2172,7 +2175,9 @@ static void coroutine_fn v9fs_create(void *opaque)
if (err < 0) {
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
err = v9fs_co_opendir(pdu, fidp);
if (err < 0) {
goto out;
@@ -2188,7 +2193,9 @@ static void coroutine_fn v9fs_create(void *opaque)
if (err < 0) {
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
} else if (perm & P9_STAT_MODE_LINK) {
int32_t ofid = atoi(extension.data);
V9fsFidState *ofidp = get_fid(pdu, ofid);
@@ -2206,7 +2213,9 @@ static void coroutine_fn v9fs_create(void *opaque)
fidp->fid_type = P9_FID_NONE;
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
err = v9fs_co_lstat(pdu, &fidp->path, &stbuf);
if (err < 0) {
fidp->fid_type = P9_FID_NONE;
@@ -2244,7 +2253,9 @@ static void coroutine_fn v9fs_create(void *opaque)
if (err < 0) {
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
} else if (perm & P9_STAT_MODE_NAMED_PIPE) {
err = v9fs_co_mknod(pdu, fidp, &name, fidp->uid, -1,
0, S_IFIFO | (perm & 0777), &stbuf);
@@ -2255,7 +2266,9 @@ static void coroutine_fn v9fs_create(void *opaque)
if (err < 0) {
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
} else if (perm & P9_STAT_MODE_SOCKET) {
err = v9fs_co_mknod(pdu, fidp, &name, fidp->uid, -1,
0, S_IFSOCK | (perm & 0777), &stbuf);
@@ -2266,7 +2279,9 @@ static void coroutine_fn v9fs_create(void *opaque)
if (err < 0) {
goto out;
}
v9fs_path_write_lock(s);
v9fs_path_copy(&fidp->path, &path);
v9fs_path_unlock(s);
} else {
err = v9fs_co_open2(pdu, fidp, &name, -1,
omode_to_uflags(mode)|O_CREAT, perm, &stbuf);
@@ -2794,6 +2809,7 @@ static void coroutine_fn v9fs_wstat(void *opaque)
struct stat stbuf;
V9fsFidState *fidp;
V9fsPDU *pdu = opaque;
V9fsState *s = pdu->s;
v9fs_stat_init(&v9stat);
err = pdu_unmarshal(pdu, offset, "dwS", &fid, &unused, &v9stat);
@@ -2859,7 +2875,9 @@ static void coroutine_fn v9fs_wstat(void *opaque)
}
}
if (v9stat.name.size != 0) {
v9fs_path_write_lock(s);
err = v9fs_complete_rename(pdu, fidp, -1, &v9stat.name);
v9fs_path_unlock(s);
if (err < 0) {
goto out;
}
@@ -3220,7 +3238,7 @@ static void coroutine_fn v9fs_xattrwalk(void *opaque)
xattr_fidp->fid_type = P9_FID_XATTR;
xattr_fidp->fs.xattr.xattrwalk_fid = true;
if (size) {
xattr_fidp->fs.xattr.value = g_malloc(size);
xattr_fidp->fs.xattr.value = g_malloc0(size);
err = v9fs_co_llistxattr(pdu, &xattr_fidp->path,
xattr_fidp->fs.xattr.value,
xattr_fidp->fs.xattr.len);
@@ -3253,7 +3271,7 @@ static void coroutine_fn v9fs_xattrwalk(void *opaque)
xattr_fidp->fid_type = P9_FID_XATTR;
xattr_fidp->fs.xattr.xattrwalk_fid = true;
if (size) {
xattr_fidp->fs.xattr.value = g_malloc(size);
xattr_fidp->fs.xattr.value = g_malloc0(size);
err = v9fs_co_lgetxattr(pdu, &xattr_fidp->path,
&name, xattr_fidp->fs.xattr.value,
xattr_fidp->fs.xattr.len);

View File

@@ -140,10 +140,10 @@ int coroutine_fn v9fs_co_open2(V9fsPDU *pdu, V9fsFidState *fidp,
cred.fc_gid = gid;
/*
* Hold the directory fid lock so that directory path name
* don't change. Read lock is fine because this fid cannot
* be used by any other operation.
* don't change. Take the write lock to be sure this fid
* cannot be used by another operation.
*/
v9fs_path_read_lock(s);
v9fs_path_write_lock(s);
v9fs_co_run_in_worker(
{
err = s->ops->open2(&s->ctx, &fidp->path,

View File

@@ -462,7 +462,8 @@ static void acpi_pm_evt_write(void *opaque, hwaddr addr, uint64_t val,
static const MemoryRegionOps acpi_pm_evt_ops = {
.read = acpi_pm_evt_read,
.write = acpi_pm_evt_write,
.valid.min_access_size = 2,
.impl.min_access_size = 2,
.valid.min_access_size = 1,
.valid.max_access_size = 2,
.endianness = DEVICE_LITTLE_ENDIAN,
};
@@ -530,7 +531,8 @@ static void acpi_pm_tmr_write(void *opaque, hwaddr addr, uint64_t val,
static const MemoryRegionOps acpi_pm_tmr_ops = {
.read = acpi_pm_tmr_read,
.write = acpi_pm_tmr_write,
.valid.min_access_size = 4,
.impl.min_access_size = 4,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
.endianness = DEVICE_LITTLE_ENDIAN,
};
@@ -602,7 +604,8 @@ static void acpi_pm_cnt_write(void *opaque, hwaddr addr, uint64_t val,
static const MemoryRegionOps acpi_pm_cnt_ops = {
.read = acpi_pm_cnt_read,
.write = acpi_pm_cnt_write,
.valid.min_access_size = 2,
.impl.min_access_size = 2,
.valid.min_access_size = 1,
.valid.max_access_size = 2,
.endianness = DEVICE_LITTLE_ENDIAN,
};

View File

@@ -49,6 +49,7 @@
#define ACPI_PCIHP_ADDR 0xae00
#define ACPI_PCIHP_SIZE 0x0014
#define ACPI_PCIHP_LEGACY_SIZE 0x000f
#define PCI_UP_BASE 0x0000
#define PCI_DOWN_BASE 0x0004
#define PCI_EJ_BASE 0x0008
@@ -301,6 +302,16 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, PCIBus *root_bus,
s->root= root_bus;
s->legacy_piix = !bridges_enabled;
if (s->legacy_piix) {
unsigned *bus_bsel = g_malloc(sizeof *bus_bsel);
s->io_len = ACPI_PCIHP_LEGACY_SIZE;
*bus_bsel = ACPI_PCIHP_BSEL_DEFAULT;
object_property_add_uint32_ptr(OBJECT(root_bus), ACPI_PCIHP_PROP_BSEL,
bus_bsel, NULL);
}
memory_region_init_io(&s->io, owner, &acpi_pcihp_io_ops, s,
"acpi-pci-hotplug", s->io_len);
memory_region_add_subregion(address_space_io, s->io_base, &s->io);

View File

@@ -311,7 +311,7 @@ static const VMStateDescription vmstate_cpuhp_state = {
static const VMStateDescription vmstate_acpi = {
.name = "piix4_pm",
.version_id = 3,
.minimum_version_id = 3,
.minimum_version_id = 2, /* qemu-kvm */
.minimum_version_id_old = 1,
.load_state_old = acpi_load_old,
.post_load = vmstate_acpi_post_load,
@@ -385,7 +385,10 @@ static void piix4_device_plug_cb(HotplugHandler *hotplug_dev,
dev, errp);
}
} else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
acpi_pcihp_device_plug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev, errp);
if (!xen_enabled()) {
acpi_pcihp_device_plug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
errp);
}
} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
if (s->cpu_hotplug_legacy) {
legacy_acpi_cpu_plug_cb(hotplug_dev, &s->gpe_cpu, dev, errp);
@@ -408,8 +411,10 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
acpi_memory_unplug_request_cb(hotplug_dev, &s->acpi_memory_hotplug,
dev, errp);
} else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
acpi_pcihp_device_unplug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
errp);
if (!xen_enabled()) {
acpi_pcihp_device_unplug_cb(hotplug_dev, &s->acpi_pci_hotplug, dev,
errp);
}
} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) &&
!s->cpu_hotplug_legacy) {
acpi_cpu_unplug_request_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
@@ -700,7 +705,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
* Reason: part of PIIX4 southbridge, needs to be wired up,
* e.g. by mips_malta_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->hotpluggable = false;
hc->plug = piix4_device_plug_cb;
hc->unplug_request = piix4_device_unplug_request_cb;

View File

@@ -1076,7 +1076,7 @@ static void sl_nand_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_sl_nand_info;
dc->props = sl_nand_properties;
/* Reason: init() method uses drive_get() */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo sl_nand_info = {

View File

@@ -609,6 +609,16 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
} else if (type == 2) {
create_v2m(vms, pic);
}
#ifdef CONFIG_KVM
if (kvm_enabled() && !kvm_irqchip_in_kernel()) {
if (!kvm_check_extension(kvm_state, KVM_CAP_ARM_USER_IRQ)) {
error_report("KVM with user space irqchip only works when the "
"host kernel supports KVM_CAP_ARM_USER_IRQ");
exit(1);
}
}
#endif
}
static void create_uart(const VirtMachineState *vms, qemu_irq *pic, int uart,

View File

@@ -788,6 +788,9 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
int csc_bytes = (csc + 1) << d->shift;
int cnt = d->frame_cnt >> 16;
int size = d->frame_cnt & 0xffff;
if (size < cnt) {
return;
}
int left = ((size - cnt + 1) << 2) + d->leftover;
int transferred = 0;
int temp = audio_MIN (max, audio_MIN (left, csc_bytes));
@@ -796,7 +799,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
addr += (cnt << 2) + d->leftover;
if (index == ADC_CHANNEL) {
while (temp) {
while (temp > 0) {
int acquired, to_copy;
to_copy = audio_MIN ((size_t) temp, sizeof (tmpbuf));
@@ -814,7 +817,7 @@ static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
else {
SWVoiceOut *voice = s->dac_voice[index];
while (temp) {
while (temp > 0) {
int copied, to_copy;
to_copy = audio_MIN ((size_t) temp, sizeof (tmpbuf));

View File

@@ -292,7 +292,7 @@ static void mv88w8618_audio_class_init(ObjectClass *klass, void *data)
dc->vmsd = &mv88w8618_audio_vmsd;
dc->props = mv88w8618_audio_properties;
/* Reason: pointer property "wm8750" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo mv88w8618_audio_info = {

View File

@@ -223,7 +223,7 @@ static void pcspk_class_initfn(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_spk;
dc->props = pcspk_properties;
/* Reason: realize sets global pcspk_state */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo pcspk_info = {

View File

@@ -54,6 +54,12 @@
} while (0)
/* Anonymous BlockBackend for empty drive */
static BlockBackend *blk_create_empty_drive(void)
{
return blk_new(0, BLK_PERM_ALL);
}
/********************************************************/
/* qdev floppy bus */
@@ -547,8 +553,7 @@ static int floppy_drive_init(DeviceState *qdev)
}
if (!dev->conf.blk) {
/* Anonymous BlockBackend for an empty drive */
dev->conf.blk = blk_new(0, BLK_PERM_ALL);
dev->conf.blk = blk_create_empty_drive();
ret = blk_attach_dev(dev->conf.blk, qdev);
assert(ret == 0);
}
@@ -1358,7 +1363,19 @@ static FDrive *get_drv(FDCtrl *fdctrl, int unit)
static FDrive *get_cur_drv(FDCtrl *fdctrl)
{
return get_drv(fdctrl, fdctrl->cur_drv);
FDrive *cur_drv = get_drv(fdctrl, fdctrl->cur_drv);
if (!cur_drv->blk) {
/*
* Kludge: empty drive line selected. Create an anonymous
* BlockBackend to avoid NULL deref with various BlockBackend
* API calls within this model (CVE-2021-20196).
* Due to the controller QOM model limitations, we don't
* attach the created to the controller device.
*/
cur_drv->blk = blk_create_empty_drive();
}
return cur_drv;
}
/* Status A register : 0x00 (read-only) */
@@ -1710,6 +1727,14 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction)
int tmp;
fdctrl->data_len = 128 << (fdctrl->fifo[5] > 7 ? 7 : fdctrl->fifo[5]);
tmp = (fdctrl->fifo[6] - ks + 1);
if (tmp < 0) {
FLOPPY_DPRINTF("invalid EOT: %d\n", tmp);
fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, FD_SR1_MA, 0x00);
fdctrl->fifo[3] = kt;
fdctrl->fifo[4] = kh;
fdctrl->fifo[5] = ks;
return;
}
if (fdctrl->fifo[0] & 0x80)
tmp += fdctrl->fifo[6];
fdctrl->data_len *= tmp;

View File

@@ -34,6 +34,24 @@
static void nvme_process_sq(void *opaque);
static inline bool nvme_addr_is_iomem(NvmeCtrl *n, hwaddr addr)
{
PCIDevice *pci_dev = &n->parent_obj;
hwaddr regs_hi, regs_lo, msix_hi, msix_lo;
/*
* The purpose of this check is to guard against invalid "local" access
* to the iomem (i.e. controller registers, MSIX-related space).
*/
regs_lo = n->iomem.addr;
regs_hi = regs_lo + int128_get64(n->iomem.size);
msix_lo = pci_dev->msix_exclusive_bar.addr;
msix_hi = msix_lo + int128_get64(pci_dev->msix_exclusive_bar.size);
return (addr >= regs_lo && addr < regs_hi) ||
(addr >= msix_lo && addr < msix_hi);
}
static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
{
return sqid < n->num_queues && n->sq[sqid] != NULL ? 0 : -1;
@@ -90,6 +108,9 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
return NVME_INVALID_FIELD | NVME_DNR;
}
if (nvme_addr_is_iomem(n, prp1)) {
return NVME_DATA_TRAS_ERROR;
}
pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
qemu_sglist_add(qsg, prp1, trans_len);
len -= trans_len;
@@ -126,6 +147,9 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
}
trans_len = MIN(len, n->page_size);
if (nvme_addr_is_iomem(n, prp_ent)) {
return NVME_DATA_TRAS_ERROR;
}
qemu_sglist_add(qsg, prp_ent, trans_len);
len -= trans_len;
i++;
@@ -134,6 +158,9 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
if (prp2 & (n->page_size - 1)) {
goto unmap;
}
if (nvme_addr_is_iomem(n, prp2)) {
return NVME_DATA_TRAS_ERROR;
}
qemu_sglist_add(qsg, prp2, len);
}
}

View File

@@ -34,29 +34,9 @@
/* ------------------------------------------------------------- */
static int batch_maps = 0;
static int max_requests = 32;
/* ------------------------------------------------------------- */
#define BLOCK_SIZE 512
#define IOCB_COUNT (BLKIF_MAX_SEGMENTS_PER_REQUEST + 2)
struct PersistentGrant {
void *page;
struct XenBlkDev *blkdev;
};
typedef struct PersistentGrant PersistentGrant;
struct PersistentRegion {
void *addr;
int num;
};
typedef struct PersistentRegion PersistentRegion;
struct ioreq {
blkif_request_t req;
int16_t status;
@@ -64,16 +44,9 @@ struct ioreq {
/* parsed request */
off_t start;
QEMUIOVector v;
void *buf;
size_t size;
int presync;
uint8_t mapped;
/* grant mapping */
uint32_t domids[BLKIF_MAX_SEGMENTS_PER_REQUEST];
uint32_t refs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
int prot;
void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
void *pages;
int num_unmap;
/* aio status */
int aio_inflight;
@@ -84,6 +57,8 @@ struct ioreq {
BlockAcctCookie acct;
};
#define MAX_RING_PAGE_ORDER 4
struct XenBlkDev {
struct XenDevice xendev; /* must be first */
char *params;
@@ -94,14 +69,14 @@ struct XenBlkDev {
bool directiosafe;
const char *fileproto;
const char *filename;
int ring_ref;
unsigned int ring_ref[1 << MAX_RING_PAGE_ORDER];
unsigned int nr_ring_ref;
void *sring;
int64_t file_blk;
int64_t file_size;
int protocol;
blkif_back_rings_t rings;
int more_work;
int cnt_map;
/* request lists */
QLIST_HEAD(inflight_head, ioreq) inflight;
@@ -110,17 +85,10 @@ struct XenBlkDev {
int requests_total;
int requests_inflight;
int requests_finished;
unsigned int max_requests;
/* Persistent grants extension */
gboolean cache_unsafe;
gboolean feature_discard;
gboolean feature_persistent;
GTree *persistent_gnts;
GSList *persistent_regions;
unsigned int persistent_gnt_count;
unsigned int max_grants;
/* Grant copy */
gboolean feature_grant_copy;
/* qemu block driver */
DriveInfo *dinfo;
@@ -135,14 +103,8 @@ static void ioreq_reset(struct ioreq *ioreq)
memset(&ioreq->req, 0, sizeof(ioreq->req));
ioreq->status = 0;
ioreq->start = 0;
ioreq->size = 0;
ioreq->presync = 0;
ioreq->mapped = 0;
memset(ioreq->domids, 0, sizeof(ioreq->domids));
memset(ioreq->refs, 0, sizeof(ioreq->refs));
ioreq->prot = 0;
memset(ioreq->page, 0, sizeof(ioreq->page));
ioreq->pages = NULL;
ioreq->aio_inflight = 0;
ioreq->aio_errors = 0;
@@ -154,59 +116,20 @@ static void ioreq_reset(struct ioreq *ioreq)
qemu_iovec_reset(&ioreq->v);
}
static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
{
uint ua = GPOINTER_TO_UINT(a);
uint ub = GPOINTER_TO_UINT(b);
return (ua > ub) - (ua < ub);
}
static void destroy_grant(gpointer pgnt)
{
PersistentGrant *grant = pgnt;
xengnttab_handle *gnt = grant->blkdev->xendev.gnttabdev;
if (xengnttab_unmap(gnt, grant->page, 1) != 0) {
xen_pv_printf(&grant->blkdev->xendev, 0,
"xengnttab_unmap failed: %s\n",
strerror(errno));
}
grant->blkdev->persistent_gnt_count--;
xen_pv_printf(&grant->blkdev->xendev, 3,
"unmapped grant %p\n", grant->page);
g_free(grant);
}
static void remove_persistent_region(gpointer data, gpointer dev)
{
PersistentRegion *region = data;
struct XenBlkDev *blkdev = dev;
xengnttab_handle *gnt = blkdev->xendev.gnttabdev;
if (xengnttab_unmap(gnt, region->addr, region->num) != 0) {
xen_pv_printf(&blkdev->xendev, 0,
"xengnttab_unmap region %p failed: %s\n",
region->addr, strerror(errno));
}
xen_pv_printf(&blkdev->xendev, 3,
"unmapped grant region %p with %d pages\n",
region->addr, region->num);
g_free(region);
}
static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
{
struct ioreq *ioreq = NULL;
if (QLIST_EMPTY(&blkdev->freelist)) {
if (blkdev->requests_total >= max_requests) {
if (blkdev->requests_total >= blkdev->max_requests) {
goto out;
}
/* allocate new struct */
ioreq = g_malloc0(sizeof(*ioreq));
ioreq->blkdev = blkdev;
ioreq->buf = qemu_memalign(XC_PAGE_SIZE, BLKIF_MAX_SEGMENTS_PER_REQUEST * XC_PAGE_SIZE);
blkdev->requests_total++;
qemu_iovec_init(&ioreq->v, BLKIF_MAX_SEGMENTS_PER_REQUEST);
qemu_iovec_init(&ioreq->v, 1);
} else {
/* get one from freelist */
ioreq = QLIST_FIRST(&blkdev->freelist);
@@ -251,7 +174,6 @@ static void ioreq_release(struct ioreq *ioreq, bool finish)
static int ioreq_parse(struct ioreq *ioreq)
{
struct XenBlkDev *blkdev = ioreq->blkdev;
uintptr_t mem;
size_t len;
int i;
@@ -261,7 +183,6 @@ static int ioreq_parse(struct ioreq *ioreq)
ioreq->req.handle, ioreq->req.id, ioreq->req.sector_number);
switch (ioreq->req.operation) {
case BLKIF_OP_READ:
ioreq->prot = PROT_WRITE; /* to memory */
break;
case BLKIF_OP_FLUSH_DISKCACHE:
ioreq->presync = 1;
@@ -270,7 +191,6 @@ static int ioreq_parse(struct ioreq *ioreq)
}
/* fall through */
case BLKIF_OP_WRITE:
ioreq->prot = PROT_READ; /* from memory */
break;
case BLKIF_OP_DISCARD:
return 0;
@@ -300,14 +220,10 @@ static int ioreq_parse(struct ioreq *ioreq)
goto err;
}
ioreq->domids[i] = blkdev->xendev.dom;
ioreq->refs[i] = ioreq->req.seg[i].gref;
mem = ioreq->req.seg[i].first_sect * blkdev->file_blk;
len = (ioreq->req.seg[i].last_sect - ioreq->req.seg[i].first_sect + 1) * blkdev->file_blk;
qemu_iovec_add(&ioreq->v, (void*)mem, len);
ioreq->size += len;
}
if (ioreq->start + ioreq->v.size > blkdev->file_size) {
if (ioreq->start + ioreq->size > blkdev->file_size) {
xen_pv_printf(&blkdev->xendev, 0, "error: access beyond end of file\n");
goto err;
}
@@ -318,243 +234,38 @@ err:
return -1;
}
static void ioreq_unmap(struct ioreq *ioreq)
{
xengnttab_handle *gnt = ioreq->blkdev->xendev.gnttabdev;
int i;
if (ioreq->num_unmap == 0 || ioreq->mapped == 0) {
return;
}
if (batch_maps) {
if (!ioreq->pages) {
return;
}
if (xengnttab_unmap(gnt, ioreq->pages, ioreq->num_unmap) != 0) {
xen_pv_printf(&ioreq->blkdev->xendev, 0,
"xengnttab_unmap failed: %s\n",
strerror(errno));
}
ioreq->blkdev->cnt_map -= ioreq->num_unmap;
ioreq->pages = NULL;
} else {
for (i = 0; i < ioreq->num_unmap; i++) {
if (!ioreq->page[i]) {
continue;
}
if (xengnttab_unmap(gnt, ioreq->page[i], 1) != 0) {
xen_pv_printf(&ioreq->blkdev->xendev, 0,
"xengnttab_unmap failed: %s\n",
strerror(errno));
}
ioreq->blkdev->cnt_map--;
ioreq->page[i] = NULL;
}
}
ioreq->mapped = 0;
}
static int ioreq_map(struct ioreq *ioreq)
{
xengnttab_handle *gnt = ioreq->blkdev->xendev.gnttabdev;
uint32_t domids[BLKIF_MAX_SEGMENTS_PER_REQUEST];
uint32_t refs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
int i, j, new_maps = 0;
PersistentGrant *grant;
PersistentRegion *region;
/* domids and refs variables will contain the information necessary
* to map the grants that are needed to fulfill this request.
*
* After mapping the needed grants, the page array will contain the
* memory address of each granted page in the order specified in ioreq
* (disregarding if it's a persistent grant or not).
*/
if (ioreq->v.niov == 0 || ioreq->mapped == 1) {
return 0;
}
if (ioreq->blkdev->feature_persistent) {
for (i = 0; i < ioreq->v.niov; i++) {
grant = g_tree_lookup(ioreq->blkdev->persistent_gnts,
GUINT_TO_POINTER(ioreq->refs[i]));
if (grant != NULL) {
page[i] = grant->page;
xen_pv_printf(&ioreq->blkdev->xendev, 3,
"using persistent-grant %" PRIu32 "\n",
ioreq->refs[i]);
} else {
/* Add the grant to the list of grants that
* should be mapped
*/
domids[new_maps] = ioreq->domids[i];
refs[new_maps] = ioreq->refs[i];
page[i] = NULL;
new_maps++;
}
}
/* Set the protection to RW, since grants may be reused later
* with a different protection than the one needed for this request
*/
ioreq->prot = PROT_WRITE | PROT_READ;
} else {
/* All grants in the request should be mapped */
memcpy(refs, ioreq->refs, sizeof(refs));
memcpy(domids, ioreq->domids, sizeof(domids));
memset(page, 0, sizeof(page));
new_maps = ioreq->v.niov;
}
if (batch_maps && new_maps) {
ioreq->pages = xengnttab_map_grant_refs
(gnt, new_maps, domids, refs, ioreq->prot);
if (ioreq->pages == NULL) {
xen_pv_printf(&ioreq->blkdev->xendev, 0,
"can't map %d grant refs (%s, %d maps)\n",
new_maps, strerror(errno), ioreq->blkdev->cnt_map);
return -1;
}
for (i = 0, j = 0; i < ioreq->v.niov; i++) {
if (page[i] == NULL) {
page[i] = ioreq->pages + (j++) * XC_PAGE_SIZE;
}
}
ioreq->blkdev->cnt_map += new_maps;
} else if (new_maps) {
for (i = 0; i < new_maps; i++) {
ioreq->page[i] = xengnttab_map_grant_ref
(gnt, domids[i], refs[i], ioreq->prot);
if (ioreq->page[i] == NULL) {
xen_pv_printf(&ioreq->blkdev->xendev, 0,
"can't map grant ref %d (%s, %d maps)\n",
refs[i], strerror(errno), ioreq->blkdev->cnt_map);
ioreq->mapped = 1;
ioreq_unmap(ioreq);
return -1;
}
ioreq->blkdev->cnt_map++;
}
for (i = 0, j = 0; i < ioreq->v.niov; i++) {
if (page[i] == NULL) {
page[i] = ioreq->page[j++];
}
}
}
if (ioreq->blkdev->feature_persistent && new_maps != 0 &&
(!batch_maps || (ioreq->blkdev->persistent_gnt_count + new_maps <=
ioreq->blkdev->max_grants))) {
/*
* If we are using persistent grants and batch mappings only
* add the new maps to the list of persistent grants if the whole
* area can be persistently mapped.
*/
if (batch_maps) {
region = g_malloc0(sizeof(*region));
region->addr = ioreq->pages;
region->num = new_maps;
ioreq->blkdev->persistent_regions = g_slist_append(
ioreq->blkdev->persistent_regions,
region);
}
while ((ioreq->blkdev->persistent_gnt_count < ioreq->blkdev->max_grants)
&& new_maps) {
/* Go through the list of newly mapped grants and add as many
* as possible to the list of persistently mapped grants.
*
* Since we start at the end of ioreq->page(s), we only need
* to decrease new_maps to prevent this granted pages from
* being unmapped in ioreq_unmap.
*/
grant = g_malloc0(sizeof(*grant));
new_maps--;
if (batch_maps) {
grant->page = ioreq->pages + (new_maps) * XC_PAGE_SIZE;
} else {
grant->page = ioreq->page[new_maps];
}
grant->blkdev = ioreq->blkdev;
xen_pv_printf(&ioreq->blkdev->xendev, 3,
"adding grant %" PRIu32 " page: %p\n",
refs[new_maps], grant->page);
g_tree_insert(ioreq->blkdev->persistent_gnts,
GUINT_TO_POINTER(refs[new_maps]),
grant);
ioreq->blkdev->persistent_gnt_count++;
}
assert(!batch_maps || new_maps == 0);
}
for (i = 0; i < ioreq->v.niov; i++) {
ioreq->v.iov[i].iov_base += (uintptr_t)page[i];
}
ioreq->mapped = 1;
ioreq->num_unmap = new_maps;
return 0;
}
#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 480
static void ioreq_free_copy_buffers(struct ioreq *ioreq)
{
int i;
for (i = 0; i < ioreq->v.niov; i++) {
ioreq->page[i] = NULL;
}
qemu_vfree(ioreq->pages);
}
static int ioreq_init_copy_buffers(struct ioreq *ioreq)
{
int i;
if (ioreq->v.niov == 0) {
return 0;
}
ioreq->pages = qemu_memalign(XC_PAGE_SIZE, ioreq->v.niov * XC_PAGE_SIZE);
for (i = 0; i < ioreq->v.niov; i++) {
ioreq->page[i] = ioreq->pages + i * XC_PAGE_SIZE;
ioreq->v.iov[i].iov_base = ioreq->page[i];
}
return 0;
}
static int ioreq_grant_copy(struct ioreq *ioreq)
{
xengnttab_handle *gnt = ioreq->blkdev->xendev.gnttabdev;
xengnttab_grant_copy_segment_t segs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
struct XenBlkDev *blkdev = ioreq->blkdev;
struct XenDevice *xendev = &blkdev->xendev;
XenGrantCopySegment segs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
int i, count, rc;
int64_t file_blk = ioreq->blkdev->file_blk;
bool to_domain = (ioreq->req.operation == BLKIF_OP_READ);
void *virt = ioreq->buf;
if (ioreq->v.niov == 0) {
if (ioreq->req.nr_segments == 0) {
return 0;
}
count = ioreq->v.niov;
count = ioreq->req.nr_segments;
for (i = 0; i < count; i++) {
if (ioreq->req.operation == BLKIF_OP_READ) {
segs[i].flags = GNTCOPY_dest_gref;
segs[i].dest.foreign.ref = ioreq->refs[i];
segs[i].dest.foreign.domid = ioreq->domids[i];
if (to_domain) {
segs[i].dest.foreign.ref = ioreq->req.seg[i].gref;
segs[i].dest.foreign.offset = ioreq->req.seg[i].first_sect * file_blk;
segs[i].source.virt = ioreq->v.iov[i].iov_base;
segs[i].source.virt = virt;
} else {
segs[i].flags = GNTCOPY_source_gref;
segs[i].source.foreign.ref = ioreq->refs[i];
segs[i].source.foreign.domid = ioreq->domids[i];
segs[i].source.foreign.ref = ioreq->req.seg[i].gref;
segs[i].source.foreign.offset = ioreq->req.seg[i].first_sect * file_blk;
segs[i].dest.virt = ioreq->v.iov[i].iov_base;
segs[i].dest.virt = virt;
}
segs[i].len = (ioreq->req.seg[i].last_sect
- ioreq->req.seg[i].first_sect + 1) * file_blk;
virt += segs[i].len;
}
rc = xengnttab_grant_copy(gnt, count, segs);
rc = xen_be_copy_grant_refs(xendev, to_domain, segs, count);
if (rc) {
xen_pv_printf(&ioreq->blkdev->xendev, 0,
@@ -563,34 +274,8 @@ static int ioreq_grant_copy(struct ioreq *ioreq)
return -1;
}
for (i = 0; i < count; i++) {
if (segs[i].status != GNTST_okay) {
xen_pv_printf(&ioreq->blkdev->xendev, 3,
"failed to copy data %d for gref %d, domid %d\n",
segs[i].status, ioreq->refs[i], ioreq->domids[i]);
ioreq->aio_errors++;
rc = -1;
}
}
return rc;
}
#else
static void ioreq_free_copy_buffers(struct ioreq *ioreq)
{
abort();
}
static int ioreq_init_copy_buffers(struct ioreq *ioreq)
{
abort();
}
static int ioreq_grant_copy(struct ioreq *ioreq)
{
abort();
}
#endif
static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
@@ -614,32 +299,26 @@ static void qemu_aio_complete(void *opaque, int ret)
return;
}
if (ioreq->blkdev->feature_grant_copy) {
switch (ioreq->req.operation) {
case BLKIF_OP_READ:
/* in case of failure ioreq->aio_errors is increased */
if (ret == 0) {
ioreq_grant_copy(ioreq);
}
ioreq_free_copy_buffers(ioreq);
break;
case BLKIF_OP_WRITE:
case BLKIF_OP_FLUSH_DISKCACHE:
if (!ioreq->req.nr_segments) {
break;
}
ioreq_free_copy_buffers(ioreq);
break;
default:
switch (ioreq->req.operation) {
case BLKIF_OP_READ:
/* in case of failure ioreq->aio_errors is increased */
if (ret == 0) {
ioreq_grant_copy(ioreq);
}
break;
case BLKIF_OP_WRITE:
case BLKIF_OP_FLUSH_DISKCACHE:
if (!ioreq->req.nr_segments) {
break;
}
break;
default:
break;
}
ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
if (!ioreq->blkdev->feature_grant_copy) {
ioreq_unmap(ioreq);
}
ioreq_finish(ioreq);
switch (ioreq->req.operation) {
case BLKIF_OP_WRITE:
case BLKIF_OP_FLUSH_DISKCACHE:
@@ -696,18 +375,11 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
{
struct XenBlkDev *blkdev = ioreq->blkdev;
if (ioreq->blkdev->feature_grant_copy) {
ioreq_init_copy_buffers(ioreq);
if (ioreq->req.nr_segments && (ioreq->req.operation == BLKIF_OP_WRITE ||
ioreq->req.operation == BLKIF_OP_FLUSH_DISKCACHE) &&
ioreq_grant_copy(ioreq)) {
ioreq_free_copy_buffers(ioreq);
goto err;
}
} else {
if (ioreq->req.nr_segments && ioreq_map(ioreq)) {
goto err;
}
if (ioreq->req.nr_segments &&
(ioreq->req.operation == BLKIF_OP_WRITE ||
ioreq->req.operation == BLKIF_OP_FLUSH_DISKCACHE) &&
ioreq_grant_copy(ioreq)) {
goto err;
}
ioreq->aio_inflight++;
@@ -718,6 +390,7 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
switch (ioreq->req.operation) {
case BLKIF_OP_READ:
qemu_iovec_add(&ioreq->v, ioreq->buf, ioreq->size);
block_acct_start(blk_get_stats(blkdev->blk), &ioreq->acct,
ioreq->v.size, BLOCK_ACCT_READ);
ioreq->aio_inflight++;
@@ -730,6 +403,7 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
break;
}
qemu_iovec_add(&ioreq->v, ioreq->buf, ioreq->size);
block_acct_start(blk_get_stats(blkdev->blk), &ioreq->acct,
ioreq->v.size,
ioreq->req.operation == BLKIF_OP_WRITE ?
@@ -748,9 +422,6 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
}
default:
/* unknown operation (shouldn't happen -- parse catches this) */
if (!ioreq->blkdev->feature_grant_copy) {
ioreq_unmap(ioreq);
}
goto err;
}
@@ -769,31 +440,30 @@ static int blk_send_response_one(struct ioreq *ioreq)
struct XenBlkDev *blkdev = ioreq->blkdev;
int send_notify = 0;
int have_requests = 0;
blkif_response_t resp;
void *dst;
resp.id = ioreq->req.id;
resp.operation = ioreq->req.operation;
resp.status = ioreq->status;
blkif_response_t *resp;
/* Place on the response ring for the relevant domain. */
switch (blkdev->protocol) {
case BLKIF_PROTOCOL_NATIVE:
dst = RING_GET_RESPONSE(&blkdev->rings.native, blkdev->rings.native.rsp_prod_pvt);
resp = (blkif_response_t *) RING_GET_RESPONSE(&blkdev->rings.native,
blkdev->rings.native.rsp_prod_pvt);
break;
case BLKIF_PROTOCOL_X86_32:
dst = RING_GET_RESPONSE(&blkdev->rings.x86_32_part,
blkdev->rings.x86_32_part.rsp_prod_pvt);
resp = (blkif_response_t *) RING_GET_RESPONSE(&blkdev->rings.x86_32_part,
blkdev->rings.x86_32_part.rsp_prod_pvt);
break;
case BLKIF_PROTOCOL_X86_64:
dst = RING_GET_RESPONSE(&blkdev->rings.x86_64_part,
blkdev->rings.x86_64_part.rsp_prod_pvt);
resp = (blkif_response_t *) RING_GET_RESPONSE(&blkdev->rings.x86_64_part,
blkdev->rings.x86_64_part.rsp_prod_pvt);
break;
default:
dst = NULL;
return 0;
}
memcpy(dst, &resp, sizeof(resp));
resp->id = ioreq->req.id;
resp->operation = ioreq->req.operation;
resp->status = ioreq->status;
blkdev->rings.common.rsp_prod_pvt++;
RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&blkdev->rings.common, send_notify);
@@ -905,7 +575,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
ioreq_runio_qemu_aio(ioreq);
}
if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
if (blkdev->more_work && blkdev->requests_inflight < blkdev->max_requests) {
qemu_bh_schedule(blkdev->bh);
}
}
@@ -918,15 +588,6 @@ static void blk_bh(void *opaque)
blk_handle_requests(blkdev);
}
/*
* We need to account for the grant allocations requiring contiguous
* chunks; the worst case number would be
* max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
* but in order to keep things simple just use
* 2 * max_req * max_seg.
*/
#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
static void blk_alloc(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -935,14 +596,6 @@ static void blk_alloc(struct XenDevice *xendev)
QLIST_INIT(&blkdev->finished);
QLIST_INIT(&blkdev->freelist);
blkdev->bh = qemu_bh_new(blk_bh, blkdev);
if (xen_mode != XEN_EMULATE) {
batch_maps = 1;
}
if (xengnttab_set_max_grants(xendev->gnttabdev,
MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
strerror(errno));
}
}
static void blk_parse_discard(struct XenBlkDev *blkdev)
@@ -960,6 +613,16 @@ static void blk_parse_discard(struct XenBlkDev *blkdev)
}
}
static void blk_parse_cache_unsafe(struct XenBlkDev *blkdev)
{
int enable;
blkdev->cache_unsafe = false;
if (xenstore_read_be_int(&blkdev->xendev, "suse-diskcache-disable-flush", &enable) == 0)
blkdev->cache_unsafe = !!enable;
}
static int blk_init(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
@@ -1027,10 +690,13 @@ static int blk_init(struct XenDevice *xendev)
* blk_connect supplies sector-size and sectors
*/
xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
xenstore_write_be_int(&blkdev->xendev, "info", info);
xenstore_write_be_int(&blkdev->xendev, "max-ring-page-order",
MAX_RING_PAGE_ORDER);
blk_parse_discard(blkdev);
blk_parse_cache_unsafe(blkdev);
g_free(directiosafe);
return 0;
@@ -1054,9 +720,13 @@ out_error:
static int blk_connect(struct XenDevice *xendev)
{
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
int pers, index, qflags;
int index, qflags;
bool readonly = true;
bool writethrough = true;
Error *errp = NULL;
int order, ring_ref;
unsigned int ring_size, max_grants;
unsigned int i;
/* read-only ? */
if (blkdev->directiosafe) {
@@ -1073,6 +743,9 @@ static int blk_connect(struct XenDevice *xendev)
qflags |= BDRV_O_UNMAP;
}
if (blkdev->cache_unsafe)
qflags |= BDRV_O_NO_FLUSH;
/* init qemu block driver */
index = (blkdev->xendev.dev - 202 * 256) / 16;
blkdev->dinfo = drive_get(IF_XEN, 0, index);
@@ -1082,7 +755,7 @@ static int blk_connect(struct XenDevice *xendev)
if (strcmp(blkdev->fileproto, "<unset>")) {
options = qdict_new();
qdict_put(options, "driver", qstring_from_str(blkdev->fileproto));
qdict_put_str(options, "driver", blkdev->fileproto);
}
/* setup via xenbus -> create new block driver instance */
@@ -1111,6 +784,13 @@ static int blk_connect(struct XenDevice *xendev)
blk_ref(blkdev->blk);
}
blk_attach_dev_legacy(blkdev->blk, blkdev);
if (!monitor_add_blk(blkdev->blk, g_strdup(blkdev->dev), &errp)) {
xen_pv_printf(&blkdev->xendev, 0, "error: %s\n",
error_get_pretty(errp));
error_free(errp);
return -1;
}
blkdev->file_size = blk_getlength(blkdev->blk);
if (blkdev->file_size < 0) {
BlockDriverState *bs = blk_bs(blkdev->blk);
@@ -1131,18 +811,46 @@ static int blk_connect(struct XenDevice *xendev)
xenstore_write_be_int64(&blkdev->xendev, "sectors",
blkdev->file_size / blkdev->file_blk);
if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref", &blkdev->ring_ref) == -1) {
if (xenstore_read_fe_int(&blkdev->xendev, "ring-page-order",
&order) == -1) {
blkdev->nr_ring_ref = 1;
if (xenstore_read_fe_int(&blkdev->xendev, "ring-ref",
&ring_ref) == -1) {
return -1;
}
blkdev->ring_ref[0] = ring_ref;
} else if (order >= 0 && order <= MAX_RING_PAGE_ORDER) {
blkdev->nr_ring_ref = 1 << order;
for (i = 0; i < blkdev->nr_ring_ref; i++) {
char *key;
key = g_strdup_printf("ring-ref%u", i);
if (!key) {
return -1;
}
if (xenstore_read_fe_int(&blkdev->xendev, key,
&ring_ref) == -1) {
g_free(key);
return -1;
}
blkdev->ring_ref[i] = ring_ref;
g_free(key);
}
} else {
xen_pv_printf(xendev, 0, "invalid ring-page-order: %d\n",
order);
return -1;
}
if (xenstore_read_fe_int(&blkdev->xendev, "event-channel",
&blkdev->xendev.remote_port) == -1) {
return -1;
}
if (xenstore_read_fe_int(&blkdev->xendev, "feature-persistent", &pers)) {
blkdev->feature_persistent = FALSE;
} else {
blkdev->feature_persistent = !!pers;
}
if (!blkdev->xendev.protocol) {
blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
@@ -1156,61 +864,67 @@ static int blk_connect(struct XenDevice *xendev)
blkdev->protocol = BLKIF_PROTOCOL_NATIVE;
}
blkdev->sring = xengnttab_map_grant_ref(blkdev->xendev.gnttabdev,
blkdev->xendev.dom,
blkdev->ring_ref,
PROT_READ | PROT_WRITE);
ring_size = XC_PAGE_SIZE * blkdev->nr_ring_ref;
switch (blkdev->protocol) {
case BLKIF_PROTOCOL_NATIVE:
{
blkdev->max_requests = __CONST_RING_SIZE(blkif, ring_size);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_32, ring_size);
break;
}
case BLKIF_PROTOCOL_X86_64:
{
blkdev->max_requests = __CONST_RING_SIZE(blkif_x86_64, ring_size);
break;
}
default:
return -1;
}
/* Add on the number needed for the ring pages */
max_grants = blkdev->nr_ring_ref;
xen_be_set_max_grant_refs(xendev, max_grants);
blkdev->sring = xen_be_map_grant_refs(xendev, blkdev->ring_ref,
blkdev->nr_ring_ref,
PROT_READ | PROT_WRITE);
if (!blkdev->sring) {
return -1;
}
blkdev->cnt_map++;
switch (blkdev->protocol) {
case BLKIF_PROTOCOL_NATIVE:
{
blkif_sring_t *sring_native = blkdev->sring;
BACK_RING_INIT(&blkdev->rings.native, sring_native, XC_PAGE_SIZE);
BACK_RING_INIT(&blkdev->rings.native, sring_native, ring_size);
break;
}
case BLKIF_PROTOCOL_X86_32:
{
blkif_x86_32_sring_t *sring_x86_32 = blkdev->sring;
BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, XC_PAGE_SIZE);
BACK_RING_INIT(&blkdev->rings.x86_32_part, sring_x86_32, ring_size);
break;
}
case BLKIF_PROTOCOL_X86_64:
{
blkif_x86_64_sring_t *sring_x86_64 = blkdev->sring;
BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, XC_PAGE_SIZE);
BACK_RING_INIT(&blkdev->rings.x86_64_part, sring_x86_64, ring_size);
break;
}
}
if (blkdev->feature_persistent) {
/* Init persistent grants */
blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
NULL, NULL,
batch_maps ?
(GDestroyNotify)g_free :
(GDestroyNotify)destroy_grant);
blkdev->persistent_regions = NULL;
blkdev->persistent_gnt_count = 0;
}
xen_be_bind_evtchn(&blkdev->xendev);
blkdev->feature_grant_copy =
(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 0);
xen_pv_printf(&blkdev->xendev, 3, "grant copy operation %s\n",
blkdev->feature_grant_copy ? "enabled" : "disabled");
xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, ring-ref %d, "
xen_pv_printf(&blkdev->xendev, 1, "ok: proto %s, nr-ring-ref %u, "
"remote port %d, local port %d\n",
blkdev->xendev.protocol, blkdev->ring_ref,
blkdev->xendev.protocol, blkdev->nr_ring_ref,
blkdev->xendev.remote_port, blkdev->xendev.local_port);
return 0;
}
@@ -1221,36 +935,17 @@ static void blk_disconnect(struct XenDevice *xendev)
if (blkdev->blk) {
blk_detach_dev(blkdev->blk, blkdev);
monitor_remove_blk(blkdev->blk);
blk_unref(blkdev->blk);
blkdev->blk = NULL;
}
xen_pv_unbind_evtchn(&blkdev->xendev);
if (blkdev->sring) {
xengnttab_unmap(blkdev->xendev.gnttabdev, blkdev->sring, 1);
blkdev->cnt_map--;
xen_be_unmap_grant_refs(xendev, blkdev->sring,
blkdev->nr_ring_ref);
blkdev->sring = NULL;
}
/*
* Unmap persistent grants before switching to the closed state
* so the frontend can free them.
*
* In the !batch_maps case g_tree_destroy will take care of unmapping
* the grant, but in the batch_maps case we need to iterate over every
* region in persistent_regions and unmap it.
*/
if (blkdev->feature_persistent) {
g_tree_destroy(blkdev->persistent_gnts);
assert(batch_maps || blkdev->persistent_gnt_count == 0);
if (batch_maps) {
blkdev->persistent_gnt_count = 0;
g_slist_foreach(blkdev->persistent_regions,
(GFunc)remove_persistent_region, blkdev);
g_slist_free(blkdev->persistent_regions);
}
blkdev->feature_persistent = false;
}
}
static int blk_free(struct XenDevice *xendev)
@@ -1258,14 +953,13 @@ static int blk_free(struct XenDevice *xendev)
struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
struct ioreq *ioreq;
if (blkdev->blk || blkdev->sring) {
blk_disconnect(xendev);
}
blk_disconnect(xendev);
while (!QLIST_EMPTY(&blkdev->freelist)) {
ioreq = QLIST_FIRST(&blkdev->freelist);
QLIST_REMOVE(ioreq, list);
qemu_iovec_destroy(&ioreq->v);
qemu_vfree(ioreq->buf);
g_free(ioreq);
}
@@ -1285,12 +979,22 @@ static void blk_event(struct XenDevice *xendev)
qemu_bh_schedule(blkdev->bh);
}
extern void xen_blk_resize_update(void *dev);
void xen_blk_resize_update(void *dev)
{
struct XenBlkDev *blkdev = dev;
blkdev->file_size = blk_getlength(blkdev->blk);
xenstore_write_be_int64(&blkdev->xendev, "sectors",
blkdev->file_size / blkdev->file_blk);
xen_be_set_state(&blkdev->xendev, blkdev->xendev.be_state);
}
struct XenDevOps xen_blkdev_ops = {
.size = sizeof(struct XenBlkDev),
.flags = DEVOPS_FLAG_NEED_GNTDEV,
.size = sizeof(struct XenBlkDev),
.alloc = blk_alloc,
.init = blk_init,
.initialise = blk_connect,
.initialise = blk_connect,
.disconnect = blk_disconnect,
.event = blk_event,
.free = blk_free,

View File

@@ -244,7 +244,9 @@ static const MemoryRegionOps bcm2835_aux_ops = {
.read = bcm2835_aux_read,
.write = bcm2835_aux_write,
.endianness = DEVICE_NATIVE_ENDIAN,
.valid.min_access_size = 4,
.impl.min_access_size = 4,
.impl.max_access_size = 4,
.valid.min_access_size = 1,
.valid.max_access_size = 4,
};

View File

@@ -261,15 +261,20 @@ static void serial_xmit(SerialState *s)
if (s->mcr & UART_MCR_LOOP) {
/* in loopback mode, say that we just received a char */
serial_receive1(s, &s->tsr, 1);
} else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
s->tsr_retry < MAX_XMIT_RETRY) {
assert(s->watch_tag == 0);
s->watch_tag =
qemu_chr_fe_add_watch(&s->chr, G_IO_OUT | G_IO_HUP,
serial_watch_cb, s);
if (s->watch_tag > 0) {
s->tsr_retry++;
return;
} else {
int rc = qemu_chr_fe_write(&s->chr, &s->tsr, 1);
if ((rc == 0 ||
(rc == -1 && errno == EAGAIN)) &&
s->tsr_retry < MAX_XMIT_RETRY) {
assert(s->watch_tag == 0);
s->watch_tag =
qemu_chr_fe_add_watch(&s->chr, G_IO_OUT | G_IO_HUP,
serial_watch_cb, s);
if (s->watch_tag > 0) {
s->tsr_retry++;
return;
}
}
}
s->tsr_retry = 0;

View File

@@ -1121,6 +1121,9 @@ static void virtio_serial_device_unrealize(DeviceState *dev, Error **errp)
timer_free(vser->post_load->timer);
g_free(vser->post_load);
}
qbus_set_hotplug_handler(BUS(&vser->bus), NULL, errp);
virtio_cleanup(vdev);
}

View File

@@ -234,12 +234,11 @@ static int con_initialise(struct XenDevice *xendev)
if (!xendev->dev) {
xen_pfn_t mfn = con->ring_ref;
con->sring = xenforeignmemory_map(xen_fmem, con->xendev.dom,
PROT_READ|PROT_WRITE,
PROT_READ | PROT_WRITE,
1, &mfn, NULL);
} else {
con->sring = xengnttab_map_grant_ref(xendev->gnttabdev, con->xendev.dom,
con->ring_ref,
PROT_READ|PROT_WRITE);
con->sring = xen_be_map_grant_ref(xendev, con->ring_ref,
PROT_READ | PROT_WRITE);
}
if (!con->sring)
return -1;
@@ -268,7 +267,7 @@ static void con_disconnect(struct XenDevice *xendev)
if (!xendev->dev) {
xenforeignmemory_unmap(xen_fmem, con->sring, 1);
} else {
xengnttab_unmap(xendev->gnttabdev, con->sring, 1);
xen_be_unmap_grant_ref(xendev, con->sring);
}
con->sring = NULL;
}

View File

@@ -55,6 +55,7 @@
#include "exec/address-spaces.h"
#include "hw/boards.h"
#include "qemu/cutils.h"
#include "hw/xen/xen.h"
#include <zlib.h>
@@ -866,7 +867,10 @@ static void *rom_set_mr(Rom *rom, Object *owner, const char *name, bool ro)
void *data;
rom->mr = g_malloc(sizeof(*rom->mr));
memory_region_init_resizeable_ram(rom->mr, owner, name,
if (xen_enabled())
memory_region_init_ram(rom->mr, owner, name, rom->datasize, &error_fatal);
else
memory_region_init_resizeable_ram(rom->mr, owner, name,
rom->datasize, rom->romsize,
fw_cfg_resized,
&error_fatal);
@@ -1184,7 +1188,7 @@ int rom_copy(uint8_t *dest, hwaddr addr, size_t size)
if (rom->addr + rom->romsize < addr) {
continue;
}
if (rom->addr > end) {
if (rom->addr > end || rom->addr < addr) {
break;
}

View File

@@ -91,7 +91,7 @@ static void or_irq_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_or_irq;
/* Reason: Needs to be wired up to work, e.g. see stm32f205_soc.c */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo or_irq_type_info = {

View File

@@ -1159,6 +1159,7 @@ static void device_class_init(ObjectClass *class, void *data)
* should override it in their class_init()
*/
dc->hotpluggable = true;
dc->user_creatable = true;
}
void device_reset(DeviceState *dev)

View File

@@ -288,7 +288,7 @@ static void register_class_init(ObjectClass *oc, void *data)
DeviceClass *dc = DEVICE_CLASS(oc);
/* Reason: needs to be wired up to work */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo register_info = {

View File

@@ -33,6 +33,11 @@ static void core_prop_set_core_id(Object *obj, Visitor *v, const char *name,
return;
}
if (value < 0) {
error_setg(errp, "Invalid core id %"PRId64, value);
return;
}
core->core_id = value;
}

View File

@@ -2038,15 +2038,14 @@ static void cirrus_mem_writeb_mode4and5_8bpp(CirrusVGAState * s,
unsigned val = mem_value;
uint8_t *dst;
dst = s->vga.vram_ptr + (offset &= s->cirrus_addr_mask);
for (x = 0; x < 8; x++) {
dst = s->vga.vram_ptr + ((offset + x) & s->cirrus_addr_mask);
if (val & 0x80) {
*dst = s->cirrus_shadow_gr1;
} else if (mode == 5) {
*dst = s->cirrus_shadow_gr0;
}
val <<= 1;
dst++;
}
memory_region_set_dirty(&s->vga.vram, offset, 8);
}
@@ -2060,8 +2059,8 @@ static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s,
unsigned val = mem_value;
uint8_t *dst;
dst = s->vga.vram_ptr + (offset &= s->cirrus_addr_mask);
for (x = 0; x < 8; x++) {
dst = s->vga.vram_ptr + ((offset + 2 * x) & s->cirrus_addr_mask & ~1);
if (val & 0x80) {
*dst = s->cirrus_shadow_gr1;
*(dst + 1) = s->vga.gr[0x11];
@@ -2070,7 +2069,6 @@ static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s,
*(dst + 1) = s->vga.gr[0x10];
}
val <<= 1;
dst += 2;
}
memory_region_set_dirty(&s->vga.vram, offset, 16);
}

View File

@@ -227,13 +227,13 @@ static void jazz_led_invalidate_display(void *opaque)
static void jazz_led_text_update(void *opaque, console_ch_t *chardata)
{
LedState *s = opaque;
char buf[2];
char buf[3];
dpy_text_cursor(s->con, -1, -1);
qemu_console_resize(s->con, 2, 1);
/* TODO: draw the segments */
snprintf(buf, 2, "%02hhx\n", s->segments);
snprintf(buf, 3, "%02hhx", s->segments);
console_write_ch(chardata++, ATTR2CHTYPE(buf[0], QEMU_COLOR_BLUE,
QEMU_COLOR_BLACK, 1));
console_write_ch(chardata++, ATTR2CHTYPE(buf[1], QEMU_COLOR_BLUE,

View File

@@ -106,7 +106,7 @@ static int qxl_log_image(PCIQXLDevice *qxl, QXLPHYSICAL addr, int group_id)
QXLImage *image;
QXLImageDescriptor *desc;
image = qxl_phys2virt(qxl, addr, group_id);
image = qxl_phys2virt(qxl, addr, group_id, sizeof(QXLImage));
if (!image) {
return 1;
}
@@ -216,7 +216,8 @@ int qxl_log_cmd_cursor(PCIQXLDevice *qxl, QXLCursorCmd *cmd, int group_id)
cmd->u.set.position.y,
cmd->u.set.visible ? "yes" : "no",
cmd->u.set.shape);
cursor = qxl_phys2virt(qxl, cmd->u.set.shape, group_id);
cursor = qxl_phys2virt(qxl, cmd->u.set.shape, group_id,
sizeof(QXLCursor));
if (!cursor) {
return 1;
}
@@ -238,6 +239,7 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
{
bool compat = ext->flags & QXL_COMMAND_FLAG_COMPAT;
void *data;
size_t datasz;
int ret;
if (!qxl->cmdlog) {
@@ -249,7 +251,20 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
qxl_name(qxl_type, ext->cmd.type),
compat ? "(compat)" : "");
data = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
switch (ext->cmd.type) {
case QXL_CMD_DRAW:
datasz = compat ? sizeof(QXLCompatDrawable) : sizeof(QXLDrawable);
break;
case QXL_CMD_SURFACE:
datasz = sizeof(QXLSurfaceCmd);
break;
case QXL_CMD_CURSOR:
datasz = sizeof(QXLCursorCmd);
break;
default:
goto out;
}
data = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id, datasz);
if (!data) {
return 1;
}
@@ -271,6 +286,7 @@ int qxl_log_command(PCIQXLDevice *qxl, const char *ring, QXLCommandExt *ext)
qxl_log_cmd_cursor(qxl, data, ext->group_id);
break;
}
out:
fprintf(stderr, "\n");
return 0;
}

View File

@@ -98,21 +98,25 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
{
VGACommonState *vga = &qxl->vga;
DisplaySurface *surface;
int width = qxl->guest_head0_width ?: qxl->guest_primary.surface.width;
int height = qxl->guest_head0_height ?: qxl->guest_primary.surface.height;
int i;
if (qxl->guest_primary.resized) {
qxl->guest_primary.resized = 0;
qxl->guest_primary.data = qxl_phys2virt(qxl,
qxl->guest_primary.surface.mem,
MEMSLOT_GROUP_GUEST);
MEMSLOT_GROUP_GUEST,
qxl->guest_primary.abs_stride
* height);
if (!qxl->guest_primary.data) {
return;
}
qxl_set_rect_to_surface(qxl, &qxl->dirty[0]);
qxl->num_dirty_rects = 1;
trace_qxl_render_guest_primary_resized(
qxl->guest_primary.surface.width,
qxl->guest_primary.surface.height,
width,
height,
qxl->guest_primary.qxl_stride,
qxl->guest_primary.bytes_pp,
qxl->guest_primary.bits_pp);
@@ -120,15 +124,15 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
pixman_format_code_t format =
qemu_default_pixman_format(qxl->guest_primary.bits_pp, true);
surface = qemu_create_displaysurface_from
(qxl->guest_primary.surface.width,
qxl->guest_primary.surface.height,
(width,
height,
format,
qxl->guest_primary.abs_stride,
qxl->guest_primary.data);
} else {
surface = qemu_create_displaysurface
(qxl->guest_primary.surface.width,
qxl->guest_primary.surface.height);
(width,
height);
}
dpy_gfx_replace_surface(vga->con, surface);
}
@@ -144,8 +148,8 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
qxl->dirty[i].top < 0 ||
qxl->dirty[i].left > qxl->dirty[i].right ||
qxl->dirty[i].top > qxl->dirty[i].bottom ||
qxl->dirty[i].right > qxl->guest_primary.surface.width ||
qxl->dirty[i].bottom > qxl->guest_primary.surface.height) {
qxl->dirty[i].right > width ||
qxl->dirty[i].bottom > height) {
continue;
}
qxl_blit(qxl, qxl->dirty+i);
@@ -211,11 +215,18 @@ static QEMUCursor *qxl_cursor(PCIQXLDevice *qxl, QXLCursor *cursor)
size_t size;
c = cursor_alloc(cursor->header.width, cursor->header.height);
if (!c) {
qxl_set_guest_bug(qxl, "%s: cursor %ux%u alloc error", __func__,
cursor->header.width, cursor->header.height);
goto fail;
}
c->hot_x = cursor->header.hot_spot_x;
c->hot_y = cursor->header.hot_spot_y;
switch (cursor->header.type) {
case SPICE_CURSOR_TYPE_ALPHA:
size = sizeof(uint32_t) * cursor->header.width * cursor->header.height;
size = sizeof(uint32_t) * c->width * c->height;
memcpy(c->data, cursor->chunk.data, size);
if (qxl->debug > 2) {
cursor_print_ascii_art(c, "qxl/alpha");
@@ -245,7 +256,8 @@ fail:
/* called from spice server thread context only */
int qxl_render_cursor(PCIQXLDevice *qxl, QXLCommandExt *ext)
{
QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
sizeof(QXLCursorCmd));
QXLCursor *cursor;
QEMUCursor *c;
@@ -264,7 +276,15 @@ int qxl_render_cursor(PCIQXLDevice *qxl, QXLCommandExt *ext)
}
switch (cmd->type) {
case QXL_CURSOR_SET:
cursor = qxl_phys2virt(qxl, cmd->u.set.shape, ext->group_id);
/* First read the QXLCursor to get QXLDataChunk::data_size ... */
cursor = qxl_phys2virt(qxl, cmd->u.set.shape, ext->group_id,
sizeof(QXLCursor));
if (!cursor) {
return 1;
}
/* Then read including the chunked data following QXLCursor. */
cursor = qxl_phys2virt(qxl, cmd->u.set.shape, ext->group_id,
sizeof(QXLCursor) + cursor->chunk.data_size);
if (!cursor) {
return 1;
}

View File

@@ -257,6 +257,8 @@ static void qxl_spice_destroy_surfaces(PCIQXLDevice *qxl, qxl_async_io async)
static void qxl_spice_monitors_config_async(PCIQXLDevice *qxl, int replay)
{
QXLMonitorsConfig *cfg;
trace_qxl_spice_monitors_config(qxl->id);
if (replay) {
/*
@@ -284,6 +286,17 @@ static void qxl_spice_monitors_config_async(PCIQXLDevice *qxl, int replay)
(uintptr_t)qxl_cookie_new(QXL_COOKIE_TYPE_IO,
QXL_IO_MONITORS_CONFIG_ASYNC));
}
cfg = qxl_phys2virt(qxl, qxl->guest_monitors_config, MEMSLOT_GROUP_GUEST,
sizeof(QXLMonitorsConfig));
if (cfg->count == 1) {
qxl->guest_primary.resized = 1;
qxl->guest_head0_width = cfg->heads[0].width;
qxl->guest_head0_height = cfg->heads[0].height;
} else {
qxl->guest_head0_width = 0;
qxl->guest_head0_height = 0;
}
}
void qxl_spice_reset_image_cache(PCIQXLDevice *qxl)
@@ -434,7 +447,8 @@ static int qxl_track_command(PCIQXLDevice *qxl, struct QXLCommandExt *ext)
switch (le32_to_cpu(ext->cmd.type)) {
case QXL_CMD_SURFACE:
{
QXLSurfaceCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
QXLSurfaceCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
sizeof(QXLSurfaceCmd));
if (!cmd) {
return 1;
@@ -468,7 +482,8 @@ static int qxl_track_command(PCIQXLDevice *qxl, struct QXLCommandExt *ext)
}
case QXL_CMD_CURSOR:
{
QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
QXLCursorCmd *cmd = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
sizeof(QXLCursorCmd));
if (!cmd) {
return 1;
@@ -649,7 +664,8 @@ static int interface_get_command(QXLInstance *sin, struct QXLCommandExt *ext)
*
* https://cgit.freedesktop.org/spice/win32/qxl-wddm-dod/commit/?id=f6e099db39e7d0787f294d5fd0dce328b5210faa
*/
void *msg = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id);
void *msg = qxl_phys2virt(qxl, ext->cmd.data, ext->group_id,
sizeof(QXLSurfaceCmd));
if (msg != NULL && (
msg < (void *)qxl->vga.vram_ptr ||
msg > ((void *)qxl->vga.vram_ptr + qxl->vga.vram_size))) {
@@ -741,6 +757,9 @@ static void interface_release_resource(QXLInstance *sin,
QXLReleaseRing *ring;
uint64_t *item, id;
if (!ext.info) {
return;
}
if (ext.group_id == MEMSLOT_GROUP_HOST) {
/* host group -> vga mode update request */
QXLCommandExt *cmdext = (void *)(intptr_t)(ext.info->id);
@@ -1377,6 +1396,7 @@ static int qxl_add_memslot(PCIQXLDevice *d, uint32_t slot_id, uint64_t delta,
qxl_set_guest_bug(d, "%s: pci_region = %d", __func__, pci_region);
return 1;
}
assert(guest_end - pci_start <= memory_region_size(mr));
virt_start = (intptr_t)memory_region_get_ram_ptr(mr);
memslot.slot_id = slot_id;
@@ -1417,11 +1437,13 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
/* can be also called from spice server thread context */
static bool qxl_get_check_slot_offset(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
uint32_t *s, uint64_t *o)
uint32_t *s, uint64_t *o,
size_t size_requested)
{
uint64_t phys = le64_to_cpu(pqxl);
uint32_t slot = (phys >> (64 - 8)) & 0xff;
uint64_t offset = phys & 0xffffffffffff;
uint64_t size_available;
if (slot >= NUM_MEMSLOTS) {
qxl_set_guest_bug(qxl, "slot too large %d >= %d", slot,
@@ -1445,6 +1467,23 @@ static bool qxl_get_check_slot_offset(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
slot, offset, qxl->guest_slots[slot].size);
return false;
}
size_available = memory_region_size(qxl->guest_slots[slot].mr);
if (qxl->guest_slots[slot].offset + offset >= size_available) {
qxl_set_guest_bug(qxl,
"slot %d offset %"PRIu64" > region size %"PRIu64"\n",
slot, qxl->guest_slots[slot].offset + offset,
size_available);
return false;
}
size_available -= qxl->guest_slots[slot].offset + offset;
if (size_requested > size_available) {
qxl_set_guest_bug(qxl,
"slot %d offset %"PRIu64" size %zu: "
"overrun by %"PRIu64" bytes\n",
slot, offset, size_requested,
size_requested - size_available);
return false;
}
*s = slot;
*o = offset;
@@ -1452,7 +1491,8 @@ static bool qxl_get_check_slot_offset(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
}
/* can be also called from spice server thread context */
void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id)
void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id,
size_t size)
{
uint64_t offset;
uint32_t slot;
@@ -1463,7 +1503,7 @@ void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL pqxl, int group_id)
offset = le64_to_cpu(pqxl) & 0xffffffffffff;
return (void *)(intptr_t)offset;
case MEMSLOT_GROUP_GUEST:
if (!qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset)) {
if (!qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset, size)) {
return NULL;
}
ptr = memory_region_get_ram_ptr(qxl->guest_slots[slot].mr);
@@ -1891,9 +1931,9 @@ static void qxl_dirty_one_surface(PCIQXLDevice *qxl, QXLPHYSICAL pqxl,
uint32_t slot;
bool rc;
rc = qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset);
assert(rc == true);
size = (uint64_t)height * abs(stride);
rc = qxl_get_check_slot_offset(qxl, pqxl, &slot, &offset, size);
assert(rc == true);
trace_qxl_surfaces_dirty(qxl->id, offset, size);
qxl_set_dirty(qxl->guest_slots[slot].mr,
qxl->guest_slots[slot].offset + offset,
@@ -1922,7 +1962,7 @@ static void qxl_dirty_surfaces(PCIQXLDevice *qxl)
}
cmd = qxl_phys2virt(qxl, qxl->guest_surfaces.cmds[i],
MEMSLOT_GROUP_GUEST);
MEMSLOT_GROUP_GUEST, sizeof(QXLSurfaceCmd));
assert(cmd);
assert(cmd->type == QXL_SURFACE_CMD_CREATE);
qxl_dirty_one_surface(qxl, cmd->u.surface_create.data,

View File

@@ -80,6 +80,8 @@ typedef struct PCIQXLDevice {
QXLPHYSICAL guest_cursor;
QXLPHYSICAL guest_monitors_config;
uint32_t guest_head0_width;
uint32_t guest_head0_height;
QemuMutex track_lock;
@@ -146,7 +148,28 @@ typedef struct PCIQXLDevice {
#define QXL_DEFAULT_REVISION QXL_REVISION_STABLE_V12
/* qxl.c */
void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id);
/**
* qxl_phys2virt: Get a pointer within a PCI VRAM memory region.
*
* @qxl: QXL device
* @phys: physical offset of buffer within the VRAM
* @group_id: memory slot group
* @size: size of the buffer
*
* Returns a host pointer to a buffer placed at offset @phys within the
* active slot @group_id of the PCI VGA RAM memory region associated with
* the @qxl device. If the slot is inactive, or the offset + size are out
* of the memory region, returns NULL.
*
* Use with care; by the time this function returns, the returned pointer is
* not protected by RCU anymore. If the caller is not within an RCU critical
* section and does not hold the iothread lock, it must have other means of
* protecting the pointer, such as a reference to the region that includes
* the incoming ram_addr_t.
*
*/
void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id,
size_t size);
void qxl_set_guest_bug(PCIQXLDevice *qxl, const char *msg, ...)
GCC_FMT_ATTR(2, 3);

View File

@@ -2,6 +2,7 @@
* QEMU SM501 Device
*
* Copyright (c) 2008 Shin-ichiro KAWASAKI
* Copyright (c) 2016 BALATON Zoltan
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -23,6 +24,7 @@
*/
#include "qemu/osdep.h"
#include "qemu/log.h"
#include "qapi/error.h"
#include "qemu-common.h"
#include "cpu.h"
@@ -34,14 +36,18 @@
#include "qemu/range.h"
#include "ui/pixel_ops.h"
#include "exec/address-spaces.h"
#include "qemu/bswap.h"
/*
* Status: 2010/05/07
* - Minimum implementation for Linux console : mmio regs and CRT layer.
* - 2D grapihcs acceleration partially supported : only fill rectangle.
*
* TODO:
* Status: 2016/12/04
* - Misc fixes: endianness, hardware cursor
* - Panel support
*
* TODO:
* - Touch panel support
* - USB support
* - UART support
@@ -548,6 +554,29 @@ static uint32_t get_local_mem_size_index(uint32_t size)
return index;
}
static inline int get_width(SM501State *s, int crt)
{
int width = crt ? s->dc_crt_h_total : s->dc_panel_h_total;
return (width & 0x00000FFF) + 1;
}
static inline int get_height(SM501State *s, int crt)
{
int height = crt ? s->dc_crt_v_total : s->dc_panel_v_total;
return (height & 0x00000FFF) + 1;
}
static inline int get_bpp(SM501State *s, int crt)
{
int bpp = crt ? s->dc_crt_control : s->dc_panel_control;
return 1 << (bpp & 3);
}
static ram_addr_t get_fb_addr(SM501State *s, int crt)
{
return (crt ? s->dc_crt_fb_addr : s->dc_panel_fb_addr) & 0x3FFFFF0;
}
/**
* Check the availability of hardware cursor.
* @param crt 0 for PANEL, 1 for CRT.
@@ -562,10 +591,10 @@ static inline int is_hwc_enabled(SM501State *state, int crt)
* Get the address which holds cursor pattern data.
* @param crt 0 for PANEL, 1 for CRT.
*/
static inline uint32_t get_hwc_address(SM501State *state, int crt)
static inline uint8_t *get_hwc_address(SM501State *state, int crt)
{
uint32_t addr = crt ? state->dc_crt_hwc_addr : state->dc_panel_hwc_addr;
return (addr & 0x03FFFFF0)/* >> 4*/;
return state->local_mem + (addr & 0x03FFFFF0);
}
/**
@@ -591,142 +620,218 @@ static inline uint32_t get_hwc_x(SM501State *state, int crt)
}
/**
* Get the cursor position in x coordinate.
* Get the hardware cursor palette.
* @param crt 0 for PANEL, 1 for CRT.
* @param index 0, 1, 2 or 3 which specifies color of corsor dot.
* @param palette pointer to a [3 * 3] array to store color values in
*/
static inline uint16_t get_hwc_color(SM501State *state, int crt, int index)
static inline void get_hwc_palette(SM501State *state, int crt, uint8_t *palette)
{
uint32_t color_reg = 0;
uint16_t color_565 = 0;
int i;
uint32_t color_reg;
uint16_t rgb565;
if (index == 0) {
return 0;
}
for (i = 0; i < 3; i++) {
if (i + 1 == 3) {
color_reg = crt ? state->dc_crt_hwc_color_3
: state->dc_panel_hwc_color_3;
} else {
color_reg = crt ? state->dc_crt_hwc_color_1_2
: state->dc_panel_hwc_color_1_2;
}
switch (index) {
case 1:
case 2:
color_reg = crt ? state->dc_crt_hwc_color_1_2
: state->dc_panel_hwc_color_1_2;
break;
case 3:
color_reg = crt ? state->dc_crt_hwc_color_3
: state->dc_panel_hwc_color_3;
break;
default:
printf("invalid hw cursor color.\n");
abort();
if (i + 1 == 2) {
rgb565 = (color_reg >> 16) & 0xFFFF;
} else {
rgb565 = color_reg & 0xFFFF;
}
palette[i * 3 + 0] = (rgb565 << 3) & 0xf8; /* red */
palette[i * 3 + 1] = (rgb565 >> 3) & 0xfc; /* green */
palette[i * 3 + 2] = (rgb565 >> 8) & 0xf8; /* blue */
}
switch (index) {
case 1:
case 3:
color_565 = (uint16_t)(color_reg & 0xFFFF);
break;
case 2:
color_565 = (uint16_t)((color_reg >> 16) & 0xFFFF);
break;
}
return color_565;
}
static int within_hwc_y_range(SM501State *state, int y, int crt)
static inline void hwc_invalidate(SM501State *s, int crt)
{
int hwc_y = get_hwc_y(state, crt);
return (hwc_y <= y && y < hwc_y + SM501_HWC_HEIGHT);
int w = get_width(s, crt);
int h = get_height(s, crt);
int bpp = get_bpp(s, crt);
int start = get_hwc_y(s, crt);
int end = MIN(h, start + SM501_HWC_HEIGHT) + 1;
start *= w * bpp;
end *= w * bpp;
memory_region_set_dirty(&s->local_mem_region, start, end - start);
}
static void sm501_2d_operation(SM501State * s)
{
/* obtain operation parameters */
int operation = (s->twoD_control >> 16) & 0x1f;
int rtl = s->twoD_control & 0x8000000;
int src_x = (s->twoD_source >> 16) & 0x01FFF;
int src_y = s->twoD_source & 0xFFFF;
int dst_x = (s->twoD_destination >> 16) & 0x01FFF;
int dst_y = s->twoD_destination & 0xFFFF;
int operation_width = (s->twoD_dimension >> 16) & 0x1FFF;
int operation_height = s->twoD_dimension & 0xFFFF;
uint32_t color = s->twoD_foreground;
int format_flags = (s->twoD_stretch >> 20) & 0x3;
int addressing = (s->twoD_stretch >> 16) & 0xF;
int cmd = (s->twoD_control >> 16) & 0x1F;
int rtl = s->twoD_control & BIT(27);
int format = (s->twoD_stretch >> 20) & 0x3;
int rop_mode = (s->twoD_control >> 15) & 0x1; /* 1 for rop2, else rop3 */
/* 1 if rop2 source is the pattern, otherwise the source is the bitmap */
int rop2_source_is_pattern = (s->twoD_control >> 14) & 0x1;
int rop = s->twoD_control & 0xFF;
unsigned int dst_x = (s->twoD_destination >> 16) & 0x01FFF;
unsigned int dst_y = s->twoD_destination & 0xFFFF;
unsigned int width = (s->twoD_dimension >> 16) & 0x1FFF;
unsigned int height = s->twoD_dimension & 0xFFFF;
uint32_t dst_base = s->twoD_destination_base & 0x03FFFFFF;
unsigned int dst_pitch = (s->twoD_pitch >> 16) & 0x1FFF;
int crt = (s->dc_crt_control & SM501_DC_CRT_CONTROL_SEL) ? 1 : 0;
int fb_len = get_width(s, crt) * get_height(s, crt) * get_bpp(s, crt);
/* get frame buffer info */
uint8_t * src = s->local_mem + (s->twoD_source_base & 0x03FFFFFF);
uint8_t * dst = s->local_mem + (s->twoD_destination_base & 0x03FFFFFF);
int src_width = (s->dc_crt_h_total & 0x00000FFF) + 1;
int dst_width = (s->dc_crt_h_total & 0x00000FFF) + 1;
if (addressing != 0x0) {
printf("%s: only XY addressing is supported.\n", __func__);
abort();
if ((s->twoD_stretch >> 16) & 0xF) {
qemu_log_mask(LOG_UNIMP, "sm501: only XY addressing is supported.\n");
return;
}
if ((s->twoD_source_base & 0x08000000) ||
(s->twoD_destination_base & 0x08000000)) {
printf("%s: only local memory is supported.\n", __func__);
abort();
if (s->twoD_source_base & BIT(27) || s->twoD_destination_base & BIT(27)) {
qemu_log_mask(LOG_UNIMP, "sm501: only local memory is supported.\n");
return;
}
switch (operation) {
case 0x00: /* copy area */
#define COPY_AREA(_bpp, _pixel_type, rtl) { \
int y, x, index_d, index_s; \
for (y = 0; y < operation_height; y++) { \
for (x = 0; x < operation_width; x++) { \
if (rtl) { \
index_s = ((src_y - y) * src_width + src_x - x) * _bpp; \
index_d = ((dst_y - y) * dst_width + dst_x - x) * _bpp; \
} else { \
index_s = ((src_y + y) * src_width + src_x + x) * _bpp; \
index_d = ((dst_y + y) * dst_width + dst_x + x) * _bpp; \
} \
*(_pixel_type*)&dst[index_d] = *(_pixel_type*)&src[index_s];\
} \
} \
if (!dst_pitch) {
qemu_log_mask(LOG_GUEST_ERROR, "sm501: Zero dest pitch.\n");
return;
}
switch (format_flags) {
case 0:
COPY_AREA(1, uint8_t, rtl);
break;
case 1:
COPY_AREA(2, uint16_t, rtl);
break;
case 2:
COPY_AREA(4, uint32_t, rtl);
break;
if (!width || !height) {
qemu_log_mask(LOG_GUEST_ERROR, "sm501: Zero size 2D op.\n");
return;
}
if (rtl) {
dst_x -= width - 1;
dst_y -= height - 1;
}
if (dst_base >= get_local_mem_size(s) || dst_base +
(dst_x + width + (dst_y + height) * (dst_pitch + width)) *
(1 << format) >= get_local_mem_size(s)) {
qemu_log_mask(LOG_GUEST_ERROR, "sm501: 2D op dest is outside vram.\n");
return;
}
switch (cmd) {
case 0: /* BitBlt */
{
unsigned int src_x = (s->twoD_source >> 16) & 0x01FFF;
unsigned int src_y = s->twoD_source & 0xFFFF;
uint32_t src_base = s->twoD_source_base & 0x03FFFFFF;
unsigned int src_pitch = s->twoD_pitch & 0x1FFF;
if (!src_pitch) {
qemu_log_mask(LOG_GUEST_ERROR, "sm501: Zero src pitch.\n");
return;
}
if (rtl) {
src_x -= width - 1;
src_y -= height - 1;
}
if (src_base >= get_local_mem_size(s) || src_base +
(src_x + width + (src_y + height) * (src_pitch + width)) *
(1 << format) >= get_local_mem_size(s)) {
qemu_log_mask(LOG_GUEST_ERROR,
"sm501: 2D op src is outside vram.\n");
return;
}
if ((rop_mode && rop == 0x5) || (!rop_mode && rop == 0x55)) {
/* Invert dest, is there a way to do this with pixman? */
unsigned int x, y, i;
uint8_t *d = s->local_mem + dst_base;
for (y = 0; y < height; y++) {
i = (dst_x + (dst_y + y) * dst_pitch) * (1 << format);
for (x = 0; x < width; x++, i += (1 << format)) {
switch (format) {
case 0:
d[i] = ~d[i];
break;
case 1:
*(uint16_t *)&d[i] = ~*(uint16_t *)&d[i];
break;
case 2:
*(uint32_t *)&d[i] = ~*(uint32_t *)&d[i];
break;
}
}
}
} else {
/* Do copy src for unimplemented ops, better than unpainted area */
if ((rop_mode && (rop != 0xc || rop2_source_is_pattern)) ||
(!rop_mode && rop != 0xcc)) {
qemu_log_mask(LOG_UNIMP,
"sm501: rop%d op %x%s not implemented\n",
(rop_mode ? 2 : 3), rop,
(rop2_source_is_pattern ?
" with pattern source" : ""));
}
/* Check for overlaps, this could be made more exact */
uint32_t sb, se, db, de;
sb = src_base + src_x + src_y * (width + src_pitch);
se = sb + width + height * (width + src_pitch);
db = dst_base + dst_x + dst_y * (width + dst_pitch);
de = db + width + height * (width + dst_pitch);
if (rtl && ((db >= sb && db <= se) || (de >= sb && de <= se))) {
/* regions may overlap: copy via temporary */
int llb = width * (1 << format);
int tmp_stride = DIV_ROUND_UP(llb, sizeof(uint32_t));
uint32_t *tmp = g_malloc(tmp_stride * sizeof(uint32_t) *
height);
pixman_blt((uint32_t *)&s->local_mem[src_base], tmp,
src_pitch * (1 << format) / sizeof(uint32_t),
tmp_stride, 8 * (1 << format), 8 * (1 << format),
src_x, src_y, 0, 0, width, height);
pixman_blt(tmp, (uint32_t *)&s->local_mem[dst_base],
tmp_stride,
dst_pitch * (1 << format) / sizeof(uint32_t),
8 * (1 << format), 8 * (1 << format),
0, 0, dst_x, dst_y, width, height);
g_free(tmp);
} else {
pixman_blt((uint32_t *)&s->local_mem[src_base],
(uint32_t *)&s->local_mem[dst_base],
src_pitch * (1 << format) / sizeof(uint32_t),
dst_pitch * (1 << format) / sizeof(uint32_t),
8 * (1 << format), 8 * (1 << format),
src_x, src_y, dst_x, dst_y, width, height);
}
}
break;
case 0x01: /* fill rectangle */
#define FILL_RECT(_bpp, _pixel_type) { \
int y, x; \
for (y = 0; y < operation_height; y++) { \
for (x = 0; x < operation_width; x++) { \
int index = ((dst_y + y) * dst_width + dst_x + x) * _bpp; \
*(_pixel_type*)&dst[index] = (_pixel_type)color; \
} \
} \
}
case 1: /* Rectangle Fill */
{
uint32_t color = s->twoD_foreground;
switch (format_flags) {
case 0:
FILL_RECT(1, uint8_t);
break;
case 1:
FILL_RECT(2, uint16_t);
break;
case 2:
FILL_RECT(4, uint32_t);
break;
if (format == 2) {
color = cpu_to_le32(color);
} else if (format == 1) {
color = cpu_to_le16(color);
}
break;
pixman_fill((uint32_t *)&s->local_mem[dst_base],
dst_pitch * (1 << format) / sizeof(uint32_t),
8 * (1 << format), dst_x, dst_y, width, height, color);
break;
}
default:
printf("non-implemented SM501 2D operation. %d\n", operation);
abort();
break;
qemu_log_mask(LOG_UNIMP, "sm501: not implemented 2D operation: %d\n",
cmd);
return;
}
if (dst_base >= get_fb_addr(s, crt) &&
dst_base <= get_fb_addr(s, crt) + fb_len) {
int dst_len = MIN(fb_len, ((dst_y + height - 1) * dst_pitch +
dst_x + width) * (1 << format));
if (dst_len) {
memory_region_set_dirty(&s->local_mem_region, dst_base, dst_len);
}
}
}
@@ -756,6 +861,8 @@ static uint64_t sm501_system_config_read(void *opaque, hwaddr addr,
case SM501_DRAM_CONTROL:
ret = (s->dram_control & 0x07F107C0) | s->local_mem_size_index << 13;
break;
case SM501_COMMAND_LIST_STATUS:
ret = 0x00180002; /* FIFOs are empty, everything idle */
case SM501_IRQ_MASK:
ret = s->irq_mask;
break;
@@ -773,11 +880,13 @@ static uint64_t sm501_system_config_read(void *opaque, hwaddr addr,
case SM501_POWER_MODE_CONTROL:
ret = s->power_mode_control;
break;
case SM501_ENDIAN_CONTROL:
ret = 0; /* Only default little endian mode is supported */
break;
default:
printf("sm501 system config : not implemented register read."
" addr=%x\n", (int)addr);
abort();
qemu_log_mask(LOG_UNIMP, "sm501: not implemented system config"
"register read. addr=%" HWADDR_PRIx "\n", addr);
}
return ret;
@@ -823,11 +932,17 @@ static void sm501_system_config_write(void *opaque, hwaddr addr,
case SM501_POWER_MODE_CONTROL:
s->power_mode_control = value & 0x00000003;
break;
case SM501_ENDIAN_CONTROL:
if (value & 0x00000001) {
printf("sm501 system config : big endian mode not implemented.\n");
abort();
}
break;
default:
printf("sm501 system config : not implemented register write."
" addr=%x, val=%x\n", (int)addr, (uint32_t)value);
abort();
qemu_log_mask(LOG_UNIMP, "sm501: not implemented system config"
"register write. addr=%" HWADDR_PRIx
", val=%" PRIx64 "\n", addr, value);
}
}
@@ -882,6 +997,9 @@ static uint64_t sm501_disp_ctrl_read(void *opaque, hwaddr addr,
case SM501_DC_PANEL_PANNING_CONTROL:
ret = s->dc_panel_panning_control;
break;
case SM501_DC_PANEL_COLOR_KEY:
/* Not implemented yet */
break;
case SM501_DC_PANEL_FB_ADDR:
ret = s->dc_panel_fb_addr;
break;
@@ -954,9 +1072,8 @@ static uint64_t sm501_disp_ctrl_read(void *opaque, hwaddr addr,
break;
default:
printf("sm501 disp ctrl : not implemented register read."
" addr=%x\n", (int)addr);
abort();
qemu_log_mask(LOG_UNIMP, "sm501: not implemented disp ctrl register "
"read. addr=%" HWADDR_PRIx "\n", addr);
}
return ret;
@@ -976,8 +1093,14 @@ static void sm501_disp_ctrl_write(void *opaque, hwaddr addr,
case SM501_DC_PANEL_PANNING_CONTROL:
s->dc_panel_panning_control = value & 0xFF3FFF3F;
break;
case SM501_DC_PANEL_COLOR_KEY:
/* Not implemented yet */
break;
case SM501_DC_PANEL_FB_ADDR:
s->dc_panel_fb_addr = value & 0x8FFFFFF0;
if (value & 0x8000000) {
qemu_log_mask(LOG_UNIMP, "Panel external memory not supported\n");
}
break;
case SM501_DC_PANEL_FB_OFFSET:
s->dc_panel_fb_offset = value & 0x3FF03FF0;
@@ -1009,11 +1132,19 @@ static void sm501_disp_ctrl_write(void *opaque, hwaddr addr,
break;
case SM501_DC_PANEL_HWC_ADDR:
s->dc_panel_hwc_addr = value & 0x8FFFFFF0;
break;
value &= 0x8FFFFFF0;
if (value != s->dc_panel_hwc_addr) {
hwc_invalidate(s, 0);
s->dc_panel_hwc_addr = value;
}
break;
case SM501_DC_PANEL_HWC_LOC:
s->dc_panel_hwc_location = value & 0x0FFF0FFF;
break;
value &= 0x0FFF0FFF;
if (value != s->dc_panel_hwc_location) {
hwc_invalidate(s, 0);
s->dc_panel_hwc_location = value;
}
break;
case SM501_DC_PANEL_HWC_COLOR_1_2:
s->dc_panel_hwc_color_1_2 = value;
break;
@@ -1026,6 +1157,9 @@ static void sm501_disp_ctrl_write(void *opaque, hwaddr addr,
break;
case SM501_DC_CRT_FB_ADDR:
s->dc_crt_fb_addr = value & 0x8FFFFFF0;
if (value & 0x8000000) {
qemu_log_mask(LOG_UNIMP, "CRT external memory not supported\n");
}
break;
case SM501_DC_CRT_FB_OFFSET:
s->dc_crt_fb_offset = value & 0x3FF03FF0;
@@ -1044,11 +1178,19 @@ static void sm501_disp_ctrl_write(void *opaque, hwaddr addr,
break;
case SM501_DC_CRT_HWC_ADDR:
s->dc_crt_hwc_addr = value & 0x8FFFFFF0;
break;
value &= 0x8FFFFFF0;
if (value != s->dc_crt_hwc_addr) {
hwc_invalidate(s, 1);
s->dc_crt_hwc_addr = value;
}
break;
case SM501_DC_CRT_HWC_LOC:
s->dc_crt_hwc_location = value & 0x0FFF0FFF;
break;
value &= 0x0FFF0FFF;
if (value != s->dc_crt_hwc_location) {
hwc_invalidate(s, 1);
s->dc_crt_hwc_location = value;
}
break;
case SM501_DC_CRT_HWC_COLOR_1_2:
s->dc_crt_hwc_color_1_2 = value;
break;
@@ -1061,9 +1203,9 @@ static void sm501_disp_ctrl_write(void *opaque, hwaddr addr,
break;
default:
printf("sm501 disp ctrl : not implemented register write."
" addr=%x, val=%x\n", (int)addr, (unsigned)value);
abort();
qemu_log_mask(LOG_UNIMP, "sm501: not implemented disp ctrl register "
"write. addr=%" HWADDR_PRIx
", val=%" PRIx64 "\n", addr, value);
}
}
@@ -1089,9 +1231,8 @@ static uint64_t sm501_2d_engine_read(void *opaque, hwaddr addr,
ret = s->twoD_source_base;
break;
default:
printf("sm501 disp ctrl : not implemented register read."
" addr=%x\n", (int)addr);
abort();
qemu_log_mask(LOG_UNIMP, "sm501: not implemented disp ctrl register "
"read. addr=%" HWADDR_PRIx "\n", addr);
}
return ret;
@@ -1149,9 +1290,9 @@ static void sm501_2d_engine_write(void *opaque, hwaddr addr,
s->twoD_destination_base = value;
break;
default:
printf("sm501 2d engine : not implemented register write."
" addr=%x, val=%x\n", (int)addr, (unsigned)value);
abort();
qemu_log_mask(LOG_UNIMP, "sm501: not implemented 2d engine register "
"write. addr=%" HWADDR_PRIx
", val=%" PRIx64 "\n", addr, value);
}
}
@@ -1170,8 +1311,9 @@ static const MemoryRegionOps sm501_2d_engine_ops = {
typedef void draw_line_func(uint8_t *d, const uint8_t *s,
int width, const uint32_t *pal);
typedef void draw_hwc_line_func(SM501State * s, int crt, uint8_t * palette,
int c_y, uint8_t *d, int width);
typedef void draw_hwc_line_func(uint8_t *d, const uint8_t *s,
int width, const uint8_t *palette,
int c_x, int c_y);
#define DEPTH 8
#include "sm501_template.h"
@@ -1256,19 +1398,16 @@ static inline int get_depth_index(DisplaySurface *surface)
}
}
static void sm501_draw_crt(SM501State * s)
static void sm501_update_display(void *opaque)
{
SM501State *s = (SM501State *)opaque;
DisplaySurface *surface = qemu_console_surface(s->con);
int y;
int width = (s->dc_crt_h_total & 0x00000FFF) + 1;
int height = (s->dc_crt_v_total & 0x00000FFF) + 1;
uint8_t * src = s->local_mem;
int src_bpp = 0;
int y, c_x = 0, c_y = 0;
int crt = (s->dc_crt_control & SM501_DC_CRT_CONTROL_SEL) ? 1 : 0;
int width = get_width(s, crt);
int height = get_height(s, crt);
int src_bpp = get_bpp(s, crt);
int dst_bpp = surface_bytes_per_pixel(surface);
uint32_t * palette = (uint32_t *)&s->dc_palette[SM501_DC_CRT_PALETTE
- SM501_DC_PANEL_PALETTE];
uint8_t hwc_palette[3 * 3];
int ds_depth_index = get_depth_index(surface);
draw_line_func * draw_line = NULL;
draw_hwc_line_func * draw_hwc_line = NULL;
@@ -1276,43 +1415,46 @@ static void sm501_draw_crt(SM501State * s)
int y_start = -1;
ram_addr_t page_min = ~0l;
ram_addr_t page_max = 0l;
ram_addr_t offset = 0;
ram_addr_t offset;
uint32_t *palette;
uint8_t hwc_palette[3 * 3];
uint8_t *hwc_src = NULL;
if (!((crt ? s->dc_crt_control : s->dc_panel_control)
& SM501_DC_CRT_CONTROL_ENABLE)) {
return;
}
palette = (uint32_t *)(crt ? &s->dc_palette[SM501_DC_CRT_PALETTE -
SM501_DC_PANEL_PALETTE]
: &s->dc_palette[0]);
/* choose draw_line function */
switch (s->dc_crt_control & 3) {
case SM501_DC_CRT_CONTROL_8BPP:
src_bpp = 1;
draw_line = draw_line8_funcs[ds_depth_index];
break;
case SM501_DC_CRT_CONTROL_16BPP:
src_bpp = 2;
draw_line = draw_line16_funcs[ds_depth_index];
break;
case SM501_DC_CRT_CONTROL_32BPP:
src_bpp = 4;
draw_line = draw_line32_funcs[ds_depth_index];
break;
switch (src_bpp) {
case 1:
draw_line = draw_line8_funcs[ds_depth_index];
break;
case 2:
draw_line = draw_line16_funcs[ds_depth_index];
break;
case 4:
draw_line = draw_line32_funcs[ds_depth_index];
break;
default:
printf("sm501 draw crt : invalid DC_CRT_CONTROL=%x.\n",
s->dc_crt_control);
abort();
qemu_log_mask(LOG_GUEST_ERROR, "sm501 draw crt"
"invalid DC_CRT_CONTROL=%x.\n",
s->dc_crt_control);
break;
}
/* set up to draw hardware cursor */
if (is_hwc_enabled(s, 1)) {
int i;
/* get cursor palette */
for (i = 0; i < 3; i++) {
uint16_t rgb565 = get_hwc_color(s, 1, i + 1);
hwc_palette[i * 3 + 0] = (rgb565 & 0xf800) >> 8; /* red */
hwc_palette[i * 3 + 1] = (rgb565 & 0x07e0) >> 3; /* green */
hwc_palette[i * 3 + 2] = (rgb565 & 0x001f) << 3; /* blue */
}
if (is_hwc_enabled(s, crt)) {
/* choose cursor draw line function */
draw_hwc_line = draw_hwc_line_funcs[ds_depth_index];
hwc_src = get_hwc_address(s, crt);
c_x = get_hwc_x(s, crt);
c_y = get_hwc_y(s, crt);
get_hwc_palette(s, crt, hwc_palette);
}
/* adjust console size */
@@ -1326,15 +1468,18 @@ static void sm501_draw_crt(SM501State * s)
/* draw each line according to conditions */
memory_region_sync_dirty_bitmap(&s->local_mem_region);
for (y = 0; y < height; y++) {
int update_hwc = draw_hwc_line ? within_hwc_y_range(s, y, 1) : 0;
int update = full_update || update_hwc;
offset = get_fb_addr(s, crt);
for (y = 0; y < height; y++, offset += width * src_bpp) {
int update, update_hwc;
ram_addr_t page0 = offset;
ram_addr_t page1 = offset + width * src_bpp - 1;
/* check dirty flags for each line */
update = memory_region_get_dirty(&s->local_mem_region, page0,
page1 - page0, DIRTY_MEMORY_VGA);
/* check if hardware cursor is enabled and we're within its range */
update_hwc = draw_hwc_line && c_y <= y && y < c_y + SM501_HWC_HEIGHT;
update = full_update || update_hwc;
/* check dirty flags for each line */
update |= memory_region_get_dirty(&s->local_mem_region, page0,
page1 - page0, DIRTY_MEMORY_VGA);
/* draw line and change status */
if (update) {
@@ -1342,11 +1487,11 @@ static void sm501_draw_crt(SM501State * s)
d += y * width * dst_bpp;
/* draw graphics layer */
draw_line(d, src, width, palette);
draw_line(d, s->local_mem + offset, width, palette);
/* draw haredware cursor */
if (update_hwc) {
draw_hwc_line(s, 1, hwc_palette, y - get_hwc_y(s, 1), d, width);
draw_hwc_line(d, hwc_src, width, hwc_palette, c_x, y - c_y);
}
if (y_start < 0)
@@ -1362,9 +1507,6 @@ static void sm501_draw_crt(SM501State * s)
y_start = -1;
}
}
src += width * src_bpp;
offset += width * src_bpp;
}
/* complete flush to display */
@@ -1379,14 +1521,6 @@ static void sm501_draw_crt(SM501State * s)
}
}
static void sm501_update_display(void *opaque)
{
SM501State * s = (SM501State *)opaque;
if (s->dc_crt_control & SM501_DC_CRT_CONTROL_ENABLE)
sm501_draw_crt(s);
}
static const GraphicHwOps sm501_ops = {
.gfx_update = sm501_update_display,
};

View File

@@ -99,29 +99,24 @@ static void glue(draw_line32_, PIXEL_NAME)(
/**
* Draw hardware cursor image on the given line.
*/
static void glue(draw_hwc_line_, PIXEL_NAME)(SM501State * s, int crt,
uint8_t * palette, int c_y, uint8_t *d, int width)
static void glue(draw_hwc_line_, PIXEL_NAME)(uint8_t *d, const uint8_t *s,
int width, const uint8_t *palette, int c_x, int c_y)
{
int x, i;
int i;
uint8_t bitset = 0;
/* get hardware cursor pattern */
uint32_t cursor_addr = get_hwc_address(s, crt);
assert(0 <= c_y && c_y < SM501_HWC_HEIGHT);
cursor_addr += 64 * c_y / 4; /* 4 pixels per byte */
cursor_addr += s->base;
/* get cursor position */
x = get_hwc_x(s, crt);
d += x * BPP;
assert(0 <= c_y && c_y < SM501_HWC_HEIGHT);
s += SM501_HWC_WIDTH * c_y / 4; /* 4 pixels per byte */
d += c_x * BPP;
for (i = 0; i < SM501_HWC_WIDTH && x + i < width; i++) {
for (i = 0; i < SM501_HWC_WIDTH && c_x + i < width; i++) {
uint8_t v;
/* get pixel value */
if (i % 4 == 0) {
bitset = ldub_phys(&address_space_memory, cursor_addr);
cursor_addr++;
bitset = ldub_p(s);
s++;
}
v = bitset & 3;
bitset >>= 2;

View File

@@ -95,20 +95,46 @@ static void vga_draw_glyph9(uint8_t *d, int linesize,
} while (--h);
}
static inline uint8_t vga_read_byte(VGACommonState *vga, uint32_t addr)
{
return vga->vram_ptr[addr & vga->vbe_size_mask];
}
static inline uint16_t vga_read_word_le(VGACommonState *vga, uint32_t addr)
{
uint32_t offset = addr & vga->vbe_size_mask & ~1;
uint16_t *ptr = (uint16_t *)(vga->vram_ptr + offset);
return lduw_le_p(ptr);
}
static inline uint16_t vga_read_word_be(VGACommonState *vga, uint32_t addr)
{
uint32_t offset = addr & vga->vbe_size_mask & ~1;
uint16_t *ptr = (uint16_t *)(vga->vram_ptr + offset);
return lduw_be_p(ptr);
}
static inline uint32_t vga_read_dword_le(VGACommonState *vga, uint32_t addr)
{
uint32_t offset = addr & vga->vbe_size_mask & ~3;
uint32_t *ptr = (uint32_t *)(vga->vram_ptr + offset);
return ldl_le_p(ptr);
}
/*
* 4 color mode
*/
static void vga_draw_line2(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line2(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
uint32_t plane_mask, *palette, data, v;
int x;
palette = s1->last_palette;
plane_mask = mask16[s1->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
palette = vga->last_palette;
plane_mask = mask16[vga->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
width >>= 3;
for(x = 0; x < width; x++) {
data = ((uint32_t *)s)[0];
data = vga_read_dword_le(vga, addr);
data &= plane_mask;
v = expand2[GET_PLANE(data, 0)];
v |= expand2[GET_PLANE(data, 2)] << 2;
@@ -124,7 +150,7 @@ static void vga_draw_line2(VGACommonState *s1, uint8_t *d,
((uint32_t *)d)[6] = palette[(v >> 4) & 0xf];
((uint32_t *)d)[7] = palette[(v >> 0) & 0xf];
d += 32;
s += 4;
addr += 4;
}
}
@@ -134,17 +160,17 @@ static void vga_draw_line2(VGACommonState *s1, uint8_t *d,
/*
* 4 color mode, dup2 horizontal
*/
static void vga_draw_line2d2(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line2d2(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
uint32_t plane_mask, *palette, data, v;
int x;
palette = s1->last_palette;
plane_mask = mask16[s1->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
palette = vga->last_palette;
plane_mask = mask16[vga->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
width >>= 3;
for(x = 0; x < width; x++) {
data = ((uint32_t *)s)[0];
data = vga_read_dword_le(vga, addr);
data &= plane_mask;
v = expand2[GET_PLANE(data, 0)];
v |= expand2[GET_PLANE(data, 2)] << 2;
@@ -160,24 +186,24 @@ static void vga_draw_line2d2(VGACommonState *s1, uint8_t *d,
PUT_PIXEL2(d, 6, palette[(v >> 4) & 0xf]);
PUT_PIXEL2(d, 7, palette[(v >> 0) & 0xf]);
d += 64;
s += 4;
addr += 4;
}
}
/*
* 16 color mode
*/
static void vga_draw_line4(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line4(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
uint32_t plane_mask, data, v, *palette;
int x;
palette = s1->last_palette;
plane_mask = mask16[s1->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
palette = vga->last_palette;
plane_mask = mask16[vga->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
width >>= 3;
for(x = 0; x < width; x++) {
data = ((uint32_t *)s)[0];
data = vga_read_dword_le(vga, addr);
data &= plane_mask;
v = expand4[GET_PLANE(data, 0)];
v |= expand4[GET_PLANE(data, 1)] << 1;
@@ -192,24 +218,24 @@ static void vga_draw_line4(VGACommonState *s1, uint8_t *d,
((uint32_t *)d)[6] = palette[(v >> 4) & 0xf];
((uint32_t *)d)[7] = palette[(v >> 0) & 0xf];
d += 32;
s += 4;
addr += 4;
}
}
/*
* 16 color mode, dup2 horizontal
*/
static void vga_draw_line4d2(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line4d2(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
uint32_t plane_mask, data, v, *palette;
int x;
palette = s1->last_palette;
plane_mask = mask16[s1->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
palette = vga->last_palette;
plane_mask = mask16[vga->ar[VGA_ATC_PLANE_ENABLE] & 0xf];
width >>= 3;
for(x = 0; x < width; x++) {
data = ((uint32_t *)s)[0];
data = vga_read_dword_le(vga, addr);
data &= plane_mask;
v = expand4[GET_PLANE(data, 0)];
v |= expand4[GET_PLANE(data, 1)] << 1;
@@ -224,7 +250,7 @@ static void vga_draw_line4d2(VGACommonState *s1, uint8_t *d,
PUT_PIXEL2(d, 6, palette[(v >> 4) & 0xf]);
PUT_PIXEL2(d, 7, palette[(v >> 0) & 0xf]);
d += 64;
s += 4;
addr += 4;
}
}
@@ -233,21 +259,21 @@ static void vga_draw_line4d2(VGACommonState *s1, uint8_t *d,
*
* XXX: add plane_mask support (never used in standard VGA modes)
*/
static void vga_draw_line8d2(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line8d2(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
uint32_t *palette;
int x;
palette = s1->last_palette;
palette = vga->last_palette;
width >>= 3;
for(x = 0; x < width; x++) {
PUT_PIXEL2(d, 0, palette[s[0]]);
PUT_PIXEL2(d, 1, palette[s[1]]);
PUT_PIXEL2(d, 2, palette[s[2]]);
PUT_PIXEL2(d, 3, palette[s[3]]);
PUT_PIXEL2(d, 0, palette[vga_read_byte(vga, addr + 0)]);
PUT_PIXEL2(d, 1, palette[vga_read_byte(vga, addr + 1)]);
PUT_PIXEL2(d, 2, palette[vga_read_byte(vga, addr + 2)]);
PUT_PIXEL2(d, 3, palette[vga_read_byte(vga, addr + 3)]);
d += 32;
s += 4;
addr += 4;
}
}
@@ -256,63 +282,63 @@ static void vga_draw_line8d2(VGACommonState *s1, uint8_t *d,
*
* XXX: add plane_mask support (never used in standard VGA modes)
*/
static void vga_draw_line8(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line8(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
uint32_t *palette;
int x;
palette = s1->last_palette;
palette = vga->last_palette;
width >>= 3;
for(x = 0; x < width; x++) {
((uint32_t *)d)[0] = palette[s[0]];
((uint32_t *)d)[1] = palette[s[1]];
((uint32_t *)d)[2] = palette[s[2]];
((uint32_t *)d)[3] = palette[s[3]];
((uint32_t *)d)[4] = palette[s[4]];
((uint32_t *)d)[5] = palette[s[5]];
((uint32_t *)d)[6] = palette[s[6]];
((uint32_t *)d)[7] = palette[s[7]];
((uint32_t *)d)[0] = palette[vga_read_byte(vga, addr + 0)];
((uint32_t *)d)[1] = palette[vga_read_byte(vga, addr + 1)];
((uint32_t *)d)[2] = palette[vga_read_byte(vga, addr + 2)];
((uint32_t *)d)[3] = palette[vga_read_byte(vga, addr + 3)];
((uint32_t *)d)[4] = palette[vga_read_byte(vga, addr + 4)];
((uint32_t *)d)[5] = palette[vga_read_byte(vga, addr + 5)];
((uint32_t *)d)[6] = palette[vga_read_byte(vga, addr + 6)];
((uint32_t *)d)[7] = palette[vga_read_byte(vga, addr + 7)];
d += 32;
s += 8;
addr += 8;
}
}
/*
* 15 bit color
*/
static void vga_draw_line15_le(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line15_le(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
int w;
uint32_t v, r, g, b;
w = width;
do {
v = lduw_le_p((void *)s);
v = vga_read_word_le(vga, addr);
r = (v >> 7) & 0xf8;
g = (v >> 2) & 0xf8;
b = (v << 3) & 0xf8;
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 2;
addr += 2;
d += 4;
} while (--w != 0);
}
static void vga_draw_line15_be(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line15_be(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
int w;
uint32_t v, r, g, b;
w = width;
do {
v = lduw_be_p((void *)s);
v = vga_read_word_be(vga, addr);
r = (v >> 7) & 0xf8;
g = (v >> 2) & 0xf8;
b = (v << 3) & 0xf8;
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 2;
addr += 2;
d += 4;
} while (--w != 0);
}
@@ -320,38 +346,38 @@ static void vga_draw_line15_be(VGACommonState *s1, uint8_t *d,
/*
* 16 bit color
*/
static void vga_draw_line16_le(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line16_le(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
int w;
uint32_t v, r, g, b;
w = width;
do {
v = lduw_le_p((void *)s);
v = vga_read_word_le(vga, addr);
r = (v >> 8) & 0xf8;
g = (v >> 3) & 0xfc;
b = (v << 3) & 0xf8;
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 2;
addr += 2;
d += 4;
} while (--w != 0);
}
static void vga_draw_line16_be(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line16_be(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
int w;
uint32_t v, r, g, b;
w = width;
do {
v = lduw_be_p((void *)s);
v = vga_read_word_be(vga, addr);
r = (v >> 8) & 0xf8;
g = (v >> 3) & 0xfc;
b = (v << 3) & 0xf8;
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 2;
addr += 2;
d += 4;
} while (--w != 0);
}
@@ -359,36 +385,36 @@ static void vga_draw_line16_be(VGACommonState *s1, uint8_t *d,
/*
* 24 bit color
*/
static void vga_draw_line24_le(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line24_le(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
int w;
uint32_t r, g, b;
w = width;
do {
b = s[0];
g = s[1];
r = s[2];
b = vga_read_byte(vga, addr + 0);
g = vga_read_byte(vga, addr + 1);
r = vga_read_byte(vga, addr + 2);
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 3;
addr += 3;
d += 4;
} while (--w != 0);
}
static void vga_draw_line24_be(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line24_be(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
int w;
uint32_t r, g, b;
w = width;
do {
r = s[0];
g = s[1];
b = s[2];
r = vga_read_byte(vga, addr + 0);
g = vga_read_byte(vga, addr + 1);
b = vga_read_byte(vga, addr + 2);
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 3;
addr += 3;
d += 4;
} while (--w != 0);
}
@@ -396,44 +422,36 @@ static void vga_draw_line24_be(VGACommonState *s1, uint8_t *d,
/*
* 32 bit color
*/
static void vga_draw_line32_le(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line32_le(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
#ifndef HOST_WORDS_BIGENDIAN
memcpy(d, s, width * 4);
#else
int w;
uint32_t r, g, b;
w = width;
do {
b = s[0];
g = s[1];
r = s[2];
b = vga_read_byte(vga, addr + 0);
g = vga_read_byte(vga, addr + 1);
r = vga_read_byte(vga, addr + 2);
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 4;
addr += 4;
d += 4;
} while (--w != 0);
#endif
}
static void vga_draw_line32_be(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width)
static void vga_draw_line32_be(VGACommonState *vga, uint8_t *d,
uint32_t addr, int width)
{
#ifdef HOST_WORDS_BIGENDIAN
memcpy(d, s, width * 4);
#else
int w;
uint32_t r, g, b;
w = width;
do {
r = s[1];
g = s[2];
b = s[3];
r = vga_read_byte(vga, addr + 1);
g = vga_read_byte(vga, addr + 2);
b = vga_read_byte(vga, addr + 3);
((uint32_t *)d)[0] = rgb_to_pixel32(r, g, b);
s += 4;
addr += 4;
d += 4;
} while (--w != 0);
#endif
}

View File

@@ -1005,7 +1005,7 @@ void vga_mem_writeb(VGACommonState *s, hwaddr addr, uint32_t val)
}
typedef void vga_draw_line_func(VGACommonState *s1, uint8_t *d,
const uint8_t *s, int width);
uint32_t srcaddr, int width);
#include "vga-helpers.h"
@@ -1280,6 +1280,9 @@ static void vga_draw_text(VGACommonState *s, int full_update)
cx_min = width;
cx_max = -1;
for(cx = 0; cx < width; cx++) {
if (src + sizeof(uint16_t) > s->vram_ptr + s->vram_size) {
break;
}
ch_attr = *(uint16_t *)src;
if (full_update || ch_attr != *ch_attr_ptr || src == cursor_ptr) {
if (cx < cx_min)
@@ -1434,6 +1437,14 @@ void vga_invalidate_scanlines(VGACommonState *s, int y1, int y2)
}
}
static bool vga_scanline_invalidated(VGACommonState *s, int y)
{
if (y >= VGA_MAX_HEIGHT) {
return false;
}
return s->invalidated_y_table[y >> 5] & (1 << (y & 0x1f));
}
void vga_sync_dirty_bitmap(VGACommonState *s)
{
memory_region_sync_dirty_bitmap(&s->vram);
@@ -1456,13 +1467,14 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
{
DisplaySurface *surface = qemu_console_surface(s->con);
int y1, y, update, linesize, y_start, double_scan, mask, depth;
int width, height, shift_control, line_offset, bwidth, bits;
ram_addr_t page0, page1, page_min, page_max;
int width, height, shift_control, bwidth, bits;
ram_addr_t page0, page1, region_start, region_end;
DirtyBitmapSnapshot *snap = NULL;
int disp_width, multi_scan, multi_run;
uint8_t *d;
uint32_t v, addr1, addr;
vga_draw_line_func *vga_draw_line = NULL;
bool share_surface;
bool share_surface, force_shadow = false;
pixman_format_code_t format;
#ifdef HOST_WORDS_BIGENDIAN
bool byteswap = !s->big_endian_fb;
@@ -1472,11 +1484,34 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
full_update |= update_basic_params(s);
if (!full_update)
vga_sync_dirty_bitmap(s);
s->get_resolution(s, &width, &height);
disp_width = width;
depth = s->get_bpp(s);
region_start = (s->start_addr * 4);
region_end = region_start + (ram_addr_t)s->line_offset * height;
region_end += width * depth / 8; /* scanline length */
region_end -= s->line_offset;
if (region_end > s->vbe_size || depth == 0 || depth == 15) {
/*
* We land here on:
* - wraps around (can happen with cirrus vbe modes)
* - depth == 0 (256 color palette video mode)
* - depth == 15
*
* Take the safe and slow route:
* - create a dirty bitmap snapshot for all vga memory.
* - force shadowing (so all vga memory access goes
* through vga_read_*() helpers).
*
* Given this affects only vga features which are pretty much
* unused by modern guests there should be no performance
* impact.
*/
region_start = 0;
region_end = s->vbe_size;
force_shadow = true;
}
shift_control = (s->gr[VGA_GFX_MODE] >> 5) & 3;
double_scan = (s->cr[VGA_CRTC_MAX_SCAN] >> 7);
@@ -1506,8 +1541,6 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
}
}
depth = s->get_bpp(s);
/*
* Check whether we can share the surface with the backend
* or whether we need a shadow surface. We share native
@@ -1517,7 +1550,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
format = qemu_default_pixman_format(depth, !byteswap);
if (format) {
share_surface = dpy_gfx_check_format(s->con, format)
&& !s->force_shadow;
&& !s->force_shadow && !force_shadow;
} else {
share_surface = false;
}
@@ -1608,20 +1641,29 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
s->cursor_invalidate(s);
}
line_offset = s->line_offset;
#if 0
printf("w=%d h=%d v=%d line_offset=%d cr[0x09]=0x%02x cr[0x17]=0x%02x linecmp=%d sr[0x01]=0x%02x\n",
width, height, v, line_offset, s->cr[9], s->cr[VGA_CRTC_MODE],
s->line_compare, sr(s, VGA_SEQ_CLOCK_MODE));
#endif
addr1 = (s->start_addr * 4);
bwidth = (width * bits + 7) / 8;
bwidth = DIV_ROUND_UP(width * bits, 8);
y_start = -1;
page_min = -1;
page_max = 0;
d = surface_data(surface);
linesize = surface_stride(surface);
y1 = 0;
if (!full_update) {
vga_sync_dirty_bitmap(s);
if (s->line_compare < height) {
/* split screen mode */
region_start = 0;
}
snap = memory_region_snapshot_and_clear_dirty(&s->vram, region_start,
region_end - region_start,
DIRTY_MEMORY_VGA);
}
for(y = 0; y < height; y++) {
addr = addr1;
if (!(s->cr[VGA_CRTC_MODE] & 1)) {
@@ -1634,21 +1676,28 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
addr = (addr & ~0x8000) | ((y1 & 2) << 14);
}
update = full_update;
page0 = addr;
page1 = addr + bwidth - 1;
update |= memory_region_get_dirty(&s->vram, page0, page1 - page0,
DIRTY_MEMORY_VGA);
/* explicit invalidation for the hardware cursor */
update |= (s->invalidated_y_table[y >> 5] >> (y & 0x1f)) & 1;
page0 = addr & s->vbe_size_mask;
page1 = (addr + bwidth - 1) & s->vbe_size_mask;
if (full_update) {
update = 1;
} else if (page1 < page0) {
/* scanline wraps from end of video memory to the start */
assert(force_shadow);
update = memory_region_snapshot_get_dirty(&s->vram, snap,
page0, s->vbe_size - page0);
update |= memory_region_snapshot_get_dirty(&s->vram, snap,
0, page1);
} else {
update = memory_region_snapshot_get_dirty(&s->vram, snap,
page0, page1 - page0);
}
/* explicit invalidation for the hardware cursor (cirrus only) */
update |= vga_scanline_invalidated(s, y);
if (update) {
if (y_start < 0)
y_start = y;
if (page0 < page_min)
page_min = page0;
if (page1 > page_max)
page_max = page1;
if (!(is_buffer_shared(surface))) {
vga_draw_line(s, d, s->vram_ptr + addr, width);
vga_draw_line(s, d, addr, width);
if (s->cursor_draw_line)
s->cursor_draw_line(s, d, y);
}
@@ -1663,7 +1712,7 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
if (!multi_run) {
mask = (s->cr[VGA_CRTC_MODE] & 3) ^ 3;
if ((y1 & mask) == mask)
addr1 += line_offset;
addr1 += s->line_offset;
y1++;
multi_run = multi_scan;
} else {
@@ -1679,14 +1728,8 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
dpy_gfx_update(s->con, 0, y_start,
disp_width, y - y_start);
}
/* reset modified pages */
if (page_max >= page_min) {
memory_region_reset_dirty(&s->vram,
page_min,
page_max - page_min,
DIRTY_MEMORY_VGA);
}
memset(s->invalidated_y_table, 0, ((height + 31) >> 5) * 4);
g_free(snap);
memset(s->invalidated_y_table, 0, sizeof(s->invalidated_y_table));
}
static void vga_draw_blank(VGACommonState *s, int full_update)
@@ -2158,6 +2201,7 @@ void vga_common_init(VGACommonState *s, Object *obj, bool global_vmstate)
if (!s->vbe_size) {
s->vbe_size = s->vram_size;
}
s->vbe_size_mask = s->vbe_size - 1;
s->is_vbe_vmstate = 1;
memory_region_init_ram(&s->vram, obj, "vga.vram", s->vram_size,

View File

@@ -94,6 +94,7 @@ typedef struct VGACommonState {
uint32_t vram_size;
uint32_t vram_size_mb; /* property */
uint32_t vbe_size;
uint32_t vbe_size_mask;
uint32_t latch;
bool has_chain4_alias;
MemoryRegion chain4_alias;

View File

@@ -505,6 +505,8 @@ static inline void vmsvga_cursor_define(struct vmsvga_state_s *s,
int i, pixels;
qc = cursor_alloc(c->width, c->height);
assert(qc != NULL);
qc->hot_x = c->hot_x;
qc->hot_y = c->hot_y;
switch (c->bpp) {

View File

@@ -601,7 +601,7 @@ static void i8257_class_init(ObjectClass *klass, void *data)
idc->schedule = i8257_dma_schedule;
idc->register_channel = i8257_dma_register_channel;
/* Reason: needs to be wired up by isa_bus_dma() to work */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo i8257_info = {

View File

@@ -305,7 +305,7 @@ static void sparc32_dma_class_init(ObjectClass *klass, void *data)
dc->vmsd = &vmstate_dma;
dc->props = sparc32_dma_properties;
/* Reason: pointer property "iommu_opaque" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo sparc32_dma_info = {

View File

@@ -773,7 +773,7 @@ static void omap_gpio_class_init(ObjectClass *klass, void *data)
dc->reset = omap_gpif_reset;
dc->props = omap_gpio_properties;
/* Reason: pointer property "clk" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo omap_gpio_info = {
@@ -804,7 +804,7 @@ static void omap2_gpio_class_init(ObjectClass *klass, void *data)
dc->reset = omap2_gpif_reset;
dc->props = omap2_gpio_properties;
/* Reason: pointer properties "iclk", "fclk0", ..., "fclk5" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo omap2_gpio_info = {

View File

@@ -246,7 +246,7 @@ static int i2c_ddc_rx(I2CSlave *i2c)
I2CDDCState *s = I2CDDC(i2c);
int value;
value = s->edid_blob[s->reg];
value = s->edid_blob[s->reg % sizeof(s->edid_blob)];
s->reg++;
return value;
}

View File

@@ -491,7 +491,7 @@ static void omap_i2c_class_init(ObjectClass *klass, void *data)
dc->props = omap_i2c_properties;
dc->reset = omap_i2c_reset;
/* Reason: pointer properties "iclk", "fclk" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
dc->realize = omap_i2c_realize;
}

View File

@@ -123,7 +123,7 @@ static void smbus_eeprom_class_initfn(ObjectClass *klass, void *data)
sc->read_data = eeprom_read_data;
dc->props = smbus_eeprom_properties;
/* Reason: pointer property "data" */
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
static const TypeInfo smbus_eeprom_info = {

View File

@@ -103,7 +103,7 @@ static void ich9_smb_class_init(ObjectClass *klass, void *data)
* Reason: part of ICH9 southbridge, needs to be wired up by
* pc_q35_init()
*/
dc->cannot_instantiate_with_device_add_yet = true;
dc->user_creatable = false;
}
I2CBus *ich9_smb_init(PCIBus *bus, int devfn, uint32_t smb_io_base)

View File

@@ -457,8 +457,10 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
{
unsigned *bsel_alloc = opaque;
unsigned *bus_bsel;
int64_t bsel = object_property_get_int(OBJECT(bus),
ACPI_PCIHP_PROP_BSEL, NULL);
if (qbus_is_hotpluggable(BUS(bus))) {
if (qbus_is_hotpluggable(BUS(bus)) && bsel < 0) {
bus_bsel = g_malloc(sizeof *bus_bsel);
*bus_bsel = (*bsel_alloc)++;
@@ -1818,9 +1820,9 @@ static Aml *build_q35_osc_method(void)
/*
* Always allow native PME, AER (no dependencies)
* Never allow SHPC (no SHPC controller in this system)
* Allow SHPC (PCI bridges can have SHPC controller)
*/
aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1D), a_ctrl));
aml_append(if_ctx, aml_and(a_ctrl, aml_int(0x1F), a_ctrl));
if_ctx2 = aml_if(aml_lnot(aml_equal(aml_arg(1), aml_int(1))));
/* Unknown revision */

View File

@@ -19,6 +19,7 @@
#include "qemu/host-utils.h"
#include "sysemu/sysemu.h"
#include "sysemu/kvm.h"
#include "sysemu/hw_accel.h"
#include "kvm_i386.h"
#include "hw/sysbus.h"
#include "hw/kvm/clock.h"
@@ -61,7 +62,7 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
{
CPUState *cpu = first_cpu;
CPUX86State *env = cpu->env_ptr;
hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
hwaddr kvmclock_struct_pa;
uint64_t migration_tsc = env->tsc;
struct pvclock_vcpu_time_info time;
uint64_t delta;
@@ -69,11 +70,14 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
uint64_t nsec_hi;
uint64_t nsec;
cpu_synchronize_state(cpu);
if (!(env->system_time_msr & 1ULL)) {
/* KVM clock not active */
return 0;
}
kvmclock_struct_pa = env->system_time_msr & ~1ULL;
cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
assert(time.tsc_timestamp <= migration_tsc);

View File

@@ -221,15 +221,38 @@ int load_multiboot(FWCfgState *fw_cfg,
uint32_t mh_header_addr = ldl_p(header+i+12);
uint32_t mh_load_end_addr = ldl_p(header+i+20);
uint32_t mh_bss_end_addr = ldl_p(header+i+24);
mh_load_addr = ldl_p(header+i+16);
if (mh_header_addr < mh_load_addr) {
fprintf(stderr, "invalid mh_load_addr address\n");
exit(1);
}
if (mh_load_end_addr > mh_bss_end_addr) {
fprintf(stderr, "invalid mh_load_end_addr address\n");
exit(1);
}
uint32_t mb_kernel_text_offset = i - (mh_header_addr - mh_load_addr);
uint32_t mb_load_size = 0;
mh_entry_addr = ldl_p(header+i+28);
if (mh_load_end_addr) {
if (mh_bss_end_addr < mh_load_addr) {
fprintf(stderr, "invalid mh_bss_end_addr address\n");
exit(1);
}
mb_kernel_size = mh_bss_end_addr - mh_load_addr;
if (mh_load_end_addr < mh_load_addr) {
fprintf(stderr, "invalid mh_load_end_addr address\n");
exit(1);
}
mb_load_size = mh_load_end_addr - mh_load_addr;
} else {
if (kernel_file_size < mb_kernel_text_offset) {
fprintf(stderr, "invalid kernel_file_size\n");
exit(1);
}
mb_kernel_size = kernel_file_size - mb_kernel_text_offset;
mb_load_size = mb_kernel_size;
}

Some files were not shown because too many files have changed in this diff Show More