Git-commit: f6b0de53fb
References: bsc#1212968 CVE-2023-2861
The 9p protocol does not specifically define how server shall behave when
client tries to open a special file, however from security POV it does
make sense for 9p server to prohibit opening any special file on host side
in general. A sane Linux 9p client for instance would never attempt to
open a special file on host side, it would always handle those exclusively
on its guest side. A malicious client however could potentially escape
from the exported 9p tree by creating and opening a device file on host
side.
With QEMU this could only be exploited in the following unsafe setups:
- Running QEMU binary as root AND 9p 'local' fs driver AND 'passthrough'
security model.
or
- Using 9p 'proxy' fs driver (which is running its helper daemon as
root).
These setups were already discouraged for safety reasons before,
however for obvious reasons we are now tightening behaviour on this.
Fixes: CVE-2023-2861
Reported-by: Yanwu Shen <ywsPlz@gmail.com>
Reported-by: Jietao Xiao <shawtao1125@gmail.com>
Reported-by: Jinku Li <jkli@xidian.edu.cn>
Reported-by: Wenbo Shen <shenwenbo@zju.edu.cn>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <E1q6w7r-0000Q0-NM@lizzy.crudebyte.com>
Signed-off-by: Jaroslav Jindrak <jjindrak@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: 3ab6fdc91b
Reference: bsc#1190258, CVE-2021-3750
Add the 'memory' bit to the memory attributes to restrict bus
controller accesses to memories.
Introduce flatview_access_allowed() to check bus permission
before running any bus transaction.
Have read/write accessors return MEMTX_ACCESS_ERROR if an access is
restricted.
There is no change for the default case where 'memory' is not set.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211215182421.418374-4-philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[thuth: Replaced MEMTX_BUS_ERROR with MEMTX_ACCESS_ERROR, remove
"inline"]
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jaroslav Jindrak <jjindrak@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: 4367a20cc4
References: bsc#1198038, CVE-2022-0216
Set current_req to NULL, not current_req->req, to prevent reusing a free'd
buffer in case of repeated SCSI cancel requests. Also apply the fix to
CLEAR QUEUE and BUS DEVICE RESET messages as well, since they also cancel
the request.
Thanks to Alexander Bulekov for providing a reproducer.
Fixes: CVE-2022-0216
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/972
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Tested-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20220711123316.421279-1-mcascell@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: 736b01642d
References: bsc#1193880, CVE-2021-3929
This fixes CVE-2021-3929 "locally" by denying DMA to the iomem of the
device itself. This still allows DMA to MMIO regions of other devices
(e.g. doing P2P DMA to the controller memory buffer of another NVMe
device).
Fixes: CVE-2021-3929
Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>41
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Lin Ma <lma@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Git-commit: ffd205ef29
References: bsc#1192463
This partially reverts commit bbd2d5a812.
This commit was misguided and broke using --disable-pie on any distro
that enables PIE by default in their compiler driver, including Debian
and its derivatives. Whilst -no-pie is not a linker flag, it is a
compiler driver flag that ensures -pie is not automatically passed by it
to the linker. Without it, all compile_prog checks will fail as any code
built with the explicit -fno-pie will fail to link with the implicit
default -pie due to trying to use position-dependent relocations. The
only bug that needed fixing was LDFLAGS_NOPIE being used as a flag for
the linker itself in pc-bios/optionrom/Makefile.
Note this does not reinstate exporting LDFLAGS_NOPIE, as it is unused,
since the only previous use was the one that should not have existed. I
have also updated the comment for the -fno-pie and -no-pie checks to
reflect what they're actually needed for.
Fixes: bbd2d5a812
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Message-Id: <20210805192545.38279-1-jrtc27@jrtc27.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Liang Yan <lyan@suse.com>
Git-commit: bbd2d5a812
References: bsc#1192463
Recent binutils changes dropping unsupported options [1] caused a build
issue in regard to the optionroms.
ld -m elf_i386 -T /<<PKGBUILDDIR>>/pc-bios/optionrom//flat.lds -no-pie \
-s -o multiboot.img multiboot.o
ld.bfd: Error: unable to disambiguate: -no-pie (did you mean --no-pie ?)
This isn't really a regression in ld.bfd, filing the bug upstream
revealed that this never worked as a ld flag [2] - in fact it seems we
were by accident setting --nmagic).
Since it never had the wanted effect this usage of LDFLAGS_NOPIE, should be
droppable without any effect. This also is the only use-case of LDFLAGS_NOPIE
in .mak, therefore we can also remove it from being added there.
[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=983d925d
[2]: https://sourceware.org/bugzilla/show_bug.cgi?id=27050#c5
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Message-Id: <20201214150938.1297512-1-christian.ehrhardt@canonical.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Liang Yan <lyan@suse.com>
Git-commit: bedd7e93d0
References: bsc#1189938 CVE-2021-3748
When mergeable buffer is enabled, we try to set the num_buffers after
the virtqueue elem has been unmapped. This will lead several issues,
E.g a use after free when the descriptor has an address which belongs
to the non direct access region. In this case we use bounce buffer
that is allocated during address_space_map() and freed during
address_space_unmap().
Fixing this by storing the elems temporarily in an array and delay the
unmap after we set the the num_buffers.
This addresses CVE-2021-3748.
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Fixes: fbe78f4f55 ("virtio-net support")
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: ba10b9c003
References: bsc#1189938 CVE-2021-3748
All these errors are caused by a buggy guest: let's switch the device to
the broken state instead of terminating QEMU. Also we detach the element
from the virtqueue and free it.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[JRZ: tweaked patch to bring the min. necessary]
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 00000000000000000000000000000000000000000000
References: bsc#1180432, CVE-2020-35503
Ensure that 'cmd->frame' is not NULL before accessing the 'header' field.
This check prevents a potential NULL pointer dereference issue.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1910346
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reported-by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Acked-By: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 05a40b172e
References: bsc#1186012, CVE-2021-3527
usb-host and usb-redirect try to batch bulk transfers by combining many
small usb packets into a single, large transfer request, to reduce the
overhead and improve performance.
This patch adds a size limit of 1 MiB for those combined packets to
restrict the host resources the guest can bind that way.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20210503132915.2335822-6-kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 000000000000000000000000000000000000000000000
References: bsc#1182651, CVE-2021-20255
Patch based on discussion:
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06098.html
While processing controller commands, eepro100 emulator gets
command unit(CU) base address OR receive unit (RU) base address
OR command block (CB) address from guest. If these values are not
checked, it may lead to an infinite loop kind of issues. Add checks
to avoid it.
Reported-by: Ruhr-University Bochum <bugs-syssec@rub.de>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Acked-By: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: d7fb54218424c3b2517aee5b391ced0f75386a5d
References: bsc#1187364, CVE-2021-3592
RFC2131 suggests that the options field may be at least 312 bytes.
Some DHCP clients seem to assume that it has to be at least 312 bytes.
Fixes#51
Fixes: f13cad45b25d92760bb0ad67bec0300a4d7d5275 ("bootp: limit
vendor-specific area to input packet memory buffer")
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: f13cad45b25d92760bb0ad67bec0300a4d7d5275
References: bsc#1187364, CVE-2021-3592
sizeof(bootp_t) currently holds DHCP_OPT_LEN. Remove this optional field
from the structure, to help with the following patch checking for
minimal header size. Modify the bootp_reply() function to take the
buffer boundaries and avoiding potential buffer overflow.
Related to CVE-2021-3592.
https://gitlab.freedesktop.org/slirp/libslirp/-/issues/44
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git commit: 6fd057f7960bc7b3a69f3c53de5c9d0d5d34a79c
References: bsc#1187364, CVE-2021-3592
When user provides a long domainname or hostname that doesn't fit in the
DHCP packet, we mustn't overflow the response packet buffer. Instead,
report errors, following the g_warning() in the slirp->vdnssearch
branch.
Also check the strlen against 256 when initializing slirp, which limit
is also from the protocol where one byte represents the string length.
This gives an early error before the warning which is harder to notice
or diagnose.
Reported-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 93e645e72a056ec0b2c16e0299fc5c6b94e4ca17
References: bsc#1187364, CVE-2021-3592
bsc#1187367, CVE-2021-3594
Recent security issues demonstrate the lack of safety care when casting
a mbuf to a particular structure type. At least, it should check that
the buffer is large enough. The following patches will make use of this
function.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 1bf8b88f14
References: bsc#1187529 CVE-2021-3611
Object property insertion code iterates over an integer to get an unused
index that can be used as an unique name for an object property. This loop
increments the integer value indefinitely. Although very unlikely, this can
still cause an integer overflow.
In this change, we fix the above code by checking against INT16_MAX and making
sure that the interger index does not overflow beyond that value. If no
available index is found, the code would cause an assertion failure. This
assertion failure is necessary because the callers of the function do not check
the return value for NULL.
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20200921093325.25617-1-ani@anisinha.ca>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Cho, Yu-Chen <acho@suse.com>
Git-commit: 3059344f01
References: bsc#1172382 CVE-2020-13754
The "BCM2835 ARM Peripherals" datasheet [*] chapter 2
("Auxiliaries: UART1 & SPI1, SPI2"), list the register
sizes as 3/8/16/32 bits. We assume this means this
peripheral allows 8-bit accesses.
This was not an issue until commit 5d971f9e67 which reverted
("memory: accept mismatching sizes in memory_region_access_valid").
The model is implemented as 32-bit accesses (see commit 97398d900c,
all registers are 32-bit) so replace MemoryRegionOps.valid as
MemoryRegionOps.impl, and re-introduce MemoryRegionOps.valid
with a 8/32-bit range.
[*] https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
Fixes: 97398d900c ("bcm2835_aux: add emulation of BCM2835 AUX (aka UART1) block")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201002181032.1899463-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 8e67fda2dd
References: bsc#1172382 CVE-2020-13754
QEMU XHCI advertises AC64 (64-bit addressing) but doesn't allow
64-bit mode access in "runtime" and "operational" MemoryRegionOps.
Set the max_access_size based on sizeof(dma_addr_t) as AC64 is set.
XHCI specs:
"If the xHC supports 64-bit addressing (AC64 = ‘1’), then software
should write 64-bit registers using only Qword accesses. If a
system is incapable of issuing Qword accesses, then writes to the
64-bit address fields shall be performed using 2 Dword accesses;
low Dword-first, high-Dword second. If the xHC supports 32-bit
addressing (AC64 = ‘0’), then the high Dword of registers containing
64-bit address fields are unused and software should write addresses
using only Dword accesses"
The problem has been detected with SLOF, as linux kernel always accesses
registers using 32-bit access even if AC64 is set and revealed by
5d971f9e67 ("memory: Revert "memory: accept mismatching sizes in memory_region_access_valid"")
Suggested-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 20200721083322.90651-1-lvivier@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 5d971f9e67
References: bsc#1172382 CVE-2020-13754
Memory API documentation documents valid .min_access_size and .max_access_size
fields and explains that any access outside these boundaries is blocked.
This is what devices seem to assume.
However this is not what the implementation does: it simply
ignores the boundaries unless there's an "accepts" callback.
Naturally, this breaks a bunch of devices.
Revert to the documented behaviour.
Devices that want to allow any access can just drop the valid field,
or add the impl field to have accesses converted to appropriate
length.
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <rth@twiddle.net>
Fixes: CVE-2020-13754
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1842363
Fixes: a014ed07bd ("memory: accept mismatching sizes in memory_region_access_valid")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20200610134731.1514409-1-mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jose R. Ziviani <jose.ziviani@suse.com>
Git-commit: 89ed83d8b2
References: bsc#1172382 CVE-2020-13754
The memory region ops have min_access_size == 4 so obey it.
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: 4b7c06837a
References: bsc#1172382 CVE-2020-13754
The memory region ops have min_access_size == 4 so obey it.
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: c7ede54cbd2e2b25385325600958ba0124e31cc0
References: bsc#1172380 CVE-2020-10756
Drop IPv6 message shorter than what's mentioned in the payload
length header (+ the size of the IPv6 header). They're invalid an could
lead to data leakage in icmp6_send_echoreply().
Signed-off-by: Jose R Ziviani <jose.ziviani@suse.com>
Git-commit: edfe2eb436
References: bsc#1181933 CVE-2021-20221
Per the ARM Generic Interrupt Controller Architecture specification
(document "ARM IHI 0048B.b (ID072613)"), the SGIINTID field is 4 bit,
not 10:
- 4.3 Distributor register descriptions
- 4.3.15 Software Generated Interrupt Register, GICD_SG
- Table 4-21 GICD_SGIR bit assignments
The Interrupt ID of the SGI to forward to the specified CPU
interfaces. The value of this field is the Interrupt ID, in
the range 0-15, for example a value of 0b0011 specifies
Interrupt ID 3.
Correct the irq mask to fix an undefined behavior (which eventually
lead to a heap-buffer-overflow, see [Buglink]):
$ echo 'writel 0x8000f00 0xff4affb0' | qemu-system-aarch64 -M virt,accel=qtest -qtest stdio
[I 1612088147.116987] OPENED
[R +0.278293] writel 0x8000f00 0xff4affb0
../hw/intc/arm_gic.c:1498:13: runtime error: index 944 out of bounds for type 'uint8_t [16][8]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../hw/intc/arm_gic.c:1498:13
This fixes a security issue when running with KVM on Arm with
kernel-irqchip=off. (The default is kernel-irqchip=on, which is
unaffected, and which is also the correct choice for performance.)
Cc: qemu-stable@nongnu.org
Fixes: CVE-2021-20221
Fixes: 9ee6e8bb85 ("ARMv7 support.")
Buglink: https://bugs.launchpad.net/qemu/+bug/1913916
Buglink: https://bugs.launchpad.net/qemu/+bug/1913917
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210131103401.217160-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: b946434f26
References: bsc#1175441, CVE-2020-14364
Store calculated setup_len in a local variable, verify it, and only
write it to the struct (USBDevice->setup_len) in case it passed the
sanity checks.
This prevents other code (do_token_{in,out} functions specifically)
from working with invalid USBDevice->setup_len values and overrunning
the USBDevice->setup_buf[] buffer.
Fixes: CVE-2020-14364
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-id: 20200825053636.29648-1-kraxel@redhat.com
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 035e69b063
References: bsc#1174641, CVE-2020-16092
An assertion failure issue was found in the code that processes network packets
while adding data fragments into the packet context. It could be abused by a
malicious guest to abort the QEMU process on the host. This patch replaces the
affected assert() with a conditional statement, returning false if the current
data fragment exceeds max_raw_frags.
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Ziming Zhang <ezrakiez@gmail.com>
Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 5519724a13
References: bsc#1174386 CVE-2020-15863
A buffer overflow issue was reported by Mr. Ziming Zhang, CC'd here. It
occurs while sending an Ethernet frame due to missing break statements
and improper checking of the buffer size.
Reported-by: Ziming Zhang <ezrakiez@gmail.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 0000000000000000000000000000000000000000
References: bsc#1181639
While activating device in vmxnet3_acticate_device(), it does not
validate guest supplied configuration values against predefined
minimum - maximum limits. This may lead to integer overflow or
OOB access issues. Add checks to avoid it.
Fixes: CVE-2021-20203
Buglink: https://bugs.launchpad.net/qemu/+bug/1913873
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: ff0507c239
References: bsc#1180523, CVE-2020-11947
There is an overflow, the source 'datain.data[2]' is 100 bytes,
but the 'ss' is 252 bytes.This may cause a security issue because
we can access a lot of unrelated memory data.
The len for sbp copy data should take the minimum of mx_sb_len and
sb_len_wr, not the maximum.
If we use iscsi device for VM backend storage, ASAN show stack:
READ of size 252 at 0xfffd149dcfc4 thread T0
#0 0xaaad433d0d34 in __asan_memcpy (aarch64-softmmu/qemu-system-aarch64+0x2cb0d34)
#1 0xaaad45f9d6d0 in iscsi_aio_ioctl_cb /qemu/block/iscsi.c:996:9
#2 0xfffd1af0e2dc (/usr/lib64/iscsi/libiscsi.so.8+0xe2dc)
#3 0xfffd1af0d174 (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
#4 0xfffd1af19fac (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
#5 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
#6 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
#7 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
#8 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
#9 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
#10 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
#11 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
#12 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
#13 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
#14 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
#15 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
#16 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
#17 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)
0xfffd149dcfc4 is located 0 bytes to the right of 100-byte region [0xfffd149dcf60,0xfffd149dcfc4)
allocated by thread T0 here:
#0 0xaaad433d1e70 in __interceptor_malloc (aarch64-softmmu/qemu-system-aarch64+0x2cb1e70)
#1 0xfffd1af0e254 (/usr/lib64/iscsi/libiscsi.so.8+0xe254)
#2 0xfffd1af0d174 (/usr/lib64/iscsi/libiscsi.so.8+0xd174)
#3 0xfffd1af19fac (/usr/lib64/iscsi/libiscsi.so.8+0x19fac)
#4 0xaaad45f9acc8 in iscsi_process_read /qemu/block/iscsi.c:403:5
#5 0xaaad4623733c in aio_dispatch_handler /qemu/util/aio-posix.c:467:9
#6 0xaaad4622f350 in aio_dispatch_handlers /qemu/util/aio-posix.c:510:20
#7 0xaaad4622f350 in aio_dispatch /qemu/util/aio-posix.c:520
#8 0xaaad46215944 in aio_ctx_dispatch /qemu/util/async.c:298:5
#9 0xfffd1bed12f4 in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x512f4)
#10 0xaaad46227de0 in glib_pollfds_poll /qemu/util/main-loop.c:219:9
#11 0xaaad46227de0 in os_host_main_loop_wait /qemu/util/main-loop.c:242
#12 0xaaad46227de0 in main_loop_wait /qemu/util/main-loop.c:518
#13 0xaaad43d9d60c in qemu_main_loop /qemu/softmmu/vl.c:1662:9
#14 0xaaad4607a5b0 in main /qemu/softmmu/main.c:49:5
#15 0xfffd1a460b9c in __libc_start_main (/lib64/libc.so.6+0x20b9c)
#16 0xaaad43320740 in _start (aarch64-softmmu/qemu-system-aarch64+0x2c00740)
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200418062602.10776-1-kuhn.chenqun@huawei.com
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 2e1dcbc0c2af64fcb17009eaf2ceedd81be2b27f
References: bsc#1179467
While processing ARP/NCSI packets in 'arp_input' or 'ncsi_input'
routines, ensure that pkt_len is large enough to accommodate the
respective protocol headers, lest it should do an OOB access.
Add check to avoid it.
CVE-2020-29129 CVE-2020-29130
QEMU: slirp: out-of-bounds access while processing ARP/NCSI packets
-> https://www.openwall.com/lists/oss-security/2020/11/27/1
Reported-by: Qiuhao Li <Qiuhao.Li@outlook.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20201126135706.273950-1-ppandit@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
[BR: no NCSI code, so only ARP patched]
Git-commit: 705df5466c
References: bsc#1182968, CVE-2021-3416
Some NIC supports loopback mode and this is done by calling
nc->info->receive() directly which in fact suppresses the effort of
reentrancy check that is done in qemu_net_queue_send().
Unfortunately we can't use qemu_net_queue_send() here since for
loopback there's no sender as peer, so this patch introduce a
qemu_receive_packet() which is used for implementing loopback mode
for a NIC with this check.
NIC that supports loopback mode will be converted to this helper.
This is intended to address CVE-2021-3416.
Cc: Prasad J Pandit <ppandit@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 3de46e6fc4
References: bsc#1182577, CVE-2021-20257
During procss_tx_desc(), driver can try to chain data descriptor with
legacy descriptor, when will lead underflow for the following
calculation in process_tx_desc() for bytes:
if (tp->size + bytes > msh)
bytes = msh - tp->size;
This will lead a infinite loop. So check and fail early if tp->size if
greater or equal to msh.
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Reported-by: Cheolwoo Myung <cwmyung@snu.ac.kr>
Reported-by: Ruhr-University Bochum <bugs-syssec@rub.de>
Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 8132122889
References: bsc#1181108, CVE-2020-29443
A case was reported where s->io_buffer_index can be out of range.
The report skimped on the details but it seems to be triggered
by s->lba == -1 on the READ/READ CD paths (e.g. by sending an
ATAPI command with LBA = 0xFFFFFFFF). For now paper over it
with assertions. The first one ensures that there is no overflow
when incrementing s->io_buffer_index, the second checks for the
buffer overrun.
Note that the buffer overrun is only a read, so I am not sure
if the assertion failure is actually less harmful than the overrun.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20201201120926.56559-1-pbonzini@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 7564bf7701
References: bsc#1178174, CVE-2020-27617
eth_get_gso_type() routine returns segmentation offload type based on
L3 protocol type. It calls g_assert_not_reached if L3 protocol is
unknown, making the following return statement unreachable. Remove the
g_assert call, it maybe triggered by a guest user.
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 1be90ebecc
References: bsc#1176684, CVE-2020-25625
While servicing OHCI transfer descriptors(TD), ohci_service_iso_td
retires a TD if it has passed its time frame. It does not check if
the TD was already processed once and holds an error code in TD_CC.
It may happen if the TD list has a loop. Add check to avoid an
infinite loop condition.
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-id: 20200915182259.68522-3-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: f50ab86a26
References: bsc#1172383, CVE-2020-13362
A guest user may set 'reply_queue_head' field of MegasasState to
a negative value. Later in 'megasas_lookup_frame' it is used to
index into s->frames[] array. Use unsigned type to avoid OOB
access issue.
Also check that 'index' value stays within s->frames[] bounds
through the while() loop in 'megasas_lookup_frame' to avoid OOB
access.
Reported-by: Ren Ding <rding@gatech.edu>
Reported-by: Hanqing Zhao <hanqing@gatech.edu>
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Acked-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20200513192540.1583887-2-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
References: bsc#1172385, CVE-2020-12829
This is in place of commit 4a1f253adb which added the log.h
reference, but did other changes we're not interested in.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: f3a60058c9
References: bsc#1172385, CVE-2020-12829
The value from twoD_foreground (which is in host endian format) must
be converted to the endianness of the framebuffer (currently always
little endian) before it can be used to perform the fill operation.
Signed-off-by: Marcus Comstedt <marcus@mc.pp.se>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: eb76243c9d
References: bsc#1172385, CVE-2020-12829
Set the changed memory region dirty after performed a 2D operation to
ensure that the screen is updated properly.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 33159dd7ce
References: bsc#1172385, CVE-2020-12829
Display updates and drawing hardware cursor did not work when frame
buffer address was non-zero. Fix this by taking the frame buffer
address into account in these cases. This fixes screen dragging on
AmigaOS. Based on patch by Sebastian Bauer.
Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 06cb926aaa
References: bsc#1172385, CVE-2020-12829
The sm501 currently implements only a very limited set of raster operation
modes. After this change, unknown raster operation modes are logged so
these can be easily spotted.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: debc7e7dad
References: bsc#1172385, CVE-2020-12829
Add support for the negated destination operation mode. This is used e.g.
by AmigaOS for the INVERSEVID drawing mode. With this change, the cursor
in the shell and non-immediate window adjustment are working now.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 54b2a4339c
References: bsc#1172385, CVE-2020-12829
Before, crt_h_total was used for src_width and dst_width. This is a
property of the current display setting and not relevant for the 2D
operation that also can be done off-screen. The pitch register's purpose
is to describe line pitch relevant of the 2D operation.
Signed-off-by: Sebastian Bauer <mail@sebastianbauer.info>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 5690d9ecef
References: bsc#1172385, CVE-2020-12829
These are not really implemented (just return zero or default values)
but add these so guests accessing them can run.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 6a2a5aae02
References: bsc#1172385, CVE-2020-12829
Rework HWC handling to simplify it and fix cursor not updating on
screen as needed. Previously cursor was not updated because checking
for changes in a line overrode the update flag set for the cursor but
fixing this is not enough because the cursor should also be updated if
its shape or location changes. Introduce hwc_invalidate() function to
handle that similar to other display controller models.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Message-id: 6970a5e9868b7246656c1d02038dc5d5fa369507.1492787889.git.balaton@eik.bme.hu
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: e423455c4f
References: bsc#1172478, CVE-2020-13765
Both, "rom->addr" and "addr" are derived from the binary image
that can be loaded with the "-kernel" paramer. The code in
rom_copy() then calculates:
d = dest + (rom->addr - addr);
and uses "d" as destination in a memcpy() some lines later. Now with
bad kernel images, it is possible that rom->addr is smaller than addr,
thus "rom->addr - addr" gets negative and the memcpy() then tries to
copy contents from the image to a bad memory location. This could
maybe be used to inject code from a kernel image into the QEMU binary,
so we better fix it with an additional sanity check here.
Cc: qemu-stable@nongnu.org
Reported-by: Guangming Liu
Buglink: https://bugs.launchpad.net/qemu/+bug/1844635
Message-Id: <20190925130331.27825-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 369ff955a8
References: bsc#1172384, CVE-2020-13361
A guest user may set channel frame count via es1370_write()
such that, in es1370_transfer_audio(), total frame count
'size' is lesser than the number of frames that are processed
'cnt'.
int cnt = d->frame_cnt >> 16;
int size = d->frame_cnt & 0xffff;
if (size < cnt), it results in incorrect calculations leading
to OOB access issue(s). Add check to avoid it.
Reported-by: Ren Ding <rding@gatech.edu>
Reported-by: Hanqing Zhao <hanqing@gatech.edu>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20200514200608.1744203-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
When executing script in lsi_execute_script(), the LSI scsi adapter
emulator advances 's->dsp' index to read next opcode. This can lead
to an infinite loop if the next opcode is empty. Move the existing
loop exit after 10k iterations so that it covers no-op opcodes as
well.
Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit de594e4765)
[BR: BSC#1146873 CVE-2019-12068 - minor trace related tweak applied]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 693fd2acdf
References: bsc#1166240
When querying an iSCSI server for the provisioning status of blocks (via
GET LBA STATUS), Qemu only validates that the response descriptor zero's
LBA matches the one requested. Given the SCSI spec allows servers to
respond with the status of blocks beyond the end of the LUN, Qemu may
have its heap corrupted by clearing/setting too many bits at the end of
its allocmap for the LUN.
A malicious guest in control of the iSCSI server could carefully program
Qemu's heap (by selectively setting the bitmap) and then smash it.
This limits the number of bits that iscsi_co_block_status() will try to
update in the allocmap so it can't overflow the bitmap.
Fixes: CVE-2020-1711
Cc: qemu-stable@nongnu.org
Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com>
Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
[BR: Converted patch from byte based to sector based]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 2faae0f778f818fadc873308f983289df697eb93
References: bsc#1170940, CVE-2020-1983
The q pointer is updated when the mbuf data is moved from m_dat to
m_ext.
m_ext buffer may also be realloc()'ed and moved during m_cat():
q should also be updated in this case.
Reported-by: Aviv Sasson <asasson@paloaltonetworks.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 9bd6c5913271eabcb7768a58197ed3301fe19f2d)
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 68ccb8021a838066f0951d4b2817eb6b6f10a843
References: bsc#1163018, CVE-2020-8608
Various calls to snprintf() assume that snprintf() returns "only" the
number of bytes written (excluding terminating NUL).
https://pubs.opengroup.org/onlinepubs/9699919799/functions/snprintf.html#tag_16_159_04
"Upon successful completion, the snprintf() function shall return the
number of bytes that would be written to s had n been sufficiently
large excluding the terminating null byte."
Before patch ce131029, if there isn't enough room in "m_data" for the
"DCC ..." message, we overflow "m_data".
After the patch, if there isn't enough room for the same, we don't
overflow "m_data", but we set "m_len" out-of-bounds. The next time an
access is bounded by "m_len", we'll have a buffer overflow then.
Use slirp_fmt*() to fix potential OOB memory access.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200127092414.169796-7-marcandre.lureau@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 82ebe9c370a0e2970fb5695aa19aa5214a6a1c80
References: bsc#1161066, CVE-2020-7039, bsc#1163018, CVE-2020-8608
While emulating services in tcp_emu(), it uses 'mbuf' size
'm->m_size' to write commands via snprintf(3). Use M_FREEROOM(m)
size to avoid possible OOB access.
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200109094228.79764-3-ppandit@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: ce131029d6d4a405cb7d3ac6716d03e58fb4a5d9
References: bsc#1161066, CVE-2020-7039, bsc#1163018, CVE-2020-8608
While emulating IRC DCC commands, tcp_emu() uses 'mbuf' size
'm->m_size' to write DCC commands via snprintf(3). This may
lead to OOB write access, because 'bptr' points somewhere in
the middle of 'mbuf' buffer, not at the start. Use M_FREEROOM(m)
size to avoid OOB access.
Reported-by: Vishnu Dev TJ <vishnudevtj@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200109094228.79764-2-ppandit@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 2655fffed7a9e765bcb4701dd876e9dab975f289
References: bsc#1161066, CVE2020-7039, bsc#1161066, CVS-2020-7039
The main loop only checks for one available byte, while we sometimes
need two bytes.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Git-commit: 30648c03b27fb8d9611b723184216cd3174b6775
References: bsc#1163018, CVE-2020-8608
Various calls to snprintf() in libslirp assume that snprintf() returns
"only" the number of bytes written (excluding terminating NUL).
https://pubs.opengroup.org/onlinepubs/9699919799/functions/snprintf.html#tag_16_159_04
"Upon successful completion, the snprintf() function shall return the
number of bytes that would be written to s had n been sufficiently
large excluding the terminating null byte."
Introduce slirp_fmt() that handles several pathological cases the
way libslirp usually expect:
- treat error as fatal (instead of silently returning -1)
- fmt0() will always \0 end
- return the number of bytes actually written (instead of what would
have been written, which would usually result in OOB later), including
the ending \0 for fmt0()
- warn if truncation happened (instead of ignoring)
Other less common cases can still be handled with strcpy/snprintf() etc.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-Id: <20200127092414.169796-2-marcandre.lureau@redhat.com>
[Since porting from a quite different now libslirp, quite a bit changed]
Signed-off-by: Bruce Rogers <brogers@suse.com>
For some reason, EMU_IDENT is not like other "emulated" protocols and
tries to reconstitute the original buffer, if it came in multiple
packets. Unfortunately, it does so wrongly, as it doesn't respect the
sbuf circular buffer appending rules, nor does it maintain some of the
invariants (rptr is incremented without bounds, etc): this leads to
further memory corruption revealed by ASAN or various malloc
errors. Furthermore, the so_rcv buffer is regularly flushed, so there
is no guarantee that buffer reconstruction will do what is expected.
Instead, do what the function comment says: "XXX Assumes the whole
command came in one packet", and don't touch so_rcv.
Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1664205
Cc: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Using ip_deq after m_free might read pointers from an allocation reuse.
This would be difficult to exploit, but that is still related with
CVE-2019-14378 which generates fragmented IP packets that would trigger this
issue and at least produce a DoS.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(from libslirp.git commit c59279437eda91841b9d26079c70b8a540d41204)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[BR: BSC#1149811 CVE-2019-15890]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When the first fragment does not fit in the preallocated buffer, q will
already be pointing to the ext buffer, so we mustn't try to update it.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 126c04acbabd7ad32c2b018fe10dfac2a3bc1210)
[LY: CVE-2019-14378 BSC#1143794]
Signed-off-by: Liang Yan <lyan@suse.com>
The interface names in qemu-bridge-helper are defined to be
of size IFNAMSIZ(=16), including the terminating null('\0') byte.
The same is applied to interface names read from 'bridge.conf'
file to form ACLs rules. If user supplied '--br=bridge' name
is not restricted to the same length, it could lead to ACL bypass
issue. Restrict bridge name to IFNAMSIZ, including null byte.
Reported-by: Riccardo Schirone <rschiron@redhat.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[LY: BSC#1140402 CVE-2019-13164]
Signed-off-by: Liang Yan <lyan@suse.com>
When releasing spice resources in release_resource() routine,
if release info object 'ext.info' is null, it leads to null
pointer dereference. Add check to avoid it.
Reported-by: Bugs SysSec <bugs-syssec@rub.de>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20190425063534.32747-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d52680fc93)
[LY: BSC#1135902 CVE-2019-12155]
Signed-off-by: Liang Yan <lyan@suse.com>
md-clear is a new CPUID bit which is set when microcode provides the
mechanism to invoke a flush of various exploitable CPU buffers by invoking
the VERW instruction. Add the new feature.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[BR: BSC#1111331 CVE-2018-12126 CVE-2018-12127 CVE-2018-12130
CVE-2019-11091]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Since Linux switched to blk-mq as the default in Linux commit
5c279bd9e406 ("scsi: default to scsi-mq"), virtio-scsi LUNs consume
about 10x as much guest kernel memory.
This commit allows you to choose the virtqueue size for each
virtio-scsi-pci controller like this:
-device virtio-scsi-pci,id=scsi,virtqueue_size=16
The default is still 128 as before. Using smaller virtqueue_size
allows many more disks to be added to small memory virtual machines.
For a 1 vCPU, 500 MB, no swap VM I observed:
With scsi-mq enabled (upstream kernel): 175 disks
-"- ditto -"- virtqueue_size=64: 318 disks
-"- ditto -"- virtqueue_size=16: 775 disks
With scsi-mq disabled (kernel before 5c279bd9e406): 1755 disks
Note that to have any effect, this requires a kernel patch:
https://lkml.org/lkml/2017/8/10/689
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170810165255.20865-1-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5c0919d020)
[BR: FATE#327255]
Signed-off-by: Bruce Rogers <brogers@suse.com>
depth == 0 is used to indicate 256 color modes. Our region calculation
goes wrong in that case. So detect that and just take the safe code
path we already have for the wraparound case.
While being at it also catch depth == 15 (where our region size
calculation goes wrong too). And make the comment more verbose,
explaining what is going on here.
Without this windows guest install might trigger an assert due to trying
to check dirty bitmap outside the snapshot region.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1575541
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20180514103117.21059-1-kraxel@redhat.com
(cherry picked from commit a89fe6c329)
Signed-off-by: Bruce Rogers <brogers@suse.com>
Typically the scanline length and the line offset are identical. But
in case they are not our calculation for region_end is incorrect. Using
line_offset is fine for all scanlines, except the last one where we have
to use the actual scanline length.
Fixes: CVE-2018-7550
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-id: 20180309143704.13420-1-kraxel@redhat.com
(cherry picked from commit 7cdc61becd)
[BR: BSC#1084604 CVE-2018-7858 (it appears the above CVE ref. is wrong)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Commit "3d90c62548 vga: stop passing pointers to vga_draw_line*
functions" is incomplete. It doesn't handle the case that the vga
rendering code tries to create a shared surface, i.e. a pixman image
backed by vga video memory. That can not work in case the guest display
wraps from end of video memory to the start. So force shadowing in that
case. Also adjust the snapshot region calculation.
Can trigger with cirrus only, when programming vbe modes using the bochs
api (stdvga, also qxl and virtio-vga in vga compat mode) wrap arounds
can't happen.
Fixes: CVE-2017-13672
Fixes: 3d90c62548
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20171010141323.14049-3-kraxel@redhat.com
(cherry picked from commit 28f77de26a)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 362f811793)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
vga display update mis-calculated the region for the dirty bitmap
snapshot in case split screen mode is used. This can trigger an
assert in cpu_physical_memory_snapshot_get_dirty().
Impact: DoS for privileged guest users.
Fixes: CVE-2017-13673
Fixes: fec5e8c92b
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170828123307.15392-1-kraxel@redhat.com
(cherry picked from commit e65294157d)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
vga display update mis-calculated the region for the dirty bitmap
snapshot in case the scanlines are padded. This can triggere an
assert in cpu_physical_memory_snapshot_get_dirty().
Fixes: fec5e8c92b
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509104839.19415-1-kraxel@redhat.com
(cherry picked from commit bfc56535f7)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The vga code clears the dirty bits *after* reading the framebuffer
memory. So if the guest framebuffer updates hits the race window
between vga reading the framebuffer and vga clearing the dirty bits
vga will miss that update
Fix it by using the new memory_region_copy_and_clear_dirty()
memory_region_copy_get_dirty() functions. That way we clear the
dirty bitmap before reading the framebuffer. Any guest display
updates happening in parallel will be properly tracked in the
dirty bitmap then and the next display refresh will pick them up.
Problem triggers with mttcg only. Before mttcg was merged tcg
never ran in parallel to vga emulation. Using kvm will hide the
problem too, due to qemu operating on a userspace copy of the
kernel's dirty bitmap.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-5-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fec5e8c92b)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Add vga_scanline_invalidated helper to check whenever a scanline was
invalidated. Add a sanity check to fix OOB read access for display
heights larger than 2048.
Only cirrus uses this, for hardware cursor rendering, so having this
work properly for the first 2048 scanlines only shouldn't be a problem
as the cirrus can't handle large resolutions anyway. Also changing the
invalidated_y_table size would break live migration.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-4-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f3289f6f0f)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
This patch adds support for getting and using a local copy of the dirty
bitmap.
memory_region_snapshot_and_clear_dirty() will create a snapshot of the
dirty bitmap for the specified range, clear the dirty bitmap and return
the copy. The returned bitmap can be a bit larger than requested, the
range is expanded so the code can copy unsigned longs from the bitmap
and avoid atomic bit update operations.
memory_region_snapshot_get_dirty() will return the dirty status of
pages, pretty much like memory_region_get_dirty(), but using the copy
returned by memory_region_copy_and_clear_dirty().
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170421091632.30900-3-kraxel@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8deaf12ca1)
[BR: Support for BSC#1084604 (and other useful vga fixes)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Recent commit 5b76ef50f6 fixed a race where v9fs_co_open2() could
possibly overwrite a fid path with v9fs_path_copy() while it is being
accessed by some other thread, ie, use-after-free that can be detected
by ASAN with a custom 9p client.
It turns out that the same can happen at several locations where
v9fs_path_copy() is used to set the fid path. The fix is again to
take the write lock.
Fixes CVE-2018-19364.
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 5b3c77aa58)
[BR: BSC#1116717 CVE-2018-19364]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The assumption that the fid cannot be used by any other operation is
wrong. At least, nothing prevents a misbehaving client to create a
file with a given fid, and to pass this fid to some other operation
at the same time (ie, without waiting for the response to the creation
request). The call to v9fs_path_copy() performed by the worker thread
after the file was created can race with any access to the fid path
performed by some other thread. This causes use-after-free issues that
can be detected by ASAN with a custom 9p client.
Unlike other operations that only read the fid path, v9fs_co_open2()
does modify it. It should hence take the write lock.
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 5b76ef50f6)
[BR: BSC#1116717 CVE-2018-19364]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using the 9P2000.u version of the protocol, the following shell
command line in the guest can cause QEMU to crash:
while true; do rm -rf aa; mkdir -p a/b & touch a/b/c & mv a aa; done
With 9P2000.u, file renaming is handled by the WSTAT command. The
v9fs_wstat() function calls v9fs_complete_rename(), which calls
v9fs_fix_path() for every fid whose path is affected by the change.
The involved calls to v9fs_path_copy() may race with any other access
to the fid path performed by some worker thread, causing a crash like
shown below:
Thread 12 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555a25da2 in local_open_nofollow (fs_ctx=0x555557d958b8, path=0x0,
flags=65536, mode=0) at hw/9pfs/9p-local.c:59
59 while (*path && fd != -1) {
(gdb) bt
path=0x0, flags=65536, mode=0) at hw/9pfs/9p-local.c:59
path=0x0) at hw/9pfs/9p-local.c:92
fs_path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/9p-local.c:185
path=0x555556b56858, stbuf=0x7fff84830ef0) at hw/9pfs/cofile.c:53
at hw/9pfs/9p.c:1083
at util/coroutine-ucontext.c:116
(gdb)
The fix is to take the path write lock when calling v9fs_complete_rename(),
like in v9fs_rename().
Impact: DoS triggered by unprivileged guest users.
Fixes: CVE-2018-19489
Cc: P J P <ppandit@redhat.com>
Reported-by: zhibin hu <noirfate@gmail.com>
Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1d20398694)
[BR: BSC#1117275]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Open files and directories with O_NOFOLLOW to avoid symlinks attacks.
While being at it also add O_CLOEXEC.
usb-mtp only handles regular files and directories and ignores
everything else, so users should not see a difference.
Because qemu ignores symlinks, carrying out a successful symlink attack
requires swapping an existing file or directory below rootdir for a
symlink and winning the race against the inotify notification to qemu.
Fixes: CVE-2018-16872
Cc: Prasad J Pandit <ppandit@redhat.com>
Cc: Bandan Das <bsd@redhat.com>
Reported-by: Michael Hanselmann <public@hansmi.ch>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael Hanselmann <public@hansmi.ch>
Message-id: 20181213122511.13853-1-kraxel@redhat.com
(cherry picked from commit bab9df35ce)
[BR: BSC#1119493]
Signed-off-by: Bruce Rogers <brogers@suse.com>
While emulating identification protocol, tcp_emu() does not check
available space in the 'sc_rcv->sb_data' buffer. It could lead to
heap buffer overflow issue. Add check to avoid it.
Reported-by: Kira <864786842@qq.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit a7104eda7d)
[BR: BSC#1123156 CVE-2019-6778, modify patch to use spaces instead of tabs]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Do an update of system_time_msr address every time before reading
the value of tsc_timestamp from guest's kvmclock page.
There is no other code paths which ensure that qemu has an up-to-date
value of system_time_msr. So, force this update on guest's tsc_timestamp
reading.
This bug causes effect on those nested setups which turn off TPR access
interception for L2 guests and that access being intercepted by L0 doesn't
show up in L1.
Linux bootstrap initiate kvmclock before APIC initializing causing TPR access.
That's why on L1 guests, having TPR interception turned on for L2, the effect
of the bug is not revealed.
This patch fixes this problem by making sure it knows the correct
system_time_msr address every time it is needed.
Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
Message-Id: <1496054944-25623-1-git-send-email-dplotnikov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e2b6c1712e)
[BR: BSC#1113231]
Signed-off-by: Bruce Rogers <brogers@suse.com>
While writing a message in 'lsi_do_msgin', message length value
in 'msg_len' could be invalid due to an invalid migration stream.
Add an assertion to avoid an out of bounds access, and reject
the incoming migration data if it contains an invalid message
length.
Discovered by Deja vu Security. Reported by Oracle.
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20181026194314.18663-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit e58ccf0396)
[BR: BSC#1114422 CVE-2018-18849]
Signed-off-by: Bruce Rogers <brogers@suse.com>
There should not be a reason for passing a packet size greater than
INT_MAX. It's usually a hint of bug somewhere, so ignore packet size
greater than INT_MAX in qemu_deliver_packet_iov()
CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1592a99470)
[LD: BSC#1111013 CVE-2018-17963]
Signed-off-by: Larry Dewey <ldewey@suse.com>
In pcnet_receive(), we try to assign size_ to size which converts from
size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access
for both buf and buf1.
Fixing by converting the type of size to size_t.
CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit b1d80d12c5)
[LD: BSC#1111010 CVE-2018-17962]
Signed-off-by: Larry Dewey <ldewey@suse.com>
In rtl8139_do_receive(), we try to assign size_ to size which converts
from size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access of
for both buf and buf1.
Fixing by converting the type of size to size_t.
CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 1a326646fe)
[LD: BSC#1111006 CVE-2018-17958]
Signed-off-by: Larry Dewey <ldewey@suse.com>
In ne2000_receive(), we try to assign size_ to size which converts
from size_t to integer. This will cause troubles when size_ is greater
INT_MAX, this will lead a negative value in size and it can then pass
the check of size < MIN_BUF_SIZE which may lead out of bound access of
for both buf and buf1.
Fixing by converting the type of size to size_t.
CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit fdc89e90fa)
[LD: BSC#1110910 CVE-2018-10839]
Signed-off-by: Larry Dewey <ldewey@suse.com>
The following files have received the header qemu/uuid.h because
they are calling UUID_FMT, which it defines:
- block/vdi.c
- hw/xenpv/xen_domainbuild.c
- hw/ppc/spapr.c
- qmp.c
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
Remove -sandbox option if the host is not capable of TSYNC, since the
sandbox will fail at setup time otherwise. This will help libvirt, for
ex, to figure out if -sandbox will work.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 5780760f5e)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
If CONFIG_SECCOMP is undefined, the option 'elevatedprivileges' remains
compiled. This would make libvirt set the corresponding capability and
then trigger failure during guest startup. This patch moves the code
regarding seccomp command line options to qemu-seccomp.c file and
wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP.
Because parse_sandbox() is moved into qemu-seccomp.c file, change
seccomp_start() to static function.
Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Tested-by: Ján Tomko <jtomko@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 9d0fdecbad)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
When using "-seccomp on", the seccomp policy is only applied to the
main thread, the vcpu worker thread and other worker threads created
after seccomp policy is applied; the seccomp policy is not applied to
e.g. the RCU thread because it is created before the seccomp policy is
applied and SECCOMP_FILTER_FLAG_TSYNC isn't used.
This can be verified with
for task in /proc/`pidof qemu`/task/*; do cat $task/status | grep Secc ; done
Seccomp: 2
Seccomp: 0
Seccomp: 0
Seccomp: 2
Seccomp: 2
Seccomp: 2
Starting with libseccomp 2.2.0 and kernel >= 3.17, we can use
seccomp_attr_set(ctx, > SCMP_FLTATR_CTL_TSYNC, 1) to update the policy
on all threads.
libseccomp requirement was bumped to 2.2.0 in previous patch.
libseccomp should fail to set the filter if it can't honour
SCMP_FLTATR_CTL_TSYNC (untested), and thus -sandbox will now fail on
kernel < 3.17.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 70dfabeaa7)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
The following patch is going to require TSYNC, which is only available
since libseccomp 2.2.0.
libseccomp 2.2.0 was released February 12, 2015.
According to repology, libseccomp version in different distros:
RHEL-7: 2.3.1
Debian (Stretch): 2.3.1
OpenSUSE Leap 15: 2.3.2
Ubuntu (Xenial): 2.3.1
This will drop support for -sandbox on:
Debian (Jessie): 2.1.1 (but 2.2.3 in backports)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit d0699bd37c)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
The upcoming libseccomp release should have SCMP_ACT_KILL_PROCESS
action (https://github.com/seccomp/libseccomp/issues/96).
SCMP_ACT_KILL_PROCESS is preferable to immediately terminate the
offending process, rather than having the SIGSYS handler running.
Use SECCOMP_GET_ACTION_AVAIL to check availability of kernel support,
as libseccomp will fallback on SCMP_ACT_KILL otherwise, and we still
prefer SCMP_ACT_TRAP.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit bda08a5764)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
Current and upcoming mesa releases rely on a shader disk cash. It uses
a thread job queue with low priority, set with
sched_setscheduler(SCHED_IDLE). However, that syscall is rejected by
the "resourcecontrol" seccomp qemu filter.
Since it should be safe to allow lowering thread priority, let's allow
scheduling thread to idle policy.
Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1594456
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 056de1e894)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
This patch adds [,resourcecontrol=deny] to `-sandbox on' option. It
blacklists all process affinity and scheduler priority system calls to
avoid any bigger of the process.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 24f8cdc572)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
This patch adds [,spawn=deny] argument to `-sandbox on' option. It
blacklists fork and execve system calls, avoiding Qemu to spawn new
threads or processes.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 995a226f88)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
This patch introduces the new argument
[,elevateprivileges=allow|deny|children] to the `-sandbox on'. It allows
or denies Qemu process to elevate its privileges by blacklisting all
set*uid|gid system calls. The 'children' option will let forks and
execves run unprivileged.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 73a1e64725)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
This patch introduces the argument [,obsolete=allow] to the `-sandbox on'
option. It allows Qemu to run safely on old system that still relies on
old system calls.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 2b716fa6d6)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
This patch changes the default behavior of the seccomp filter from
whitelist to blacklist. By default now all system calls are allowed and
a small black list of definitely forbidden ones was created.
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
(cherry picked from commit 1bd6152ae2)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
A number of different places across the code base use CONFIG_UUID. Some
of them are soft dependency, some are not built if libuuid is not
available, some come with dummy fallback, some throws runtime error.
It is hard to maintain, and hard to reason for users.
Since UUID is a simple standard with only a small number of operations,
it is cleaner to have a central support in libqemuutil. This patch adds
qemu_uuid_* functions that all uuid users in the code base can
rely on. Except for qemu_uuid_generate which is new code, all other
functions are just copy from existing fallbacks from other files.
Note that qemu_uuid_parse is moved without updating the function
signature to use QemuUUID, to keep this patch simple.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1474432046-325-2-git-send-email-famz@redhat.com>
(cherry picked from commit cea25275a3)
[LD: BSC#1106222 CVE-2018-15746]
Signed-off-by: Larry Dewey <ldewey@suse.com>
AMD Zen expose the Intel equivalant to Speculative Store Bypass Disable
via the 0x80000008_EBX[25] CPUID feature bit.
This needs to be exposed to guest OS to allow them to protect
against CVE-2018-3639.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-3-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 403503b162)
[BR: BSC#1092885 CVE-2018-3639]
Signed-off-by: Bruce Rogers <brogers@suse.com>
"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD). To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.
Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20180521215424.13520-4-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit cfeea0c021)
[BR: BSC#1092885 CVE-2018-3639]
Signed-off-by: Bruce Rogers <brogers@suse.com>
While reading file content via 'guest-file-read' command,
'qmp_guest_file_read' routine allocates buffer of count+1
bytes. It could overflow for large values of 'count'.
Add check to avoid it.
Reported-by: Fakhri Zulkifli <mohdfakhrizulkifli@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(cherry picked from commit 141b197408)
[FL: BSC#1098735 CVE-2018-12617]
Signed-off-by: Fei Li <fli@suse.com>
While reassembling incoming fragmented datagrams, 'm_cat' routine
extends the 'mbuf' buffer, if it has insufficient room. It computes
a wrong buffer size, which leads to overwriting adjacent heap buffer
area. Correct this size computation in m_cat.
Reported-by: ZDI Disclosures <zdi-disclosures@trendmicro.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 864036e251)
[FL: BSC#1096223 CVE-2018-11806]
Signed-off-by: Fei Li <fli@suse.com>
Provide monitor naming of xen disks, and plumb guest driver
notification through xenstore of resizing instigated via the
monitor.
[BR: FATE#325467]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Since Linux commit 313d21eeab9282e, tpm devices have their own device
class "tpm" and the cancel path must be looked up under
/sys/class/tpm/ instead of /sys/class/misc/.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
(cherry picked from commit 05b71fb207)
[LY: BSC#1079405]
Signed-off-by: Liang Yan <lyan@suse.com>
As an attempt to help the user do the right thing, warn if we
detect spec_ctrl data in the migration stream, but where the
cpu defined doesn't have the feature. This would indicate the
migration is from the quick and dirty qemu produced in January
2018 to handle Spectre v2. That qemu version exposed the IBRS
cpu feature to all vcpu types, which helped in the short term
but wasn't a well designed approach.
Warn the user that the now migrated guest needs to be restarted
as soon as possible, using the spec_ctrl cpu feature flag or a
*-IBRS vcpu model specified as appropriate.
Signed-off-by: Bruce Rogers <brogers@suse.com>
The new MSR IA32_SPEC_CTRL MSR was introduced by a recent Intel
microcode updated and can be used by OSes to mitigate
CVE-2017-5715. Unfortunately we can't change the existing CPU
models without breaking existing setups, so users need to
explicitly update their VM configuration to use the new *-IBRS
CPU model if they want to expose IBRS to guests.
The new CPU models are simple copies of the existing CPU models,
with just CPUID_7_0_EDX_SPEC_CTRL added and model_id updated.
Cc: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit ac96c41354)
[BR: BSC#1068032 CVE-2017-5715 CPU models are reduced as appropriate
to match this QEMU version]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The spec can be found in Intel Software Developer Manual or in
Instruction Set Extensions Programming Reference.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Message-Id: <1477902446-5932-1-git-send-email-he.chen@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 95ea69fb46)
[BR: BSC#1068032 CVE-2017-5715 - orig patch modified to not provide new
feature, just infrastructure, and change to match current code's feat_name
type]
Signed-off-by: Bruce Rogers <brogers@suse.com>
This is perhaps the easiest way to handle the longer model names
introduced with the -IBRS models. This is in lew of backporting
commit 807e9869b8 and other commits
which that would require.
[BR: BSC#1068032 CVE-2017-5715]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The previous patches fix problems with throttling of forced framebuffer updates
and audio data capture that would cause the QEMU output buffer size to grow
without bound. Those fixes are graceful in that once the client catches up with
reading data from the server, everything continues operating normally.
There is some data which the server sends to the client that is impractical to
throttle. Specifically there are various pseudo framebuffer update encodings to
inform the client of things like desktop resizes, pointer changes, audio
playback start/stop, LED state and so on. These generally only involve sending
a very small amount of data to the client, but a malicious guest might be able
to do things that trigger these changes at a very high rate. Throttling them is
not practical as missed or delayed events would cause broken behaviour for the
client.
This patch thus takes a more forceful approach of setting an absolute upper
bound on the amount of data we permit to be present in the output buffer at
any time. The previous patch set a threshold for throttling the output buffer
by allowing an amount of data equivalent to one complete framebuffer update and
one seconds worth of audio data. On top of this it allowed for one further
forced framebuffer update to be queued.
To be conservative, we thus take that throttling threshold and multiply it by
5 to form an absolute upper bound. If this bound is hit during vnc_write() we
forceably disconnect the client, refusing to queue further data. This limit is
high enough that it should never be hit unless a malicious client is trying to
exploit the sever, or the network is completely saturated preventing any sending
of data on the socket.
This completes the fix for CVE-2017-15124 started in the previous patches.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-12-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f887cf165d)
[LY: BSC#1073489 CVE-2017-15124]
Signed-off-by: Liang Yan <lyan@suse.com>
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).
The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check is disabled if the client has requested
a forced update, because we want to send these as soon as possible.
As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates eg play a youtube video. Then repeatedly send full framebuffer update
requests, but never read data back from the server. This can easily make QEMU's
VNC server send buffer consume 100MB of RAM per second, until the OOM killer
starts reaping processes (hopefully the rogue QEMU process, but it might pick
others...).
To address this we make the throttling more intelligent, so we can throttle
full updates. When we get a forced update request, we keep track of exactly how
much data we put on the output buffer. We will not process a subsequent forced
update request until this data has been fully sent on the wire. We always allow
one forced update request to be in flight, regardless of what data is queued
for incremental updates or audio data. The slight complication is that we do
not initially know how much data an update will send, as this is done in the
background by the VNC job thread. So we must track the fact that the job thread
has an update pending, and not process any further updates until this job is
has been completed & put data on the output buffer.
This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but its possible other processes can get taken out as
collateral damage.
This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:
commit a7b20a8efa
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Mon Oct 9 14:43:42 2017 +0100
io: monitor encoutput buffer size from websocket GSource
This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-11-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit ada8d2e436)
[LY: BSC#1073489 CVE-2017-15124]
Signed-off-by: Liang Yan <lyan@suse.com>
The VNC server must throttle data sent to the client to prevent the 'output'
buffer size growing without bound, if the client stops reading data off the
socket (either maliciously or due to stalled/slow network connection).
The current throttling is very crude because it simply checks whether the
output buffer offset is zero. This check must be disabled if audio capture is
enabled, because when streaming audio the output buffer offset will rarely be
zero due to queued audio data, and so this would starve framebuffer updates.
As a result, the VNC client can cause QEMU to allocate arbitrary amounts of RAM.
They can first start something in the guest that triggers lots of framebuffer
updates eg play a youtube video. Then enable audio capture, and simply never
read data back from the server. This can easily make QEMU's VNC server send
buffer consume 100MB of RAM per second, until the OOM killer starts reaping
processes (hopefully the rogue QEMU process, but it might pick others...).
To address this we make the throttling more intelligent, so we can throttle
when audio capture is active too. To determine how to throttle incremental
updates or audio data, we calculate a size threshold. Normally the threshold is
the approximate number of bytes associated with a single complete framebuffer
update. ie width * height * bytes per pixel. We'll send incremental updates
until we hit this threshold, at which point we'll stop sending updates until
data has been written to the wire, causing the output buffer offset to fall
back below the threshold.
If audio capture is enabled, we increase the size of the threshold to also
allow for upto 1 seconds worth of audio data samples. ie nchannels * bytes
per sample * frequency. This allows the output buffer to have a mixture of
incremental framebuffer updates and audio data queued, but once the threshold
is exceeded, audio data will be dropped and incremental updates will be
throttled.
This unbounded memory growth affects all VNC server configurations supported by
QEMU, with no workaround possible. The mitigating factor is that it can only be
triggered by a client that has authenticated with the VNC server, and who is
able to trigger a large quantity of framebuffer updates or audio samples from
the guest OS. Mostly they'll just succeed in getting the OOM killer to kill
their own QEMU process, but its possible other processes can get taken out as
collateral damage.
This is a more general variant of the similar unbounded memory usage flaw in
the websockets server, that was previously assigned CVE-2017-15268, and fixed
in 2.11 by:
commit a7b20a8efa
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Mon Oct 9 14:43:42 2017 +0100
io: monitor encoutput buffer size from websocket GSource
This new general memory usage flaw has been assigned CVE-2017-15124, and is
partially fixed by this patch.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-10-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit e2b72cb6e0)
[LY: BSC#1073489 CVE-2017-15124]
Signed-off-by: Liang Yan <lyan@suse.com>
According to the RFB protocol, a client sends one or more framebuffer update
requests to the server. The server can reply with a single framebuffer update
response, that covers all previously received requests. Once the client has
read this update from the server, it may send further framebuffer update
requests to monitor future changes. The client is free to delay sending the
framebuffer update request if it needs to throttle the amount of data it is
reading from the server.
The QEMU VNC server, however, has never correctly handled the framebuffer
update requests. Once QEMU has received an update request, it will continue to
send client updates forever, even if the client hasn't asked for further
updates. This prevents the client from throttling back data it gets from the
server. This change fixes the flawed logic such that after a set of updates are
sent out, QEMU waits for a further update request before sending more data.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-8-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 728a7ac954)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
When we encode data for writing with SASL, we encode the entire pending output
buffer. The subsequent write, however, may not be able to send the full encoded
data in one go though, particularly with a slow network. So we delay setting the
output buffer offset back to zero until all the SASL encoded data is sent.
Between encoding the data and completing sending of the SASL encoded data,
however, more data might have been placed on the pending output buffer. So it
is not valid to set offset back to zero. Instead we must keep track of how much
data we consumed during encoding and subtract only that amount.
With the current bug we would be throwing away some pending data without having
sent it at all. By sheer luck this did not previously cause any serious problem
because appending data to the send buffer is always an atomic action, so we
only ever throw away complete RFB protocol messages. In the case of frame buffer
updates we'd catch up fairly quickly, so no obvious problem was visible.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-6-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 8f61f1c5a6)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
The vnc_update_client() method checks the 'has_dirty' flag to see if there are
dirty regions that are pending to send to the client. Regardless of this flag,
if a forced update is requested, updates must be sent. For unknown reasons
though, the code also tries to sent updates if audio capture is enabled. This
makes no sense as audio capture state does not impact framebuffer contents, so
this check is removed.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-5-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3541b08475)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
A previous commit:
commit 5a8be0f73d
Author: Gerd Hoffmann <kraxel@redhat.com>
Date: Wed Jul 13 12:21:20 2016 +0200
vnc: make sure we finish disconnect
Added a check for vs->disconnecting at the very start of the
vnc_update_client method. This means that the very next "if"
statement check for !vs->disconnecting always evaluates true,
and is thus redundant. This in turn means the vs->disconnecting
check at the very end of the method never evaluates true, and
is thus unreachable code.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20171218191228.31018-3-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c53df96161)
[LY: BSC#1073489]
Signed-off-by: Liang Yan <lyan@suse.com>
While loading kernel via multiboot-v1 image, (flags & 0x00010000)
indicates that multiboot header contains valid addresses to load
the kernel image. In that, end of the data segment address
'mh_load_end_addr' should be less than the bss segment address
'mh_bss_end_addr'. Add check to validate that.
Reported-by: CERT CC <cert.cc@orange.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[LY: BSC#1083291 CVE-2018-7550]
Signed-off-by: Liang Yan <lyan@suse.com>
Start a vm with qemu-kvm -enable-kvm -vnc :66 -smp 1 -m 1024 -hda
redhat_5.11.qcow2 -device pcnet -vga cirrus,
then use VNC client to connect to VM, and excute the code below in guest
OS will lead to qemu crash:
int main()
{
iopl(3);
srand(time(NULL));
int a,b;
while(1){
a = rand()%0x100;
b = 0x3c0 + (rand()%0x20);
outb(a,b);
}
return 0;
}
The above code is writing the registers of VGA randomly.
We can write VGA CRT controller registers index 0x0C or 0x0D
(which is the start address register) to modify the
the display memory address of the upper left pixel
or character of the screen. The address may be out of the
range of vga ram. So we should check the validation of memory address
when reading or writing it to avoid segfault.
Signed-off-by: linzhecheng <linzhecheng@huawei.com>
Message-id: 20180111132724.13744-1-linzhecheng@huawei.com
Fixes: CVE-2018-5683
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 191f59dc17)
[LY: BSC#1076114 CVE-2018-5683]
Signed-off-by: Liang Yan <lyan@suse.com>
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.
Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).
While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.
CC: qemu-trivial@nongnu.org
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 2098b073f3)
[LY: BSC#1076775 CVE-2017-18043]
Signed-off-by: Liang Yan <lyan@suse.com>
The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option. No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.
For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.
It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service. In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake. Hence, this warranted a CVE.
Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit fdad35ef6c)
[LY: BSC#1070144 CVE-2017-15119]
Signed-off-by: Liang Yan <lyan@suse.com>
During Qemu guest migration, a destination process invokes ps2
post_load function. In that, if 'rptr' and 'count' values were
invalid, it could lead to OOB access or infinite loop issue.
Add check to avoid it.
Reported-by: Cyrille Chatras <cyrille.chatras@orange.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20171116075155.22378-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 802cbcb730)
[LY: bsc#1068613 CVE-2017-16845]
Signed-off-by: Liang Yan <lyan@suse.com>
A guest could attempt to use an uninitialised VirtQueue object
or unset Vring.align leading to a arithmetic exception. Add check
to avoid it.
Reported-by: Zhangboxian <zhangboxian@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 758ead31c7)
[LY: bsc#1071228 CVE-2017-17381]
Signed-off-by: Liang Yan <lyan@suse.com>
We need to handle the bpb control on reset and migration. Normally
stfle.82 is transparent (and the normal guest part works without
hypervisor activity). To prevent any issues we require full
host kernel support for this feature.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit b073c87517)
[LY: BSC#1076814]
Signed-off-by: Liang Yan <lyan@suse.com>
replace with proper header sync
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 9cbb636270)
[LY: BSC#1076814]
Signed-off-by: Liang Yan <lyan@suse.com>
Commit 04bf2526ce (exec: use
qemu_ram_ptr_length to access guest ram) start using qemu_ram_ptr_length
instead of qemu_map_ram_ptr, but when used with Xen, the behavior of
both function is different. They both call xen_map_cache, but one with
"lock", meaning the mapping of guest memory is never released
implicitly, and the second one without, which means, mapping can be
release later, when needed.
In the context of address_space_{read,write}_continue, the ptr to those
mapping should not be locked because it is used immediatly and never
used again.
The lock parameter make it explicit in which context qemu_ram_ptr_length
is called.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20170726165326.10327-1-anthony.perard@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f5aa69bdc3)
[BR: BSC#1048902 BSC#1069178 CVE-2017-11334 (additional fix needed for orig
issue)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The Xen mapcache is able to create long term mappings, they are called
"locked" mappings. The third parameter of the xen_map_cache call
specifies if a mapping is a "locked" mapping.
>From the QEMU point of view there are two kinds of long term mappings:
[a] device memory mappings, such as option roms and video memory
[b] dma mappings, created by dma_memory_map & friends
After certain operations, ballooning a VM in particular, Xen asks QEMU
kindly to destroy all mappings. However, certainly [a] mappings are
present and cannot be removed. That's not a problem as they are not
affected by balloonning. The *real* problem is that if there are any
mappings of type [b], any outstanding dma operations could fail. This is
a known shortcoming. In other words, when Xen asks QEMU to destroy all
mappings, it is an error if any [b] mappings exist.
However today we have no way of distinguishing [a] from [b]. Because of
that, we cannot even print a decent warning.
This patch introduces a new "dma" bool field to MapCacheRev entires, to
remember if a given mapping is for dma or is a long term device memory
mapping. When xen_invalidate_map_cache is called, we print a warning if
any [b] mappings exist. We ignore [a] mappings.
Mappings created by qemu_map_ram_ptr are assumed to be [a], while
mappings created by address_space_map->qemu_ram_ptr_length are assumed
to be [b].
The goal of the patch is to make debugging and system understanding
easier.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 1ff7c5986a)
[BR: BSC#1048902 BSC#1069178 CVE-2017-11334 (additional fix needed for orig
issue)]
Signed-off-by: Bruce Rogers <brogers@suse.com>
9p back-end first queries the size of an extended attribute,
allocates space for it via g_malloc() and then retrieves its
value into allocated buffer. Race between querying attribute
size and retrieving its could lead to memory bytes disclosure.
Use g_malloc0() to avoid it.
Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 7bd9275630)
[BR: BSC#1062069 CVE-2017-15038]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The websocket GSource is monitoring the size of the rawoutput
buffer to determine if the channel can accepts more writes.
The rawoutput buffer, however, is merely a temporary staging
buffer before data is copied into the encoutput buffer. Thus
its size will always be zero when the GSource runs.
This flaw causes the encoutput buffer to grow without bound
if the other end of the underlying data channel doesn't
read data being sent. This can be seen with VNC if a client
is on a slow WAN link and the guest OS is sending many screen
updates. A malicious VNC client can act like it is on a slow
link by playing a video in the guest and then reading data
very slowly, causing QEMU host memory to expand arbitrarily.
This issue is assigned CVE-2017-15268, publically reported in
https://bugs.launchpad.net/qemu/+bug/1718964
(cherry picked from commit a7b20a8efa)
Reviewed-by: Eric Blake <eblake@redhat.com>
[Dan: Added extra checks to deal with code refactored in master but
not stable 2.10]
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
[BR: BSC#1062942 CVE-2017-15268]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Instead pass around the address (aka offset into vga memory).
Add vga_read_* helper functions which apply vbe_size_mask to
the address, to make sure the address stays within the valid
range, similar to the cirrus blitter fixes (commits ffaf857778
and 026aeffcb4).
Impact: DoS for privileged guest users. qemu crashes with
a segfault, when hitting the guard page after vga memory
allocation, while reading vga memory for display updates.
Fixes: CVE-2017-13672
Cc: P J P <ppandit@redhat.com>
Reported-by: David Buchanan <d@vidbuchanan.co.uk>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170828122906.18993-1-kraxel@redhat.com
(cherry picked from commit 3d90c62548)
[FL: BSC#1056334 CVE-2017-13672]
Signed-off-by: Fei Li <fli@suse.com>
When accessing guest's ram block during DMA operation, use
'qemu_ram_ptr_length' to get ram block pointer. It ensures
that DMA operation of given length is possible; And avoids
any OOB memory access situations.
Reported-by: Alex <broscutamaker@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20170712123840.29328-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 04bf2526ce)
[FL: BSC#1048902 CVE-2017-11334]
Signed-off-by: Fei Li <fli@suse.com>
While parsing dhcp options string in 'dhcp_decode', if an options'
length 'len' appeared towards the end of 'bp_vend' array, ensuing
read could lead to an OOB memory access issue. Add check to avoid it.
This is CVE-2017-11434.
Reported-by: Reno Robert <renorobert@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
(cherry picked from commit 413d463f43)
[FL: BSC#1049381 CVE-2017-11434]
Signed-off-by: Fei Li <fli@suse.com>
While loading kernel via multiboot-v1 image, (flags & 0x00010000)
indicates that multiboot header contains valid addresses to load
the kernel image. These addresses are used to compute kernel
size and kernel text offset in the OS image. Validate these
address values to avoid an OOB access issue.
This is CVE-2017-14167.
Reported-by: Thomas Garnier <thgarnie@google.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <20170907063256.7418-1-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ed4f86e8b6)
[FL: BSC#1057585 CVE-2017-14167]
Signed-off-by: Fei Li <fli@suse.com>
Don't reinvent a broken wheel, just use the hexdump function we have.
Impact: low, broken code doesn't run unless you have debug logging
enabled.
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20170509110128.27261-1-kraxel@redhat.com
(cherry picked from commit bd4a683505)
[FL: BSC#1047674 CVE-2017-10806]
Signed-off-by: Fei Li <fli@suse.com>
qemu proper has done so for 13 years
(8a7ddc38a6), qemu-img and qemu-io have
done so for four years (526eda14a6).
Ignoring this signal is especially important in qemu-nbd because
otherwise a client can easily take down the qemu-nbd server by dropping
the connection when the server wants to send something, for example:
$ qemu-nbd -x foo -f raw -t null-co:// &
[1] 12726
$ qemu-io -c quit nbd://localhost/bar
can't open device nbd://localhost/bar: No export with name 'bar' available
[1] + 12726 broken pipe qemu-nbd -x foo -f raw -t null-co://
In this case, the client sends an NBD_OPT_ABORT and closes the
connection (because it is not required to wait for a reply), but the
server replies with an NBD_REP_ACK (because it is required to reply).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20170611123714.31292-1-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 041e32b8d9)
[FL: BSC#1046636 CVE-2017-10664]
Signed-off-by: Fei Li <fli@suse.com>
Back in qemu 2.5, qemu-nbd was immune to port probes (a transient
server would not quit, regardless of how many probe connections
came and went, until a connection actually negotiated). But we
broke that in commit ee7d7aa when removing the return value to
nbd_client_new(), although that patch also introduced a bug causing
an assertion failure on a client that fails negotiation. We then
made it worse during refactoring in commit 1a6245a (a segfault
before we could even assert); the (masked) assertion was cleaned
up in d3780c2 (still in 2.6), and just recently we finally fixed
the segfault ("nbd: Fully intialize client in case of failed
negotiation"). But that still means that ever since we added
TLS support to qemu-nbd, we have been vulnerable to an ill-timed
port-scan being able to cause a denial of service by taking down
qemu-nbd before a real client has a chance to connect.
Since negotiation is now handled asynchronously via coroutines,
we no longer have a synchronous point of return by re-adding a
return value to nbd_client_new(). So this patch instead wires
things up to pass the negotiation status through the close_fn
callback function.
Simple test across two terminals:
$ qemu-nbd -f raw -p 30001 file
$ nmap 127.0.0.1 -p 30001 && \
qemu-io -c 'r 0 512' -f raw nbd://localhost:30001
Note that this patch does not change what constitutes successful
negotiation (thus, a client must enter transmission phase before
that client can be considered as a reason to terminate the server
when the connection ends). Perhaps we may want to tweak things
in a later patch to also treat a client that uses NBD_OPT_ABORT
as being a 'successful' negotiation (the client correctly talked
the NBD protocol, and informed us it was not going to use our
export after all), but that's a discussion for another day.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170608222617.20376-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0c9390d978)
[FL: BSC#1043808 CVE-2017-9524]
Signed-off-by: Fei Li <fli@suse.com>
If a non-NBD client connects to qemu-nbd, we would end up with
a SIGSEGV in nbd_client_put() because we were trying to
unregister the client's association to the export, even though
we skipped inserting the client into that list. Easy trigger
in two terminals:
$ qemu-nbd -p 30001 --format=raw file
$ nmap 127.0.0.1 -p 30001
nmap claims that it thinks it connected to a pago-services1
server (which probably means nmap could be updated to learn the
NBD protocol and give a more accurate diagnosis of the open
port - but that's not our problem), then terminates immediately,
so our call to nbd_negotiate() fails. The fix is to reorder
nbd_co_client_start() to ensure that all initialization occurs
before we ever try talking to a client in nbd_negotiate(), so
that the teardown sequence on negotiation failure doesn't fault
while dereferencing a half-initialized object.
While debugging this, I also noticed that nbd_update_server_watch()
called by nbd_client_closed() was still adding a channel to accept
the next client, even when the state was no longer RUNNING. That
is fixed by making nbd_can_accept() pay attention to the current
state.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1451614
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170527030421.28366-1-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit df8ad9f128)
[FL: BSC#1043808 CVE-2017-9524]
Signed-off-by: Fei Li <fli@suse.com>
Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other (Linux)
backends do. Build on the fact that all response structure flavors are
actually identical (aside from alignment and padding at the end).
This is XSA-216.
Reported by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit b0ac694fdb)
[FL: BSC#1057378 CVE-2017-10911]
Signed-off-by: Fei Li <fli@suse.com>
The current VNC default keyboard delay is 1ms. With that we're constantly
typing faster than the guest receives keyboard events from an XHCI attached
USB HID device.
The default keyboard delay time in the input layer however is 10ms. I don't know
how that number came to be, but empirical tests on some OpenQA driven ARM
systems show that 10ms really is a reasonable default number for the delay.
This patch moves the VNC delay also to 10ms. That way our default is much
safer (good!) and also consistent with the input layer default (also good!).
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1499863425-103133-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d3b0db6dfe)
[FL: BSC#1059369]
Signed-off-by: Fei Li <fli@suse.com>
When resetting the keyboard, we need to reset not just the pending keystrokes,
but also any pending modifiers. Otherwise there's a race when we're getting
reset while running an escape sequence (modifier 0x100).
Cc: qemu-stable@nongnu.org
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1498117295-162030-1-git-send-email-agraf@suse.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 51dbea77a2)
[FL: BSC#1059369]
Signed-off-by: Fei Li <fli@suse.com>
Delays in the input layer are special cased input events. Every input
event is accounted for in a global intput queue count. The special cased
delays however did not get removed from the queue, leading to queue overruns
and thus silent key drops after typing quite a few characters.
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1498117318-162102-1-git-send-email-agraf@suse.de
Fixes: be1a7176 ("input: add support for kbd delays")
Cc: qemu-stable@nongnu.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 77b0359bf4)
[FL: BSC#1059369]
Signed-off-by: Fei Li <fli@suse.com>
Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status. This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT. Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.
This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 8fdc7839e4)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
Commonize all the checks for canceled requests and errors. The next patch
will add another case to check for, in order to handle passthrough commands.
There is no semantic change here; the only nontrivial modification is in
scsi_write_do_fua, where cancellation has been checked earlier by both
callers. Thus, the check is replaced with an assertion.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5b956f415a)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
scsi-block will be able to do FUA just by passing the request through
to the LUN (which is also more efficient); there is no need to emulate
it like we do for scsi-disk.
Add a new method to distinguish this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 94f8ba1125)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
These are replacements for blk_aio_readv and blk_aio_writev that allow
customization of the data path. They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.
This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit fcaafb1001)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
This will be the place to add DMAIOFuncs in the next patch. There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 993935f315)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
Callers of dma_blk_io have no way to pass extra data to the DMAIOFunc,
because the original callback and opaque are gone by the time DMAIOFunc
is called. On the other hand, the BlockBackend is usually derived
from those extra data that you could pass to the DMAIOFunc (in the
next patch, that would be the SCSIRequest).
So change DMAIOFunc's prototype, decoupling it from blk_aio_readv
and blk_aio_writev's. The new prototype loses the BlockBackend
and gains an extra opaque value which, in the case of dma_blk_readv
and dma_blk_writev, is of course used for the BlockBackend.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 8a8e63ebdd)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
The patch had to touch multiple files at once, because dma_blk_io()
takes pointers to the functions, and ide_issue_trim() piggybacks on
the same interface (while ignoring offset under the hood).
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit d4f510eb3f)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O (qiov->size in bytes must match
nb_sectors in sectors).
Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 60cb2fa7eb)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead. Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 983a160050)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit b7d17f9fa4)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.
This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 8341f00dc2)
[LM: BSC#1043176]
Signed-off-by: Lin Ma <lma@suse.com>
Commit a0e640a8 introduced a path processing error.
Pass fstatat the dirpath based path component instead
of the entire path.
[BR: BSC#1045035]
Signed-off-by: Bruce Rogers <brogers@suse.com>
This ensures that the request is unref'ed properly, and avoids a
segmentation fault in the new qtest testcase that is added.
Reported-by: Zhangyanyu <zyy4013@stu.ouc.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[BR: BSC#1043296 CVE-2017-9503, dropped testcase from patch]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Avoid TOC-TOU bugs by storing the DCMD opcode in the MegasasCmd
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[BR: BSC#1043296 CVE-2017-9503]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When trying to remove a file from a directory, both created in non-mapped
mode, the file remains and EBADF is returned to the guest.
This is a regression introduced by commit "df4938a6651b 9pfs: local:
unlinkat: don't follow symlinks" when fixing CVE-2016-9602. It changed the
way we unlink the metadata file from
ret = remove("$dir/.virtfs_metadata/$name");
if (ret < 0 && errno != ENOENT) {
/* Error out */
}
/* Ignore absence of metadata */
to
fd = openat("$dir/.virtfs_metadata")
unlinkat(fd, "$name")
if (ret < 0 && errno != ENOENT) {
/* Error out */
}
/* Ignore absence of metadata */
If $dir was created in non-mapped mode, openat() fails with ENOENT and
we pass -1 to unlinkat(), which fails in turn with EBADF.
We just need to check the return of openat() and ignore ENOENT, in order
to restore the behaviour we had with remove().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
[groug: rewrote the comments as suggested by Eric]
(cherry picked from commit 6a87e7929f)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using the mapped-file security mode, we shouldn't let the client mess
with the metadata. The current code already tries to hide the metadata dir
from the client by skipping it in local_readdir(). But the client can still
access or modify it through several other operations. This can be used to
escalate privileges in the guest.
Affected backend operations are:
- local_mknod()
- local_mkdir()
- local_open2()
- local_symlink()
- local_link()
- local_unlinkat()
- local_renameat()
- local_rename()
- local_name_to_path()
Other operations are safe because they are only passed a fid path, which
is computed internally in local_name_to_path().
This patch converts all the functions listed above to fail and return
EINVAL when being passed the name of the metadata dir. This may look
like a poor choice for errno, but there's no such thing as an illegal
path name on Linux and I could not think of anything better.
This fixes CVE-2017-7493.
Reported-by: Leo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit 7a95434e0c)
[BR: BSC#1039495]
Signed-off-by: Bruce Rogers <brogers@suse.com>
As the pci ahci can be hotplug and unplug, in the ahci unrealize
function it should free all the resource once allocated in the
realized function. This patch add ide_exit to free the resource.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 1488449293-80280-3-git-send-email-liqiang6-s@360.cn
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit c9f086418a)
[BSC#1042801 CVE-2017-9373]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In usb_ehci_init function, it initializes 's->ipacket', but there
is no corresponding function to free this. As the ehci can be hotplug
and unplug, this will leak host memory leak. In order to make the
hierarchy clean, we should add a ehci pci finalize function, then call
the clean function in ehci device.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 589a85b8.3c2b9d0a.b8e6.1434@mx.google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit d710e1e7bd)
[BR: BSC#1043073 CVE-2017-9374]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Vmware Paravirtual SCSI emulation uses command descriptors to
process SCSI commands. These come with their message ring buffers.
A guest could set the message ring page count to an arbitrary value
resulting in infinite loop. Add check to avoid it.
Reported-by: YY Z <bigbird475958471@gmail.com>
Signed-off-by: P J P <ppandit@redhat.com>
(cherry picked from commit f68826989c)
[BR: BSC#1036211 CVE-2017-8112]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Does basically the same as "cirrus: stop passing around dst pointers in
the blitter", just for the src pointer instead of the dst pointer.
For the src we have to care about cputovideo blits though and fetch the
data from s->cirrus_bltbuf instead of vga memory. The cirrus_src*()
helper functions handle that.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489584487-3489-1-git-send-email-kraxel@redhat.com
(cherry picked from commit ffaf857778)
[BR: BSC#1035406 CVE-2017-7980]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Instead pass around the address (aka offset into vga memory). Calculate
the pointer in the rop_* functions, after applying the mask to the
address, to make sure the address stays within the valid range.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489574872-8679-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 026aeffcb4)
[BR: BSC#1035406 CVE-2017-7980]
Signed-off-by: Bruce Rogers <brogers@suse.com>
check the validity of parameters in cirrus_bitblt_rop_fwd_transp_xxx
and cirrus_bitblt_rop_fwd_xxx to avoid the OOB read which causes qemu Segmentation fault.
After the fix, we will touch the assert in
cirrus_invalidate_region:
assert(off_cur_end >= off_cur);
Signed-off-by: fangying <fangying1@huawei.com>
Signed-off-by: hangaohuai <hangaohuai@huawei.com>
Message-id: 20170314063919.16200-1-hangaohuai@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 215902d7b6)
[BR: BSC#1034908 CVE-2017-7718]
Signed-off-by: Bruce Rogers <brogers@suse.com>
There is a special code path (dpy_gfx_copy) to allow graphic emulation
notify user interface code about bitblit operations carryed out by
guests. It is supported by cirrus and vnc server. The intended purpose
is to optimize display scrolls and just send over the scroll op instead
of a full display update.
This is rarely used these days though because modern guests simply don't
use the cirrus blitter any more. Any linux guest using the cirrus drm
driver doesn't. Any windows guest newer than winxp doesn't ship with a
cirrus driver any more and thus uses the cirrus as simple framebuffer.
So this code tends to bitrot and bugs can go unnoticed for a long time.
See for example commit "3e10c3e vnc: fix qemu crash because of SIGSEGV"
which fixes a bug lingering in the code for almost a year, added by
commit "c7628bf vnc: only alloc server surface with clients connected".
Also the vnc server will throttle the frame rate in case it figures the
network can't keep up (send buffers are full). This doesn't work with
dpy_gfx_copy, for any copy operation sent to the vnc client we have to
send all outstanding updates beforehand, otherwise the vnc client might
run the client side blit on outdated data and thereby corrupt the
display. So this dpy_gfx_copy "optimization" might even make things
worse on slow network links.
Lets kill it once for all.
Oh, and one more reason: Turns out (after writing the patch) we have a
security bug in that code path ...
Fixes: CVE-2016-9603
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1489494419-14340-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 50628d3479)
[BR: BSC#1028656]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In the SDHCI protocol, the transfer mode register value
is used during multi block transfer to check if block count
register is enabled and should be updated. Transfer mode
register could be set such that, block count register would
not be updated, thus leading to an infinite loop. Add check
to avoid it.
Reported-by: Wjjzhang <wjjzhang@tencent.com>
Reported-by: Jiang Xin <jiangxin1@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20170214185225.7994-3-ppandit@redhat.com
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 6e86d90352)
[BR: BSC#1025311 CVE-2017-5987]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Limits should be big enough that normal guest should not hit it.
Add a tracepoint to log them, just in case. Also, while being
at it, log the existing link trb limit too.
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1486383669-6421-1-git-send-email-kraxel@redhat.com
(cherry picked from commit f89b60f6e5)
[BR: BSC#1025109 CVE-2017-5973]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Conflicts:
hw/usb/hcd-xhci.c
hw/usb/trace-events
The local backend was recently converted to using "at*()" syscalls in order
to ensure all accesses happen below the shared directory. This requires that
we only pass relative paths, otherwise the dirfd argument to the "at*()"
syscalls is ignored and the path is treated as an absolute path in the host.
This is actually the case for paths in all fids, with the notable exception
of the root fid, whose path is "/". This causes the following backend ops to
act on the "/" directory of the host instead of the virtfs shared directory
when the export root is involved:
- lstat
- chmod
- chown
- utimensat
ie, chmod /9p_mount_point in the guest will be converted to chmod / in the
host for example. This could cause security issues with a privileged QEMU.
All "*at()" syscalls are being passed an open file descriptor. In the case
of the export root, this file descriptor points to the path in the host that
was passed to -fsdev.
The fix is thus as simple as changing the path of the export root fid to be
"." instead of "/".
This is CVE-2017-7471.
Cc: qemu-stable@nongnu.org
Reported-by: Léo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 9c6b899f7a)
[BR: BSC#1034866 CVE-2017-7471]
Signed-off-by: Bruce Rogers <brogers@suse.com>
This helper is similar to v9fs_string_sprintf(), but it includes the
terminating NUL character in the size field.
This is to avoid doing v9fs_string_sprintf((V9fsString *) &path) and
then bumping the size.
Affected users are changed to use this new helper.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit e3e83f2e21)
[BR: support patch for BSC#1034866]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Free 'orig_value' in error path.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 4ffcdef427)
[BR: BSC#1035950 CVE-2017-8086]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The v9fs_create() and v9fs_lcreate() functions are used to create a file
on the backend and to associate it to a fid. The fid shouldn't be already
in-use, otherwise both functions may silently leak a file descriptor or
allocated memory. The current code doesn't check that.
This patch ensures that the fid isn't already associated to anything
before using it.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
(reworded the changelog, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit d63fb193e7)
[BR: BSC#1032075 CVE-2017-7377]
Signed-off-by: Bruce Rogers <brogers@suse.com>
According to the 9P spec [*], when a client wants to cancel a pending I/O
request identified by a given tag (uint16), it must send a Tflush message
and wait for the server to respond with a Rflush message before reusing this
tag for another I/O. The server may still send a completion message for the
I/O if it wasn't actually cancelled but the Rflush message must arrive after
that.
QEMU hence waits for the flushed PDU to complete before sending the Rflush
message back to the client.
If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then
allocate a PDU identified by tag, find it in the PDU list and wait for
this same PDU to complete... i.e. wait for a completion that will never
happen. This causes a tag and ring slot leak in the guest, and a PDU
leak in QEMU, all of them limited by the maximal number of PDUs (128).
But, worse, this causes QEMU to hang on device reset since v9fs_reset()
wants to drain all pending I/O.
This insane behavior is likely to denote a bug in the client, and it would
deserve an Rerror message to be sent back. Unfortunately, the protocol
allows it and requires all flush requests to suceed (only a Tflush response
is expected).
The only option is to detect when we have to handle a self-referencing
flush request and report success to the client right away.
[*] http://man.cat-v.org/plan_9/5/flush
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit d5f2af7b95)
Signed-off-by: Bruce Rogers <brogers@suse.com>
We should pass O_NOFOLLOW otherwise openat() will follow symlinks and make
QEMU vulnerable.
While here, we also fix local_unlinkat_common() to use openat_dir() for
the same reasons (it was a leftover in the original patchset actually).
This fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit b003fc0d8a)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When O_PATH is used with O_DIRECTORY, it only acts as an optimization: the
openat() syscall simply finds the name in the VFS, and doesn't trigger the
underlying filesystem.
On systems that don't define O_PATH, because they have glibc version 2.13
or older for example, we can safely omit it. We don't want to deactivate
O_PATH globally though, in case it is used without O_DIRECTORY. The is done
with a dedicated macro.
Systems without O_PATH may thus fail to resolve names that involve
unreadable directories, compared to newer systems succeeding, but such
corner case failure is our only option on those older systems to avoid
the security hole of chasing symlinks inappropriately.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
(added last paragraph to changelog as suggested by Eric Blake)
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 918112c02a)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The name argument can never be an empty string, and dirfd always point to
the containing directory of the file name. AT_EMPTY_PATH is hence useless
here. Also it breaks build with glibc version 2.13 and older.
It is actually an oversight of a previous tentative patch to implement this
function. We can safely drop it.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit b314f6a077)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If we cannot open the given path, we can return right away instead of
passing -1 to fstatfs() and close(). This will make Coverity happy.
(Coverity issue CID1371729)
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
(cherry picked from commit 23da0145cc)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Coverity issue CID1371731
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
(cherry picked from commit faab207f11)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
This was spotted by Coverity as a fd leak. This is certainly true, but also
local_remove() would always return without doing anything, unless the fd is
zero, which is very unlikely.
(Coverity issue CID1371732)
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit b7361d46e7)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Now that the all callbacks have been converted to use "at" syscalls, we
can drop this code.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit c23d5f1d5b)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_open2() callback is vulnerable to symlink attacks because it
calls:
(1) open() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_open2() to rely on opendir_nofollow() and
mkdirat() to fix (1), as well as local_set_xattrat(),
local_set_mapped_file_attrat() and local_set_cred_passthrough() to
fix (2), (3) and (4) respectively. Since local_open2() already opens
a descriptor to the target file, local_set_cred_passthrough() is
modified to reuse it instead of opening a new one.
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to openat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit a565fea565)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_mkdir() callback is vulnerable to symlink attacks because it
calls:
(1) mkdir() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_mkdir() to rely on opendir_nofollow() and
mkdirat() to fix (1), as well as local_set_xattrat(),
local_set_mapped_file_attrat() and local_set_cred_passthrough() to
fix (2), (3) and (4) respectively.
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to mkdirat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 3f3a16990b)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_mknod() callback is vulnerable to symlink attacks because it
calls:
(1) mknod() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_mknod() to rely on opendir_nofollow() and
mknodat() to fix (1), as well as local_set_xattrat() and
local_set_mapped_file_attrat() to fix (2) and (3) respectively.
A new local_set_cred_passthrough() helper based on fchownat() and
fchmodat_nofollow() is introduced as a replacement to
local_post_create_passthrough() to fix (4).
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to mknodat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d815e72190)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_symlink() callback is vulnerable to symlink attacks because it
calls:
(1) symlink() which follows symbolic links for all path elements but the
rightmost one
(2) open(O_NOFOLLOW) which follows symbolic links for all path elements but
the rightmost one
(3) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(4) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
This patch converts local_symlink() to rely on opendir_nofollow() and
symlinkat() to fix (1), openat(O_NOFOLLOW) to fix (2), as well as
local_set_xattrat() and local_set_mapped_file_attrat() to fix (3) and
(4) respectively.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 38771613ea)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_chown() callback is vulnerable to symlink attacks because it
calls:
(1) lchown() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
This patch converts local_chown() to rely on open_nofollow() and
fchownat() to fix (1), as well as local_set_xattrat() and
local_set_mapped_file_attrat() to fix (2) and (3) respectively.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d369f20763)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_chmod() callback is vulnerable to symlink attacks because it
calls:
(1) chmod() which follows symbolic links for all path elements
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
We would need fchmodat() to implement AT_SYMLINK_NOFOLLOW to fix (1). This
isn't the case on linux unfortunately: the kernel doesn't even have a flags
argument to the syscall :-\ It is impossible to fix it in userspace in
a race-free manner. This patch hence converts local_chmod() to rely on
open_nofollow() and fchmod(). This fixes the vulnerability but introduces
a limitation: the target file must readable and/or writable for the call
to openat() to succeed.
It introduces a local_set_xattrat() replacement to local_set_xattr()
based on fsetxattrat() to fix (2), and a local_set_mapped_file_attrat()
replacement to local_set_mapped_file_attr() based on local_fopenat()
and mkdirat() to fix (3). No effort is made to factor out code because
both local_set_xattr() and local_set_mapped_file_attr() will be dropped
when all users have been converted to use the "at" versions.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit e3187a45dd)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_link() callback is vulnerable to symlink attacks because it calls:
(1) link() which follows symbolic links for all path elements but the
rightmost one
(2) local_create_mapped_attr_dir()->mkdir() which follows symbolic links
for all path elements but the rightmost one
This patch converts local_link() to rely on opendir_nofollow() and linkat()
to fix (1), mkdirat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ad0b46e6ac)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using the mapped-file security model, we also have to create a link
for the metadata file if it exists. In case of failure, we should rollback.
That's what this patch does.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 6dd4b1f1d0)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_rename() callback is vulnerable to symlink attacks because it
uses rename() which follows symbolic links in all path elements but the
rightmost one.
This patch simply transforms local_rename() into a wrapper around
local_renameat() which is symlink-attack safe.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit d2767edec5)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_renameat() callback is currently a wrapper around local_rename()
which is vulnerable to symlink attacks.
This patch rewrites local_renameat() to have its own implementation, based
on local_opendir_nofollow() and renameat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 99f2cf4b2d)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_lstat() callback is vulnerable to symlink attacks because it
calls:
(1) lstat() which follows symbolic links in all path elements but the
rightmost one
(2) getxattr() which follows symbolic links in all path elements
(3) local_mapped_file_attr()->local_fopen()->openat(O_NOFOLLOW) which
follows symbolic links in all path elements but the rightmost
one
This patch converts local_lstat() to rely on opendir_nofollow() and
fstatat(AT_SYMLINK_NOFOLLOW) to fix (1), fgetxattrat_nofollow() to
fix (2).
A new local_fopenat() helper is introduced as a replacement to
local_fopen() to fix (3). No effort is made to factor out code
because local_fopen() will be dropped when all users have been
converted to call local_fopenat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit f9aef99b3e)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_readlink() callback is vulnerable to symlink attacks because it
calls:
(1) open(O_NOFOLLOW) which follows symbolic links for all path elements but
the rightmost one
(2) readlink() which follows symbolic links for all path elements but the
rightmost one
This patch converts local_readlink() to rely on open_nofollow() to fix (1)
and opendir_nofollow(), readlinkat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit bec1e9546e)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_truncate() callback is vulnerable to symlink attacks because
it calls truncate() which follows symbolic links in all path elements.
This patch converts local_truncate() to rely on open_nofollow() and
ftruncate() instead.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ac125d993b)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_statfs() callback is vulnerable to symlink attacks because it
calls statfs() which follows symbolic links in all path elements.
This patch converts local_statfs() to rely on open_nofollow() and fstatfs()
instead.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 31e51d1c15)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_utimensat() callback is vulnerable to symlink attacks because it
calls qemu_utimens()->utimensat(AT_SYMLINK_NOFOLLOW) which follows symbolic
links in all path elements but the rightmost one or qemu_utimens()->utimes()
which follows symbolic links for all path elements.
This patch converts local_utimensat() to rely on opendir_nofollow() and
utimensat(AT_SYMLINK_NOFOLLOW) directly instead of using qemu_utimens().
It is hence assumed that the OS supports utimensat(), i.e. has glibc 2.6
or higher and linux 2.6.22 or higher, which seems reasonable nowadays.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit a33eda0dd9)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_remove() callback is vulnerable to symlink attacks because it
calls:
(1) lstat() which follows symbolic links in all path elements but the
rightmost one
(2) remove() which follows symbolic links in all path elements but the
rightmost one
This patch converts local_remove() to rely on opendir_nofollow(),
fstatat(AT_SYMLINK_NOFOLLOW) to fix (1) and unlinkat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit a0e640a872)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_unlinkat() callback is vulnerable to symlink attacks because it
calls remove() which follows symbolic links in all path elements but the
rightmost one.
This patch converts local_unlinkat() to rely on opendir_nofollow() and
unlinkat() instead.
Most of the code is moved to a separate local_unlinkat_common() helper
which will be reused in a subsequent patch to fix the same issue in
local_remove().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit df4938a665)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_lremovexattr() callback is vulnerable to symlink attacks because
it calls lremovexattr() which follows symbolic links in all path elements
but the rightmost one.
This patch introduces a helper to emulate the non-existing fremovexattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to lremovexattr().
local_lremovexattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 72f0d0bf51)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_lsetxattr() callback is vulnerable to symlink attacks because
it calls lsetxattr() which follows symbolic links in all path elements but
the rightmost one.
This patch introduces a helper to emulate the non-existing fsetxattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to lsetxattr().
local_lsetxattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 3e36aba757)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_llistxattr() callback is vulnerable to symlink attacks because
it calls llistxattr() which follows symbolic links in all path elements but
the rightmost one.
This patch introduces a helper to emulate the non-existing flistxattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to llistxattr().
local_llistxattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 5507904e36)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_lgetxattr() callback is vulnerable to symlink attacks because
it calls lgetxattr() which follows symbolic links in all path elements but
the rightmost one.
This patch introduces a helper to emulate the non-existing fgetxattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to lgetxattr().
local_lgetxattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 56ad3e54da)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The local_open() and local_opendir() callbacks are vulnerable to symlink
attacks because they call:
(1) open(O_NOFOLLOW) which follows symbolic links in all path elements but
the rightmost one
(2) opendir() which follows symbolic links in all path elements
This patch converts both callbacks to use new helpers based on
openat_nofollow() to only open files and directories if they are
below the virtfs shared folder
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 996a0d76d7)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
This patch opens the shared folder and caches the file descriptor, so that
it can be used to do symlink-safe path walk.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 0e35a37829)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using the passthrough security mode, symbolic links created by the
guest are actual symbolic links on the host file system.
Since the resolution of symbolic links during path walk is supposed to
occur on the client side. The server should hence never receive any path
pointing to an actual symbolic link. This isn't guaranteed by the protocol
though, and malicious code in the guest can trick the server to issue
various syscalls on paths whose one or more elements are symbolic links.
In the case of the "local" backend using the "passthrough" or "none"
security modes, the guest can directly create symbolic links to arbitrary
locations on the host (as per spec). The "mapped-xattr" and "mapped-file"
security modes are also affected to a lesser extent as they require some
help from an external entity to create actual symbolic links on the host,
i.e. another guest using "passthrough" mode for example.
The current code hence relies on O_NOFOLLOW and "l*()" variants of system
calls. Unfortunately, this only applies to the rightmost path component.
A guest could maliciously replace any component in a trusted path with a
symbolic link. This could allow any guest to escape a virtfs shared folder.
This patch introduces a variant of the openat() syscall that successively
opens each path element with O_NOFOLLOW. When passing a file descriptor
pointing to a trusted directory, one is guaranteed to be returned a
file descriptor pointing to a path which is beneath the trusted directory.
This will be used by subsequent patches to implement symlink-safe path walk
for any access to the backend.
Symbolic links aren't the only threats actually: a malicious guest could
change a path element to point to other types of file with undesirable
effects:
- a named pipe or any other thing that would cause openat() to block
- a terminal device which would become QEMU's controlling terminal
These issues can be addressed with O_NONBLOCK and O_NOCTTY.
Two helpers are introduced: one to open intermediate path elements and one
to open the rightmost path element.
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(renamed openat_nofollow() to relative_openat_nofollow(),
assert path is relative and doesn't contain '//',
fixed side-effect in assert, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 6482a96163)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If these functions fail, they should not change *fs. Let's use local
variables to fix this.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 21328e1e57)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If we are to switch back to readdir(), we need a more complex type than
DIR * to be able to serialize concurrent accesses to the directory stream.
This patch introduces a placeholder type and fixes all users.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
(cherry picked from commit f314ea4e30)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If this function fails, it should not modify *ctx.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 00c90bd1c2)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
These functions are always called indirectly. It really doesn't make sense
for them to sit in a header file.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 56fc494bdc)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The server can handle MAX_REQ - 1 PDUs at a time and the virtio-9p
device has a MAX_REQ sized virtqueue. If the client manages to fill
up the virtqueue, pdu_alloc() will fail and the request won't be
processed without any notice to the client (it actually causes the
linux 9p client to hang).
This has been there since the beginning (commit 9f10751365 "virtio-9p:
Add a virtio 9p device to qemu"), but it needs an agressive workload to
run in the guest to show up.
We actually allocate MAX_REQ PDUs and I see no reason not to link them
all into the free list, so let's fix the init loop.
Reported-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0d78289c3d)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
pdus are initialized and used in 9pfs common code. Move the array from
V9fsVirtioState to V9fsState.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 583f21f8b9)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If the user passes -device virtio-9p without the corresponding -fsdev, QEMU
dereferences a NULL pointer and crashes.
This is a 2.8 regression introduced by commit 702dbcc274.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
(cherry picked from commit f2b58c4375)
[BR: Fix and/or infrastructure for BSC#1020427 CVE-2016-9602]
Signed-off-by: Bruce Rogers <brogers@suse.com>
KVM has a feature bitmap of CPUID bits that it knows works for guests.
QEMU removes bits that are not part of that bitmap automatically on VM
start.
However, some times we just don't list features in that list because
they don't make sense for normal scenarios, but may be useful in specific,
targeted workloads.
For that purpose, add a new =force option to all CPUID feature flags in
the CPU property. With that we can override the accel filtering and give
users full control over the CPUID feature bits exposed into guests.
Signed-off-by: Alexander Graf <agraf@suse.de>
(backported from upstream submission)
When the guest sends VIRTIO_GPU_CMD_RESOURCE_UNREF without detaching the
backing storage beforehand (VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING)
we'll leak memory.
This patch fixes it for 3d mode, simliar to the 2d mode fix in commit
"b8e2392 virtio-gpu: call cleanup mapping function in resource destroy".
Reported-by: 李强 <liqiang6-s@360.cn>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1485167210-4757-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 5e8e3c4c75)
[BR: CVE-2017-5857 BSC#1023073]
Signed-off-by: Bruce Rogers <brogers@suse.com>
CCID device emulator uses Application Protocol Data Units(APDU)
to exchange command and responses to and from the host.
The length in these units couldn't be greater than 65536. Add
check to ensure the same. It'd also avoid potential integer
overflow in emulated_apdu_from_guest.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20170202192228.10847-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit c7dfbf3225)
[BR: CVE-2017-5898 BSC#1023907]
Signed-off-by: Bruce Rogers <brogers@suse.com>
CIRRUS_BLTMODE_MEMSYSSRC blits do NOT check blit destination
and blit width, at all. Oops. Fix it.
Security impact: high.
The missing blit destination check allows to write to host memory.
Basically same as CVE-2014-8106 for the other blit variants.
The missing blit width check allows to overflow cirrus_bltbuf,
with the attractive target cirrus_srcptr (current cirrus_bltbuf write
position) being located right after cirrus_bltbuf in CirrusVGAState.
Due to cirrus emulation writing cirrus_bltbuf bytewise the attacker
hasn't full control over cirrus_srcptr though, only one byte can be
changed. Once the first byte has been modified further writes land
elsewhere.
[ This is CVE-2017-2620 / XSA-209 - Ian Jackson ]
Fixed compilation by removing extra parameter to blit_is_unsafe. -iwj
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
[BR: BSC#1024972]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The blit_region_is_unsafe checks don't work correctly for the
patterncopy source. It's a fixed-sized region, which doesn't
depend on cirrus_blt_{width,height}. So go do the check in
cirrus_bitblt_common_patterncopy instead, then tell blit_is_unsafe that
it doesn't need to verify the source. Also handle the case where we
blit from cirrus_bitbuf correctly.
This patch replaces 5858dd1801.
Security impact: I think for the most part error on the safe side this
time, refusing blits which should have been allowed.
Only exception is placing the blit source at the end of the video ram,
so cirrus_blt_srcaddr + 256 goes beyond the end of video memory. But
even in that case I'm not fully sure this actually allows read access to
host memory. To trick the commit 5858dd18 security checks one has to
pick very small cirrus_blt_{width,height} values, which in turn implies
only a fraction of the blit source will actually be used.
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1486645341-5010-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 95280c31cd)
Signed-off-by: Bruce Rogers <brogers@suse.com>
If the guest sets the sglist size to a value >=2GB, megasas_handle_dcmd
will return MFI_STAT_MEMORY_NOT_AVAILABLE without freeing the memory.
Avoid this by returning only the status from map_dcmd, and loading
cmd->iov_size in the caller.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 765a707000)
[BR: CVE-2017-5856 BSC#1023053]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Apply the cirrus_addr_mask to cirrus_blt_dstaddr and cirrus_blt_srcaddr
right after assigning them, in cirrus_bitblt_start(), instead of having
this all over the place in the cirrus code, and missing a few places.
Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1485338996-17095-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 60cd23e851)
Signed-off-by: Bruce Rogers <brogers@suse.com>
cirrus_invalidate_region() calls memory_region_set_dirty()
on a per-line basis, always ranging from off_begin to
off_begin+bytesperline. With a negative pitch off_begin
marks the top most used address and thus we need to do an
initial shift backwards by a line for negative pitches of
backward blits, otherwise the first iteration covers the
line going from the start offset forwards instead of
backwards.
Additionally since the start address is inclusive, if we
shift by a full `bytesperline` we move to the first address
*not* included in the blit, so we only shift by one less
than bytesperline.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Message-id: 1485352137-29367-1-git-send-email-w.bumiller@proxmox.com
[ kraxel: codestyle fixes ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit f153b563f8)
Signed-off-by: Bruce Rogers <brogers@suse.com>
Right now we reset all devices before we reset the cmma states. This
can result in the host kernel discarding guest pages that were
previously in the unused state but already contain a bios or a -kernel
file before the cmma reset has finished. This race results in random
guest crashes or hangs during very early reboot.
Fixes: 1cd4e0f6f0 ("s390x/cmma: clean up cmma reset")
Cc: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
(cherry picked from commit 1a0e4c8b02)
Signed-off-by: Bruce Rogers <brogers@suse.com>
Correct recalculation of vq->inuse after migration for the corner case
where the avail_idx has already wrapped but used_idx not yet.
Also change the type of the VirtQueue.inuse to unsigned int. This is
done to be consistent with other members representing sizes (VRing.num),
and because C99 guarantees max ring size < UINT_MAX but does not
guarantee max ring size < INT_MAX.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Fixes: bccdef6b ("virtio: recalculate vq->inuse after migration")
CC: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit e66bcc4081)
[BR: BSC#1020928]
Signed-off-by: Bruce Rogers <brogers@suse.com>
While doing multi block SDMA transfer in routine
'sdhci_sdma_transfer_multi_blocks', the 's->fifo_buffer' starting
index 'begin' and data length 's->data_count' could end up to be same.
This could lead to an OOB access issue. Correct transfer data length
to avoid it.
Reported-by: Jiang Xin <jiangxin1@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2017-5667 BSC#1022541]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When the Intel 6300ESB watchdog is hot unplug. The timer allocated
in realize isn't freed thus leaking memory leak. This patch avoid
this through adding the exit function.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-Id: <583cde9c.3223ed0a.7f0c2.886e@mx.google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit eb7a20a361)
[BR: CVE-2016-10155 BSC#1021129]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Virtio GPU device while processing 'VIRTIO_GPU_CMD_GET_CAPSET'
command, retrieves the maximum capabilities size to fill in the
response object. It continues to fill in capabilities even if
retrieved 'max_size' is zero(0), thus resulting in OOB access.
Add check to avoid it.
Reported-by: Zhenhao Hong <zhenhaohong@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 20161214070156.23368-1-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit abd7f08b23)
[BR: CVE-2016-10028 BSC#1017084 BSC#1016503]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Commit 4299b90 added a check which is too broad, given that the source
pitch value is not required to be initialized for solid fill operations.
This patch refines the blit_is_unsafe() check to ignore source pitch in
that case. After applying the above commit as a security patch, we
noticed the SLES 11 SP4 guest gui failed to initialize properly.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Message-id: 20170109203520.5619-1-brogers@suse.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 913a87885f)
[BR: BSC#1016779]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In Cirrus CLGD 54xx VGA Emulator, if cirrus graphics mode is VGA,
'cirrus_get_bpp' returns zero(0), which could lead to a divide
by zero error in while copying pixel data. The same could occur
via blit pitch values. Add check to avoid it.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-id: 1476776717-24807-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 4299b90e9b)
[BR: CVE-2016-9921 CVE-2016-9922 BSC#1014702 BSC#1015169]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In virgl_cmd_get_capset function, it uses g_malloc to allocate
a response struct to the guest. As the 'resp'struct hasn't been full
initialized it will lead the 'resp->padding' field to the guest.
Use g_malloc0 to avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
[BR: CVE-2016-9908 BSC#1014514]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In ehci_init_transfer function, if the 'cpage' is bigger than 4,
it doesn't free the 'p->sgl' once allocated previously thus leading
a memory leak issue. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 5821c0f4.091c6b0a.e0c92.e811@mx.google.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 791f97758e)
[BR: CVE-2016-9911 BSC#1014111]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Test scsi-{disk,hd,cd} wwn properties for correct 64-bit parsing.
For now piggyback on virtio-scsi.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
All integers would get parsed by strtoll(), not handling the case of
UINT64 properties with the most significient bit set.
Implement a .type_uint64 visitor callback, reusing the existing
parse_str() code through a new argument, using strtoull().
As this is a bug fix, it intentionally ignores checkpatch warnings to
prefer the use of qemu_strto[u]ll() over strto[u]ll().
Cc: qemu-stable@nongnu.org
Signed-off-by: Andreas Färber <afaerber@suse.de>
If the guest destroy the resource before detach banking, the 'iov'
and 'addrs' field in resource is not freed thus leading memory
leak issue. This patch avoid this.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
[BR: CVE-2016-9912 BSC#1014112]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In the init operation of proxy backend dirver, it allocates a
V9fsProxy struct and some other resources. We should free these
resources when the 9pfs device is unrealized. This is what this
patch does.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 898ae90a44)
[BR: CVE-2016-9913 BSC#1014110]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In the init operation of handle backend dirver, it allocates a
handle_data struct and opens a mount file. We should free these
resources when the 9pfs device is unrealized. This is what this
patch does.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 971f406b77)
[BR: CVE-2016-9913 BSC#1014110]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Currently, the backend of VirtFS doesn't have a cleanup
function. This will lead resource leak issues if the backed
driver allocates resources. This patch addresses this issue.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 702dbcc274)
[BR: CVE-2016-9913 BSC#1014110]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Unrealize should undo things that were set during realize in
reverse order. So should do in the error path in realize.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 4774718e5c)
[BR: CVE-2016-9913 BSC#1014110]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In virgl_cmd_get_capset_info dispatch function, the 'resp' hasn't
been full initialized before writing to the guest. This will leak
the 'resp.padding' and 'resp.hdr.padding' fieds to the guest. This
patch fix this issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 5818661e.0860240a.77264.7a56@mx.google.com
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 42a8dadc74)
[BR: CVE-2016-9845 BSC#1013767]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In update_cursor_data_virgl function, if the 'width'/ 'height'
is not equal to current cursor's width/height it will return
without free the 'data' allocated previously. This will lead
a memory leak issue. This patch fix this issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Message-id: 58187760.41d71c0a.cca75.4cb9@mx.google.com
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 2d1cd6c7a9)
[BR: CVE-2016-9846 BSC#1013764]
Signed-off-by: Bruce Rogers <brogers@suse.com>
ColdFire Fast Ethernet Controller uses a receive buffer size
register(EMRBR) to hold maximum size of all receive buffers.
It is set by a user before any operation. If it was set to be
zero, ColdFire emulator would go into an infinite loop while
receiving data in mcf_fec_receive. Add check to avoid it.
Reported-by: Wjjzhang <wjjzhang@tencent.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 77d54985b8)
[BR: CVE-2016-9776 BSC#1013285]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the
situation that this field has been allocated previously. Every time, it
will be allocated directly. This leads to a host memory leak issue if
the client sends another Txattrcreate message with the same fid number
before the fid from the previous time got clunked.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, updated the changelog to indicate how the leak can occur]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ff55e94d23)
[BR: CVE-2016-9102 BSC#1007450 BSC#1014256]
Signed-off-by: Bruce Rogers <brogers@suse.com>
9pfs uses g_malloc() to allocate the xattr memory space, if the guest
reads this memory before writing to it, this will leak host heap memory
to the guest. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit eb68760285)
[BR: CVE-2016-9103 BSC#1007454]
Signed-off-by: Bruce Rogers <brogers@suse.com>
"mask" needs to be inverted before use.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6cb99acc28)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Device ivshmem property use64=0 is designed to make the device
expose a 32 bit shared memory BAR instead of 64 bit one. The
default is a 64 bit BAR, except pc-1.2 and older retain a 32 bit
BAR. A 32 bit BAR can support only up to 1 GiB of shared memory.
This worked as designed until commit 5400c02 accidentally flipped
its sense: since then, we misinterpret use64=0 as use64=1 and vice
versa. Worse, the default got flipped as well. Devices
ivshmem-plain and ivshmem-doorbell are not affected.
Fix by restoring the test of IVShmemState member not_legacy_32bit
that got messed up in commit 5400c02. Also update its
initialization for devices ivhsmem-plain and ivshmem-doorbell.
Without that, they'd regress to 32 bit BARs.
Cc: qemu-stable@nongnu.org
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit be4e0d7375)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
With virtio 1, the vring layout is split in 3 separate regions of
contiguous memory for the descriptor table, the available ring and the
used ring, as opposed with legacy virtio which uses a single region.
In case of memory re-mapping, the code ensures it doesn't affect the
vring mapping. This is done in vhost_verify_ring_mappings() which assumes
the device is legacy.
This patch changes vhost_verify_ring_mappings() to check the mappings of
each part of the vring separately.
This works for legacy mappings as well.
Cc: qemu-stable@nongnu.org
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f1f9e6c596)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
virtio 1.0 spec says this is a legacy feature bit,
hide it from guests in modern mode.
Note: for cross-version migration compatibility,
we keep the bit set in host_features.
The result will be that a guest migrating cross-version
will see host features change under it.
As guests only seem to read it once, this should
not be an issue. Meanwhile, will work to fix guests to
ignore this bit in virtio1 mode, too.
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 2a083ffd2e)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Legacy features are those that transitional devices only
expose on the legacy interface.
Allow different ones per device class.
Cc: qemu-stable@nongnu.org # dependency for the next patch
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 9b706dbbbb)
[BR: BSC#1013341 - Added variable vdc to virtio_device_class_init]
Signed-off-by: Bruce Rogers <brogers@suse.com>
libcurl will only give us as much data as there is, not more. The block
layer will deny requests beyond the end of file for us; but since this
block driver is still using a sector-based interface, we can still get
in trouble if the file size is not a multiple of 512.
While we have already made sure not to attempt transfers beyond the end
of the file, we are currently still trying to receive data from there if
the original request exceeds the file size. This patch fixes this issue
and invokes qemu_iovec_memset() on the iovec's tail.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20161025025431.24714-5-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit 4e504535c1)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
For some connection types (like FTP, generally), more than one socket
may be used (in FTP's case: control vs. data stream). As of commit
838ef60249 ("curl: Eliminate unnecessary
use of curl_multi_socket_all"), we have to remember all of the sockets
used by libcurl, but in fact we only did that for a single one. Since
one libcurl connection may use multiple sockets, however, we have to
remember them all.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20161025025431.24714-4-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
[BR: BSC#1013341 - also fixed adjacent line during conflict resolution]
(cherry picked from commit ff5ca1664a)
Signed-off-by: Bruce Rogers <brogers@suse.com>
While commit 38bbc0a580 is correct in that
the callback is supposed to return the number of bytes handled; what it
does not mention is that libcurl will throw an error if the callback did
not "handle" all of the data passed to it.
Therefore, if the callback receives some data that it cannot handle
(either because the receive buffer has not been set up yet or because it
would not fit into the receive buffer) and we have to ignore it, we
still have to report that the data has been handled.
Obviously, this should not happen normally. But it does happen at least
for FTP connections where some data (that we do not expect) may be
generated when the connection is established.
Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 20161025025431.24714-3-mreitz@redhat.com
Signed-off-by: Jeff Cody <jcody@redhat.com>
(cherry picked from commit 4e7676571b)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Ensure that there are no changes between the last check to
bdrv_get_dirty_count and the switch to the target.
There is already a bdrv_drained_end call, we only need to ensure
that bdrv_drained_begin is not called twice.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
(cherry picked from commit 9a0cec664e)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If the qio_channel_tls_new_(server|client) methods fail,
we disconnect the client. Unfortunately a missing return
means we then go on to try and run the TLS handshake on
a NULL I/O channel. This gives predictably segfaulty
results.
The main way to trigger this is to request a bogus TLS
priority string for the TLS credentials. e.g.
-object tls-creds-x509,id=tls0,priority=wibble,...
Most other ways appear impossible to trigger except
perhaps if OOM conditions cause gnutls initialization
to fail.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 660a2d83e0)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
We can teach Xen to drain and flush each device as it needs to, instead
of trying to flush ALL devices. This removes the last user of
blk_flush_all.
The function is therefore removed under the premise that any new uses
of blk_flush_all would be the wrong paradigm: either flush the single
device that requires flushing, or use an appropriate flush_all mechanism
from outside of the BlkBackend layer.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 49137bf684)
[BR: BSC#1013341 - remove also from place we have patched]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Reimplement bdrv_flush_all for vm_stop. In contrast to blk_flush_all,
bdrv_flush_all does not have device model restrictions. This allows
us to flush and halt unconditionally without error.
This allows us to do things like migrate when we have a device with
an open tray, but has a node that may need to be flushed, or nodes
that aren't currently attached to any device and need to be flushed.
Specifically, this allows us to migrate when we have a CDROM with
an open tray.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 22af08eacf)
[BR: BSC#1013341]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Commit fe1a9cbc moved the flush_all routine from the bdrv layer to the
block-backend layer. In doing so, however, the semantics of the routine
changed slightly such that flush_all now used blk_flush instead of
bdrv_flush.
blk_flush can fail if the attached device model reports that it is not
"available," (i.e. the tray is open.) This changed the semantics of
flush_all such that it can now fail for e.g. open CDROM drives.
Reintroduce bdrv_flush_all to regain the old semantics without having to
alter the behavior of blk_flush or blk_flush_all, which are already
'doing the right thing.'
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4085f5c7a2)
[BR: BSC#1013341 - slight tweak needed in backport]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When gtk_widget_style_get() is used to get the "inner-border" style
property, it returns a copy of the GtkBorder which must be freed by
the caller.
This patch also fixes a warning about the unused 'padding' structure
with VTE 0.36.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 1463127654-5171-1-git-send-email-berto@igalia.com
Cc: Cole Robinson <crobinso@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
[ kraxel: adapted to changes in ui patch queue ]
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 6978dc4adc)
[BR: BSC#1008519]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The following sequence of operations fails:
virsh start vm
virsh snapshot-create vm
virshh save vm --file file
with the following error
error: Failed to save domain vm to file
error: internal error: unable to execute QEMU command 'migrate':
There's a migration process in progress
The problem is that qemu_savevm_state() calls migrate_init() which sets
migration state to MIGRATION_STATUS_SETUP and never cleaned it up.
This patch do the job.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
Message-Id: <1466003203-26263-1-git-send-email-den@openvz.org>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
(cherry picked from commit 6dcf66681a)
Signed-off-by: Alexander Graf <agraf@suse.de>
Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:
static void ttwu_queue(struct task_struct *p, int cpu)
{
struct rq *rq = cpu_rq(cpu);
......
if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
......
ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
return;
}
......
ttwu_do_activate(rq, p, 0); /* access target's rq directly */
......
}
In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual socket.
For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
No-L3 With-L3(applied this patch)
cpu0: 363890 44582
cpu1: 373405 43109
cpu2: 340783 43797
cpu3: 333854 43409
cpu4: 327170 40038
cpu5: 325491 39922
cpu6: 319129 42391
cpu7: 306480 41035
cpu8: 161139 32188
cpu9: 164649 31024
cpu10: 149823 30398
cpu11: 149823 32455
cpu12: 164830 35143
cpu13: 172269 35805
cpu14: 179979 33898
cpu15: 194505 32754
avg: 268963.6 40129.8
The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.
For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[agraf: cherry picked from 14c985cffa]
[agraf: disabled by default]
Signed-off-by: Alexander Graf <agraf@suse.de>
Conflicts:
include/hw/i386/pc.h
target-i386/cpu.c
target-i386/cpu.h
Conflicts:
target-i386/cpu-qom.h
target-i386/cpu.c
I looked at a dozen Intel CPU that have this CPUID and all of them
always had Core offset as 1 (a wasted bit when hyperthreading is
disabled) and Package offset at least 4 (wasted bits at <= 4 cores).
QEMU uses more compact IDs and it doesn't make much sense to change it
now. I keep the SMT and Core sub-leaves even if there is just one
thread/core; it makes the code simpler and there should be no harm.
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 5232d00a04)
[backported and disabled by default]
Signed-off-by: Alexander Graf <agraf@suse.de>
Conflicts:
include/hw/i386/pc.h
target-i386/cpu.h
i.MX Fast Ethernet Controller uses buffer descriptors to manage
data flow to/fro receive & transmit queues. While transmitting
packets, it could continue to read buffer descriptors if a buffer
descriptor has length of zero and has crafted values in bd.flags.
Set an upper limit to number of buffer descriptors.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2016-7907 BSC#1002549]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The JAZZ RC4030 chipset emulator has a periodic timer and
associated interval reload register. The reload value is used
as divider when computing timer's next tick value. If reload
value is large, it could lead to divide by zero error. Limit
the interval reload value to avoid it.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2016-8667 BSC#1004702]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The v9fs_xattr_read() and v9fs_xattr_write() are passed a guest
originated offset: they must ensure this offset does not go beyond
the size of the extended attribute that was set in v9fs_xattrcreate().
Unfortunately, the current code implement these checks with unsafe
calculations on 32 and 64 bit values, which may allow a malicious
guest to cause OOB access anyway.
Fix this by comparing the offset and the xattr size, which are
both uint64_t, before trying to compute the effective number of bytes
to read or write.
Suggested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-By: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 7e55d65c56)
[BR: CVE-2016-9104 BSC#1007493]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Intel HDA emulator uses stream of buffers during DMA data
transfers. Each entry has buffer length and buffer pointer
position, which are used to derive bytes to 'copy'. If this
length and buffer pointer were to be same, 'copy' could be
set to zero(0), leading to an infinite loop. Add check to
avoid it.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1476949224-6865-1-git-send-email-ppandit@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0c0fc2b5fd)
[BR: CVE-2016-8909 BSC#1006536]
Signed-off-by: Bruce Rogers <brogers@suse.com>
RTL8139 ethernet controller in C+ mode supports multiple
descriptor rings, each with maximum of 64 descriptors. While
processing transmit descriptor ring in 'rtl8139_cplus_transmit',
it does not limit the descriptor count and runs forever. Add
check to avoid it.
Reported-by: Andrew Henderson <hendersa@icculus.org>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit c7c3591669)
[BR: CVE-2016-8910 BSC#1006538]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Rocker network switch emulator has test registers to help debug
DMA operations. While testing host DMA access, a buffer address
is written to register 'TEST_DMA_ADDR' and its size is written to
register 'TEST_DMA_SIZE'. When performing TEST_DMA_CTRL_INVERT
test, if DMA buffer size was greater than 'INT_MAX', it leads to
an invalid buffer access. Limit the DMA buffer size to avoid it.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 8caed3d564)
[BR: CVE-2016-8668 BSC#1004706]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The exit dispatch of eepro100 network card device doesn't free
the 's->vmstate' field which was allocated in device realize thus
leading a host memory leak. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 2634ab7fe2)
[BR: CVE-2016-9101 BSC#1007391]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The AMD PC-Net II emulator has set of control and status(CSR)
registers. Of these, CSR76 and CSR78 hold receive and transmit
descriptor ring length respectively. This ring length could range
from 1 to 65535. Setting ring length to zero leads to an infinite
loop in pcnet_rdra_addr() or pcnet_transmit(). Add check to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 34e29ce754)
[BR: CVE-2016-7909 BSC#1002557]
Signed-off-by: Bruce Rogers <brogers@suse.com>
16550A UART device uses an oscillator to generate frequencies
(baud base), which decide communication speed. This speed could
be changed by dividing it by a divider. If the divider is
greater than the baud base, speed is set to zero, leading to a
divide by zero error. Add check to avoid it.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1476251888-20238-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3592fe0c91)
[BR: CVE-2016-8669 BSC#1004707]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If an error occurs when marshalling the transfer length to the guest, the
v9fs_write() function doesn't free an IO vector, thus leading to a memory
leak. This patch fixes the issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit fdfcc9aeea)
[BR: CVE-2016-9106 BSC#1007495]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In 9pfs read dispatch function, it doesn't free two QEMUIOVector
object thus causing potential memory leak. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit e95c9a493a)
[BR: CVE-2016-8577 BSC#1003893]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The v9fs_link() function keeps a reference on the source fid object. This
causes a memory leak since the reference never goes down to 0. This patch
fixes the issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 4c1586787f)
[BR: CVE-2016-9105 BSC#1007494]
Signed-off-by: Bruce Rogers <brogers@suse.com>
If a guest sends an empty string paramater to any 9P operation, the current
code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }.
This is unfortunate because it can cause NULL pointer dereference to happen
at various locations in the 9pfs code. And we don't want to check str->data
everywhere we pass it to strcmp() or any other function which expects a
dereferenceable pointer.
This patch enforces the allocation of genuine C empty strings instead, so
callers don't have to bother.
Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if
the returned string is empty. It now uses v9fs_string_size() since
name.data cannot be NULL anymore.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
[groug, rewritten title and changelog,
fix empty string check in v9fs_xattrwalk()]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ba42ebb863)
[BR: CVE-2016-8578 BSC#1003894]
Signed-off-by: Bruce Rogers <brogers@suse.com>
While processing isochronous transfer descriptors(iTD), if the page
select(PG) field value is out of bands it will return. In this
situation the ehci's sg list is not freed thus leading to a memory
leak issue. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit b16c129daf)
[BR: CVE-2016-7995 BSC#1003612]
Signed-off-by: Bruce Rogers <brogers@suse.com>
ColdFire Fast Ethernet Controller uses buffer descriptors to manage
data flow to/fro receive & transmit queues. While transmitting
packets, it could continue to read buffer descriptors if a buffer
descriptor has length of zero and has crafted values in bd.flags.
Set upper limit to number of buffer descriptors.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 070c4b92b8)
[BR: CVE-2016-7908 BSC#1002550]
Signed-off-by: Bruce Rogers <brogers@suse.com>
virtio back end uses set of buffers to facilitate I/O operations.
If its size is too large, 'cpu_physical_memory_map' could return
a null address. This would result in a null dereference while
un-mapping descriptors. Add check to avoid it.
Reported-by: Qinghao Tang <luodalongde@gmail.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
(cherry picked from commit 973e7170dd)
[BR: CVE-2016-7422 BSC#1000346]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The .receive callback of xlnx.xps-ethernetlite doesn't check the length
of data before calling memcpy. As a result, the NetClientState object in
heap will be overflowed. All versions of qemu with xlnx.xps-ethernetlite
will be affected.
Reported-by: chaojianhu <chaojianhu@hotmail.com>
Signed-off-by: chaojianhu <chaojianhu@hotmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit a0d1cbdacf)
[BR: CVE-2016-7161 BSC#1001151]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When running with KVM enabled, you can choose between emulating the
gic in kernel or user space. If the kernel supports in-kernel virtualization
of the interrupt controller, it will default to that. If not, if will
default to user space emulation.
Unfortunately when running in user mode gic emulation, we miss out on
timer events which are only available from kernel space. This patch leverages
the new kernel/user space notification mechanism for those timer events.
Signed-off-by: Alexander Graf <agraf@suse.de>
In PVSCSI paravirtual SCSI bus, pvscsi_convert_sglist can take a very
long time or go into an infinite loop due to two different bugs:
1) the request descriptor data length is defined to be 64 bit. While
building SG list from a request descriptor, it gets truncated to 32bit
in routine 'pvscsi_convert_sglist'. This could lead to an infinite loop
situation for large 'dataLen' values, when data_length is cast to uint32_t
and chunk_size becomes always zero. Fix this by removing the incorrect
cast.
2) pvscsi_get_next_sg_elem can be called arbitrarily many times if the
element has a zero length. Get out of the loop early when this happens,
by introducing an upper limit on the number of SG list elements.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2016-7156 BSC#997859]
Signed-off-by: Bruce Rogers <brogers@suse.com>
In Vmxnet3 device emulator while processing transmit(tx) queue,
when it reaches end of packet, it calls vmxnet3_complete_packet.
In that local 'txcq_descr' object is not initialised, which could
leak host memory bytes a guest.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
[BR: CVE-2016-6836 BSC#994760]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Vmxnet3 device emulator does not check if the device is active,
before using it for write. It leads to a use after free issue,
if the vmxnet3_io_bar0_write routine is called after the device is
deactivated. Add check to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Acked-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 6c352ca9b4)
[BR: CVE-2016-6888 BSC#994771]
Signed-off-by: Bruce Rogers <brogers@suse.com>
virtio back end uses set of buffers to facilitate I/O operations.
An infinite loop unfolds in virtqueue_pop() if a buffer was
of zero size. Add check to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 1e7aed7014)
[BR: CVE-2016-6490 BSC#991466]
Signed-off-by: Bruce Rogers <brogers@suse.com>
References: bnc#953518 - disks added via SCSI controller are visible twice on HVM XEN guest systems
Using 'vdev=sd[a-o]' will create an emulated LSI controller, which can
be used by the emulated BIOS to boot from disk. If the HVM domU has also
PV driver the disk may appear twice in the guest. To avoid this an
unplug of the emulated hardware is needed, similar to what is done for
IDE and NIC drivers already.
Since the SCSI controller provides only disks the entire controller can
be unplugged at once.
Impact of the change for classic and pvops based guest kernels:
vdev=sda:disk0
before: pvops: disk0=pv xvda + emulated sda
classic: disk0=pv sda + emulated sdq
after: pvops: disk0=pv xvda
classic: disk0=pv sda
vdev=hda:disk0, vdev=sda:disk1
before: pvops: disk0=pv xvda
disk1=emulated sda
classic: disk0=pv hda
disk1=pv sda + emulated sdq
after: pvops: disk0=pv xvda
disk1=not accessible by blkfront, index hda==index sda
classic: disk0=pv hda
disk1=pv sda
vdev=hda:disk0, vdev=sda:disk1, vdev=sdb:disk2
before: pvops: disk0=pv xvda
disk1=emulated sda
disk2=pv xvdb + emulated sdb
classic: disk0=pv hda
disk1=pv sda + emulated sdq
disk2=pv sdb + emulated sdr
after: pvops: disk0=pv xvda
disk1=not accessible by blkfront, index hda==index sda
disk2=pv xvdb
classic: disk0=pv hda
disk1=pv sda
disk2=pv sda
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Instead of calling xen_be_register() for each supported backend type
for hvm and pv guests in their machine init functions use a common
function in order not to have to add new backends twice.
This at once fixes the error that hvm domains couldn't use the qusb
backend.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Message-id: 1470119552-16170-1-git-send-email-jgross@suse.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 0e39bb022b)
[BR: BSC#991785]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Change from using glib alloc and free routines to those
from libc. Also perform safety measure of dropping privs
to user if configured no-caps.
[BR: BOO#988279]
Signed-off-by: Bruce Rogers <brogers@suse.com>
The problem with excessive flushing was found by a couple of performance
tests:
- parallel directory tree creation (from 2 processes)
- 32 cached writes + fsync at the end in a loop
For the first one results improved from 2.6 loops/sec to 3.5 loops/sec.
Each loop creates 10^3 directories with 10 files in each.
For the second one results improved from ~600 fsync/sec to ~1100
fsync/sec. Though, it was run on SSD so it probably won't show such
performance gain on rotational media.
qcow2_cache_flush() calls bdrv_flush() unconditionally after writing
cache entries of a particular cache. This can lead to as many as
2 additional fdatasyncs inside bdrv_flush.
We can simply skip all fdatasync calls inside qcow2_co_flush_to_os
as bdrv_flush for sure will do the job. These flushes are necessary to
keep the right order of writes to the different caches. Though this is
not necessary in the current code base as this ordering is ensured through
the flush in qcow2_cache_flush_dependency().
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit f3c3b87dae)
[BR: BSC#991296]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When unplugging a device in the Xen pvusb backend drain the submit
queue before deallocation of the control structures. Otherwise there
will be bogus memory accesses when I/O contracts are finished.
Correlated to this issue is the handling of cancel requests: a packet
cancelled will still lead to the call of complete, so add a flag
to the request indicating it should be just dropped on complete.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
When a Xenstore watch fires indicating a backend has to be removed
don't remove all backends for that domain with the specified device
index, but just the one which has the correct type.
The easiest way to achieve this is to use the already determined
xendev as parameter for xen_be_del_xendev() instead of only the domid
and device index.
This at once removes the open coded QTAILQ_FOREACH_SAVE() in
xen_be_del_xendev() as there is no need to search for the correct
xendev any longer.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Since QEMU performs cacheable accesses to guest memory when doing DMA
as part of the implementation of emulated PCI devices, guest drivers
should use cacheable accesses as well when running under KVM. Since this
essentially means that emulated PCI devices are DMA coherent, set the
'dma-coherent' DT property on the PCIe host controller DT node.
This brings the DT description into line with the ACPI description,
which already marks the PCI bridge as cache coherent (see commit
bc64b96c98).
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Message-id: 1467134090-5099-1-git-send-email-ard.biesheuvel@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 5d636e21c4)
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 926cde5 ("scsi: esp: make cmdbuf big enough for maximum CDB size",
2016-06-16) changed the size of a migrated field. Split it in two
parts, and only migrate the second part in a new vmstate version.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cc96677469)
[BR: CVE-2016-6351 BSC#990835]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Implement SUSE specific unplug protocol for emulated PCI devices
in PVonHVM guests
(bsc#953339, bsc#953362, bsc#953518, bsc#984981)
Signed-off-by: Olaf Hering <ohering@suse.de>
Commit 9432e53a5b added xen_sysdev as a
system device to serve as an anchor for removable virtual buses. This
introduced a build failure for non-x86 builds with CONFIG_XEN_BACKEND
set, as xen_sysdev was defined in a x86 specific file while being
consumed in an architecture independent source.
Move the xen_sysdev definition and initialization to xen_backend.c to
avoid the build failure.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Add a Xenstore directory for each supported pv backend. This will allow
Xen tools to decide which backend type to use in case there are
multiple possibilities.
The information is added under
/local/domain/<backend-domid>/device-model/<domid>/backends
before the "running" state is written to Xenstore. Using a directory
for each backend enables us to add parameters for specific backends
in the future.
This interface is documented in the Xen source repository in the file
docs/misc/qemu-backends.txt
In order to reuse the Xenstore directory creation already present in
hw/xen/xen_devconfig.c move the related functions to
hw/xen/xen_backend.c where they fit better.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Message-id: 1463062421-613-3-git-send-email-jgross@suse.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 637c53ffcb)
[BR: FATE#316612]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Since commit a1666142: acpi-build: make ROMs RAM blocks resizeable,
xen HVM direct kernel boot failed. Xen HVM direct kernel boot will
insert a linuxboot.bin or multiboot.bin to /genroms, before this
commit, in acpi_setup, for rom linuxboot.bin/multiboot.bin, it
only needs 0x20000 size; after the commit, it will reserve x16
size for resize, that is 0x200000 size. It causes xen_ram_alloc
failed due to running out of memory.
To resolve it, either:
1. keep using original rom size instead of max size, don't reserve x16 size.
2. guest maxmem needs to be increased. (commit c1d322e6 "xen-hvm: increase
maxmem before calling xc_domain_populate_physmap" solved the problem for
a time, by accident. But then it is reverted in commit ffffbb369 due to
other problem.)
For 2, more discussion is needed about howto. So this patch tries 1, to
use unresizable rom size in xen case in rom_set_mr.
[CYL: BSC#970791]
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Large discard requests lead to sign expansion errors in qemu.
Since there is no API to tell a guest about the limitations qmeu
has to split a large request itself.
[bsc#964427]
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Add code to read the suse specific suse-diskcache-disable-flush flag out
of xenstore, and set the equivalent flag within QEMU.
Patch taken from Xen's patch queue, Olaf Hering being the original author.
See https://bugzilla.novell.com/show_bug.cgi?id=879425
Signed-off-by: Bruce Rogers <brogers@suse.com>
Using xen tools 'xl vncviewer' with tigervnc (default on SLE-12),
found that: the display of the guest is unexpected while keep
pressing a key. We expect the same character multiple times, but
it prints only one time. This happens on a PV guest in text mode.
After debugging, found that tigervnc sends repeated key down events
in this case, to differentiate from user pressing the same key many
times. Vnc server only prints the character when it finally receives
key up event.
To solve this issue, this patch tries to add additional key up event
before the next repeated key down event (if the key is not a control
key).
[CYL: BSC#882405]
Signed-off-by: Chunyan Liu <cyliu@suse.com>
The dictzip code in SLE11 received some treatment over time to support
running on big endian hosts. Somewhere in the transition to SLE12 this
support got lost. Add it back in again from the SLE11 code base.
Furthermore while at it, fix up the debug prints to not emit warnings.
[AG: BSC#937572]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Andreas Färber <afaerber@suse.de>
After encountering make check errors for some time, it appears that the
problem is the occasional socket timeout, which is probably due to a
combination of a slow and overloaded build host in the build service.
Increase the timeout from 5 to 15 seconds and hope we don't see the
problem again.
Signed-off-by: Bruce Rogers <brogers@suse.com>
qemu-kvm 0.15 uses the same GPE format as qemu 1.4, but as version 2
rather than 3.
Addresses part of BNC#812836.
Signed-off-by: Andreas Färber <afaerber@suse.de>
qemu-kvm 0.15 had a VMSTATE_UINT32(flags, PITState) field that
qemu 1.4 does not have.
Addresses part of BNC#812836.
Signed-off-by: Andreas Färber <afaerber@suse.de>
qemu-kvm.git commit a7fe0297840908a4fd65a1cf742481ccd45960eb
(Extend vram size to 16MB) deviated from qemu.git since kvm-61, and only
in commit 9e56edcf8d (vga: raise default
vgamem size) did qemu.git adjust the VRAM size for v1.2.
Add compatibility properties so that up to and including pc-0.15 we
maintain migration compatibility with qemu-kvm rather than QEMU and
from pc-1.0 on with QEMU (last qemu-kvm release was 1.2).
Addresses part of BNC#812836.
Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: adjust comma position in list in macro for v2.5.0 compat]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Our current VNC code does not handle color maps (aka non-true-color) at all
and aborts if a client requests them. There are 2 major issues with this:
1) A VNC viewer on an 8-bit X11 system may request color maps
2) RealVNC _always_ starts requesting color maps, then moves on to full color
In order to support these 2 use cases, let's just create a fake color map
that covers exactly our normal true color 8 bit color space. That way we don't
lose anything over a client that wants true color.
Reported-by: Sascha Wehnert <swehnert@suse.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Allow for guests with higher amounts of ram. The current thought
is that 2TB specified on qemu commandline would be an appropriate
limit. Note that this requires the next higher bit value since
the highest address is actually more than 2TB due to the pci
memory hole.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
For SLES we want users to be able to use large memory configurations
with KVM without fiddling with ulimit -Sv.
Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: add include for sys/resource.h]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.
When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.
Signed-off-by: Alexander Graf <agraf@suse.de>
Virtio-Console can only process one character at a time. Using it on S390
gave me strage "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.
While the stdio driver calls a poll function that just processes on its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.
To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.
This patch fixes input when using -nographic on s390 for me.
Some termcaps (found using SLES11SP1) use [? sequences. According to man
console_codes (http://linux.die.net/man/4/console_codes) the question mark
is a nop and should simply be ignored.
This patch does exactly that, rendering screen output readable when
outputting guest serial consoles to the graphical console emulator.
Signed-off-by: Alexander Graf <agraf@suse.de>
Tar is a very widely used format to store data in. Sometimes people even put
virtual machine images in there.
So it makes sense for qemu to be able to read from tar files. I implemented a
written from scratch reader that also knows about the GNU sparse format, which
is what pigz creates.
This version checks for filenames that end on well-known extensions. The logic
could be changed to search for filenames given on the command line, but that
would require changes to more parts of qemu.
The tar reader in conjunctiuon with dzip gives us the chance to download
tar'ed up virtual machine images (even via http) and instantly make use of
them.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: bdrv_file_open got an Error **errp argument, bdrv_delete -> brd_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
BlockDriverCompletionFunc -> BlockCompletionFunc,
qemu_aio_release() -> qemu_aio_unref(),
drop tar_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
[AF: Drop bdrv_open() drv parameter for 2.5]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
DictZip is an extension to the gzip format that allows random seeks in gzip
compressed files by cutting the file into pieces and storing the piece offsets
in the "extra" header of the gzip format.
Thanks to that extension, we can use gzip compressed files as block backend,
though only in read mode.
This makes a lot of sense when stacked with tar files that can then be shipped
to VM users. If a VM image is inside a tar file that is inside a DictZip
enabled gzip file, the user can run the tar.gz file as is without having to
extract the image first.
Tar patch follows.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Bruce Rogers <brogers@novell.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
[TH: Use bdrv_open options instead of filename]
Signed-off-by: Tim Hardeck <thardeck@suse.de>
[AF: Error **errp added for bdrv_file_open, bdrv_delete -> bdrv_unref]
[AF: qemu_opts_create_nofail() -> qemu_opts_create(),
bdrv_file_open() -> bdrv_open(), based on work by brogers]
[AF: error_is_set() dropped for v2.1.0-rc0]
[AF: BlockDriverAIOCB -> BlockAIOCB,
BlockDriverCompletionFunc -> BlockCompletionFunc,
qemu_aio_release() -> qemu_aio_unref(),
drop dictzip_aio_cancel()]
[AF: common-obj-y -> block-obj-y, drop probe hook (bsc#945778)]
[AF: Drop bdrv_open() drv parameter for 2.5]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise a host 64bit variable parameter with a size parameter will extend
it to a negative number, breaking lseek for example.
Pass syscall arguments as ulong always.
Signed-off-by: Alexander Graf <agraf@suse.de>
Fedora 17 for ARM reads /proc/cpuinfo and fails if it doesn't contain
ARM related contents. This patch implements a quick hack to expose real
/proc/cpuinfo data taken from a real world machine.
The real fix would be to generate at least the flags automatically based
on the selected CPU. Please do not submit this patch upstream until this
has happened.
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v1.6 and v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Running multi-threaded code can easily expose some of the fundamental
breakages in QEMU's design. It's just not a well supported scenario.
So if we pin the whole process to a single host CPU, we guarantee that
we will never have concurrent memory access actually happen. We can still
get scheduled away at any time, so it's no complete guarantee, but apparently
it reduces the odds well enough to get my test cases to pass.
This gets Java 1.7 working for me again on my test box.
Signed-off-by: Alexander Graf <agraf@suse.de>
The tcg code generator is not thread safe. Lock its generation between
different threads.
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto exec.c/translate-all.c split for 1.4]
[AF: Rebased for v2.1.0-rc0]
[AF: Rebased onto tcg_gen_code_common() drop for v2.5.0-rc0]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
During invocations of losetup, we run into an ioctl that doesn't
exist. However, because of that we output an error, which then
screws up the kiwi logic around that call.
So let's silently ignore that bogus ioctl.
Signed-off-by: Alexander Graf <agraf@suse.de>
When we have a working host binary equivalent for the guest binary we're
trying to run, let's just use that instead as it will be a lot faster.
Signed-off-by: Alexander Graf <agraf@suse.de>
When entering the guest we take a lock to ensure that nobody else messes
with our TB chaining while we're doing it. If we get a segfault inside that
code, we manage to work on, but will not unlock the lock.
This patch forces unlocking of that lock in the segv handler. I'm not sure
this is the right approach though. Maybe we should rather make sure we don't
segfault in the code? I would greatly appreciate someone more intelligible
than me to look at this :).
Example code to trigger this is at: http://csgraf.de/tmp/conftest.c
Reported-by: Fabio Erculiani <lxnay@sabayon.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Drop spinlock_safe_unlock() and switch to tb_lock_reset() (bonzini)]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers that on 970 can not be implemented properly.
So disable the check for mmu notifiers on PowerPC guests, making
KVM guests work there, even if possibly racy in some odd circumstances.
When using qemu's linux-user binaries through binfmt, argv[0] gets lost
along the execution because qemu only gets passed in the full file name
to the executable while argv[0] can be something completely different.
This breaks in some subtile situations, such as the grep and make test
suites.
This patch adds a wrapper binary called qemu-$TARGET-binfmt that can be
used with binfmt's P flag which passes the full path _and_ argv[0] to
the binfmt handler.
The binary would be smart enough to be versatile and only exist in the
system once, creating the qemu binary path names from its own argv[0].
However, this seemed like it didn't fit the make system too well, so
we're currently creating a new binary for each target archictecture.
CC: Reinhard Max <max@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased onto new Makefile infrastructure, twice]
[AF: Updated for aarch64 for v2.0.0-rc1]
[AF: Rebased onto Makefile changes for v2.1.0-rc0]
Signed-off-by: Andreas Färber <afaerber@suse.de>
the direction given in the ioctl should be correct so we can assume the
communication is uni-directional. The alsa developers did not like this
concept though and declared ioctls IOC_R and IOC_W even though they were
IOC_RW.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
Hack to prevent ALSA from using mmap() interface to simplify emulation.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Ulrich Hecht <uli@suse.de>
Git-commit: 89fbea8737
References: bsc#1182137
Depending on the client activity, the server can be asked to open a huge
number of file descriptors and eventually hit RLIMIT_NOFILE. This is
currently mitigated using a reclaim logic : the server closes the file
descriptors of idle fids, based on the assumption that it will be able
to re-open them later. This assumption doesn't hold of course if the
client requests the file to be unlinked. In this case, we loop on the
entire fid list and mark all related fids as unreclaimable (the reclaim
logic will just ignore them) and, of course, we open or re-open their
file descriptors if needed since we're about to unlink the file.
This is the purpose of v9fs_mark_fids_unreclaim(). Since the actual
opening of a file can cause the coroutine to yield, another client
request could possibly add a new fid that we may want to mark as
non-reclaimable as well. The loop is thus restarted if the re-open
request was actually transmitted to the backend. This is achieved
by keeping a reference on the first fid (head) before traversing
the list.
This is wrong in several ways:
- a potential clunk request from the client could tear the first
fid down and cause the reference to be stale. This leads to a
use-after-free error that can be detected with ASAN, using a
custom 9p client
- fids are added at the head of the list : restarting from the
previous head will always miss fids added by a some other
potential request
All these problems could be avoided if fids were being added at the
end of the list. This can be achieved with a QSIMPLEQ, but this is
probably too much change for a bug fix. For now let's keep it
simple and just restart the loop from the current head.
Fixes: CVE-2021-20181
Buglink: https://bugs.launchpad.net/qemu/+bug/1911666
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-Id: <161064025265.1838153.15185571283519390907.stgit@bahia.lan>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Bruce Rogers <brogers@suse.com>
The cssid 255 is reserved but still valid from an architectural
point of view. However, feeding a bogus schid of 0xffffffff into
the virtio hypercall will lead to a crash:
Stack trace of thread 138363:
#0 0x00000000100d168c css_find_subch (qemu-system-s390x)
#1 0x00000000100d3290 virtio_ccw_hcall_notify
#2 0x00000000100cbf60 s390_virtio_hypercall
#3 0x000000001010ff7a handle_hypercall
#4 0x0000000010079ed4 kvm_cpu_exec (qemu-system-s390x)
#5 0x00000000100609b4 qemu_kvm_cpu_thread_fn
#6 0x000003ff8b887bb4 start_thread (libpthread.so.0)
#7 0x000003ff8b78df0a thread_start (libc.so.6)
This is because the css array was only allocated for 0..254
instead of 0..255.
Let's fix this by bumping MAX_CSSID to 255 and fencing off the
reserved cssid of 255 during css image allocation.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 882b3b9769)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In tight_encode_indexed_rect32, buf(or src)’s size is count. In for loop,
the logic is supposed to be that i is an index into src, i should be
incremented when incrementing src.
This is broken when src is incremented but i is not before while loop,
resulting in off-by-one bug in while loop.
Signed-off-by: He Rongguang <herongguang.he@huawei.com>
Message-id: 5784B8EB.7010008@huawei.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3f7e51bca3)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When VNC is started with '-vnc none' there will be no
listener socket present. When we try to populate the
VncServerInfo we'll crash accessing a NULL 'lsock'
field.
#0 qio_channel_socket_get_local_address (ioc=0x0, errp=errp@entry=0x7ffd5b8aa0f0) at io/channel-socket.c:33
#1 0x00007f4b9a297d6f in vnc_init_basic_info_from_server_addr (errp=0x7ffd5b8aa0f0, info=0x7f4b9d425460, ioc=<optimized out>) at ui/vnc.c:146
#2 vnc_server_info_get (vd=0x7f4b9e858000) at ui/vnc.c:223
#3 0x00007f4b9a29d318 in vnc_qmp_event (vs=0x7f4b9ef82000, vs=0x7f4b9ef82000, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:279
#4 vnc_connect (vd=vd@entry=0x7f4b9e858000, sioc=sioc@entry=0x7f4b9e8b3a20, skipauth=skipauth@entry=true, websocket=websocket @entry=false) at ui/vnc.c:2994
#5 0x00007f4b9a29e8c8 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>) at ui/v nc.c:3825
#6 0x00007f4b9a18d8a1 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7ffd5b8aa230) at qmp-marsh al.c:123
#7 0x00007f4b9a0b53f5 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /usr/src/debug/qemu-2.6.0/mon itor.c:3922
#8 0x00007f4b9a348580 in json_message_process_token (lexer=0x7f4b9c78dfe8, input=0x7f4b9c7350e0, type=JSON_RCURLY, x=111, y=5 9) at qobject/json-streamer.c:94
#9 0x00007f4b9a35cfeb in json_lexer_feed_char (lexer=lexer@entry=0x7f4b9c78dfe8, ch=125 '}', flush=flush@entry=false) at qobj ect/json-lexer.c:310
#10 0x00007f4b9a35d0ae in json_lexer_feed (lexer=0x7f4b9c78dfe8, buffer=<optimized out>, size=<optimized out>) at qobject/json -lexer.c:360
#11 0x00007f4b9a348679 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at q object/json-streamer.c:114
#12 0x00007f4b9a0b3a1b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /usr/src/deb ug/qemu-2.6.0/monitor.c:3938
#13 0x00007f4b9a186751 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x7f4b9c7add40) at qemu-char.c:2895
#14 0x00007f4b92b5c79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#15 0x00007f4b9a2bb0c0 in glib_pollfds_poll () at main-loop.c:213
#16 os_host_main_loop_wait (timeout=<optimized out>) at main-loop.c:258
#17 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
#18 0x00007f4b9a0835cf in main_loop () at vl.c:1934
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4667
Do an upfront check for a NULL lsock and report an error to
the caller, which matches behaviour from before
commit 04d2529da2
Author: Daniel P. Berrange <berrange@redhat.com>
Date: Fri Feb 27 16:20:57 2015 +0000
ui: convert VNC server to use QIOChannelSocket
where getsockname() would be given a FD value -1 and thus report
an error to the caller.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-2-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 624cdd46d7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The connection sharing / limits are only set in the
vnc_display_open() method and so missed when VNC is running
with '-vnc none'. This in turn prevents clients being added
to the VNC server with the QMP "add_client" command.
This was introduced in
commit e5f34cdd2d
Author: Gerd Hoffmann <kraxel@redhat.com>
Date: Thu Oct 2 12:09:34 2014 +0200
vnc: track & limit connections
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-4-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 12e29b1682)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The vnc_server_info_get will allocate the VncServerInfo
struct and then call vnc_init_basic_info_from_server_addr
to populate the basic fields. If this returns an error
though, the qapi_free_VncServerInfo call will then crash
because the VncServerInfo struct instance was not properly
NULL-initialized and thus contains random stack garbage.
#0 0x00007f1987c8e6f5 in raise () at /lib64/libc.so.6
#1 0x00007f1987c902fa in abort () at /lib64/libc.so.6
#2 0x00007f1987ccf600 in __libc_message () at /lib64/libc.so.6
#3 0x00007f1987cd7d4a in _int_free () at /lib64/libc.so.6
#4 0x00007f1987cdb2ac in free () at /lib64/libc.so.6
#5 0x00007f198b654f6e in g_free () at /lib64/libglib-2.0.so.0
#6 0x0000559193cdcf54 in visit_type_str (v=v@entry=
0x5591972f14b0, name=name@entry=0x559193de1e29 "host", obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899d80)
at qapi/qapi-visit-core.c:255
#7 0x0000559193cca8f3 in visit_type_VncBasicInfo_members (v=v@entry=
0x5591972f14b0, obj=obj@entry=0x5591961dbfa0, errp=errp@entry=0x7fffd7899dc0) at qapi-visit.c:12307
#8 0x0000559193ccb523 in visit_type_VncServerInfo_members (v=v@entry=
0x5591972f14b0, obj=0x5591961dbfa0, errp=errp@entry=0x7fffd7899e00) at qapi-visit.c:12632
#9 0x0000559193ccb60b in visit_type_VncServerInfo (v=v@entry=
0x5591972f14b0, name=name@entry=0x0, obj=obj@entry=0x7fffd7899e48, errp=errp@entry=0x0) at qapi-visit.c:12658
#10 0x0000559193cb53d8 in qapi_free_VncServerInfo (obj=<optimized out>) at qapi-types.c:3970
#11 0x0000559193c1e6ba in vnc_server_info_get (vd=0x7f1951498010) at ui/vnc.c:233
#12 0x0000559193c24275 in vnc_connect (vs=0x559197b2f200, vs=0x559197b2f200, event=QAPI_EVENT_VNC_CONNECTED) at ui/vnc.c:284
#13 0x0000559193c24275 in vnc_connect (vd=vd@entry=0x7f1951498010, sioc=sioc@entry=0x559196bf9c00, skipauth=skipauth@entry=tru e, websocket=websocket@entry=false) at ui/vnc.c:3039
#14 0x0000559193c25806 in vnc_display_add_client (id=<optimized out>, csock=<optimized out>, skipauth=<optimized out>)
at ui/vnc.c:3877
#15 0x0000559193a90c28 in qmp_marshal_add_client (args=<optimized out>, ret=<optimized out>, errp=0x7fffd7899f90)
at qmp-marshal.c:105
#16 0x000055919399c2b7 in handle_qmp_command (parser=<optimized out>, tokens=<optimized out>)
at /home/berrange/src/virt/qemu/monitor.c:3971
#17 0x0000559193ce3307 in json_message_process_token (lexer=0x559194ab0838, input=0x559194a6d940, type=JSON_RCURLY, x=111, y=1 2) at qobject/json-streamer.c:105
#18 0x0000559193cfa90d in json_lexer_feed_char (lexer=lexer@entry=0x559194ab0838, ch=125 '}', flush=flush@entry=false)
at qobject/json-lexer.c:319
#19 0x0000559193cfaa1e in json_lexer_feed (lexer=0x559194ab0838, buffer=<optimized out>, size=<optimized out>)
at qobject/json-lexer.c:369
#20 0x0000559193ce33c9 in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>)
at qobject/json-streamer.c:124
#21 0x000055919399a85b in monitor_qmp_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>)
at /home/berrange/src/virt/qemu/monitor.c:3987
#22 0x0000559193a87d00 in tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x559194a7d900)
at qemu-char.c:2895
#23 0x00007f198b64f703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#24 0x0000559193c484b3 in main_loop_wait () at main-loop.c:213
#25 0x0000559193c484b3 in main_loop_wait (timeout=<optimized out>) at main-loop.c:258
#26 0x0000559193c484b3 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:506
#27 0x0000559193964c55 in main () at vl.c:1908
#28 0x0000559193964c55 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4603
This was introduced in
commit 98481bfcd6
Author: Eric Blake <eblake@redhat.com>
Date: Mon Oct 26 16:34:45 2015 -0600
vnc: Hoist allocation of VncBasicInfo to callers
which added error reporting for vnc_init_basic_info_from_server_addr
but didn't change the g_malloc calls to g_malloc0.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470134726-15697-3-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 3e7f136d8b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The vnc_client_read() function is called from the vnc_client_io()
event handler callback when there is incoming data to process.
If it detects that the client has disconnected, then it will
trigger cleanup and free'ing of the VncState client struct at
a safe time.
Unfortunately, the vnc_client_io() event handler will also call
vnc_client_write() to handle any outgoing data writes. So if
vnc_client_io() was invoked with both G_IO_IN and G_IO_OUT
events set, and the client disconnects, we may try to write to
a client which has just been freed.
https://bugs.launchpad.net/qemu/+bug/1594861
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1467042529-3372-1-git-send-email-berrange@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit ea69744988)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Even if tray is not open, it can be empty (blk_is_inserted() == false).
Handle both cases correctly by replacing the s->tray_open checks with
blk_is_available(), which is an AND of the two.
Also simplify successive checks of them into blk_is_available(), in a
couple cases.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1473848224-24809-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cd723b8560)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Right after main_loop ends, we release various things but keep iothread
alive. The latter is not prepared to the sudden change of resources.
Specifically, after bdrv_close_all(), virtio-scsi dataplane get a
surprise at the empty BlockBackend:
(gdb) bt
at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:543
at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:577
It is because the d->conf.blk->root is set to NULL, then
blk_get_aio_context() returns qemu_aio_context, whereas s->ctx is still
pointing to the iothread:
hw/scsi/virtio-scsi.c:543:
if (s->dataplane_started) {
assert(blk_get_aio_context(d->conf.blk) == s->ctx);
}
To fix this, let's stop iothreads before doing bdrv_close_all().
Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1473326931-9699-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit dce8921b2b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The XTS cipher mode needs to be used with a cipher which has
a block size of 16 bytes. If a mis-matching block size is used,
the code will either corrupt memory beyond the IV array, or
not fully encrypt/decrypt the IV.
This fixes a memory corruption crash when attempting to use
cast5-128 with xts, since the former has an 8 byte block size.
A test case is added to ensure the cipher creation fails with
such an invalid combination.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit a5d2f44d0d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Openstack Cinder assigns volume a 36 characters uuid as serial.
QEMU will shrinks the uuid to 20 characters, which does not match
the original uuid.
Note that there is no limit to the length of the serial number in
the SCSI spec. 20 was copy-pasted from virtio-blk which in turn was
copy-pasted from ATA; 36 is even more arbitrary. However, bumping it
up too much might cause issues (e.g. 252 seems to make sense because
then the maximum amount of returned data is 256; but who knows there's
no off-by-one somewhere for such a nicely rounded number).
Signed-off-by: Rony Weng <ronyweng@synology.com>
Message-Id: <1472457138-23386-1-git-send-email-ronyweng@synology.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 48b6206305)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If the call to fid_to_qid() returns an error, we will call v9fs_path_free()
on uninitialized paths.
It is a regression introduced by the following commit:
56f101ecce 9pfs: handle walk of ".." in the root directory
Let's fix this by initializing dpath and path before calling fid_to_qid().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[groug: updated the changelog to indicate this is regression and to provide
the offending commit SHA1]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 13fd08e631)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The one pending element is being freed but not discarded on device
reset, which causes svq->inuse to creep up, eventually hitting the
"Virtqueue size exceeded" error.
Properly discarding the element on device reset makes sure that its
buffers are unmapped and the inuse counter stays balanced.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 104e70cae7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
vq->inuse must be zeroed upon device reset like most other virtqueue
fields.
In theory, virtio_reset() just needs assert(vq->inuse == 0) since
devices must clean up in-flight requests during reset (requests cannot
not be leaked!).
In practice, it is difficult to achieve vq->inuse == 0 across reset
because balloon, blk, 9p, etc implement various different strategies for
cleaning up requests. Most devices call g_free(elem) directly without
telling virtio.c that the VirtQueueElement is cleaned up. Therefore
vq->inuse is not decremented during reset.
This patch zeroes vq->inuse and trusts that devices are not leaking
VirtQueueElements across reset.
I will send a follow-up series that refactors request life-cycle across
all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but
this more invasive approach is not appropriate for stable trees.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ladi Prosek <lprosek@redhat.com>
(cherry picked from commit 4b7f91ed02)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The 9P spec at http://man.cat-v.org/plan_9/5/intro says:
All directories must support walks to the directory .. (dot-dot) meaning
parent directory, although by convention directories contain no explicit
entry for .. or . (dot). The parent of the root directory of a server's
tree is itself.
This means that a client cannot walk further than the root directory
exported by the server. In other words, if the client wants to walk
"/.." or "/foo/../..", the server should answer like the request was
to walk "/".
This patch just does that:
- we cache the QID of the root directory at attach time
- during the walk we compare the QID of each path component with the root
QID to detect if we're in a "/.." situation
- if so, we skip the current component and go to the next one
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 56f101ecce)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
According to the 9P spec http://man.cat-v.org/plan_9/5/open about the
create request:
The names . and .. are special; it is illegal to create files with these
names.
This patch causes the create and lcreate requests to fail with EINVAL if
the file name is either "." or "..".
Even if it isn't explicitly written in the spec, this patch extends the
checking to all requests that may cause a directory entry to be created:
- mknod
- rename
- renameat
- mkdir
- link
- symlink
The unlinkat request also gets patched for consistency (even if
rmdir("foo/..") is expected to fail according to POSIX.1-2001).
The various error values come from the linux manual pages.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit 805b5d98c6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Empty path components don't make sense for most commands and may cause
undefined behavior, depending on the backend.
Also, the walk request described in the 9P spec [1] clearly shows that
the client is supposed to send individual path components: the official
linux client never sends portions of path containing the / character for
example.
Moreover, the 9P spec [2] also states that a system can decide to restrict
the set of supported characters used in path components, with an explicit
mention "to remove slashes from name components".
This patch introduces a new name_is_illegal() helper that checks the
names sent by the client are not empty and don't contain unwanted chars.
Since 9pfs is only supported on linux hosts, only the / character is
checked at the moment. When support for other hosts (AKA. win32) is added,
other chars may need to be blacklisted as well.
If a client sends an illegal path component, the request will fail and
ENOENT is returned to the client.
[1] http://man.cat-v.org/plan_9/5/walk
[2] http://man.cat-v.org/plan_9/5/intro
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit fff39a7ad0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When vmxnet transport abstraction layer initialises pkt,
the maximum fragmentation count is not checked. This could lead
to an integer overflow causing a NULL pointer dereference.
Replace g_malloc() with g_new() to catch the multiplication
overflow.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Acked-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vmxnet3 device emulator when parsing packet headers does not check
for IP header length. It could lead to a OOB access when reading
further packet data. Add check to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The vq->inuse field is not migrated. Many devices don't hold
VirtQueueElements across migration so it doesn't matter that vq->inuse
starts at 0 on the destination QEMU.
At least virtio-serial, virtio-blk, and virtio-balloon migrate while
holding VirtQueueElements. For these devices we need to recalculate
vq->inuse upon load so the value is correct.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit bccdef6b1a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In previous commit
commit c7628bff41
Author: Gerd Hoffmann <kraxel@redhat.com>
Date: Fri Oct 30 12:10:09 2015 +0100
vnc: only alloc server surface with clients connected
the VNC server was changed so that the 'vd->server' pixman
image was only allocated when a client is connected.
Since then if a client disconnects and then reconnects to
the VNC server all they will see is a black screen until
they do something that triggers a refresh. On a graphical
desktop this is not often noticed since there's many things
going on which cause a refresh. On a plain text console it
is really obvious since nothing refreshes frequently.
The problem is that the VNC server didn't update the guest
dirty bitmap, so still believes its server image is in sync
with the guest contents.
To fix this we must explicitly mark the entire guest desktop
as dirty after re-creating the server surface. Move this
logic into vnc_update_server_surface() so it is guaranteed
to be call in all code paths that re-create the surface
instead of only in vnc_dpy_switch()
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Tested-by: Peter Lieven <pl@kamp.de>
Message-id: 1471365032-18096-1-git-send-email-berrange@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit b69a553b4a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Network transport abstraction layer supports packet fragmentation.
While fragmenting a packet, it checks for more fragments from
packet length and current fragment length. It is susceptible
to an infinite loop, if the current fragment length is zero.
Add check to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit ead315e43e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
(commit 80dcfb8532)
Upon migration, the code use a timer based on vm_clock for 1ns
in the future from post_load to do the event send in case host_connected
differs between migration source and target.
However, it's not guaranteed that the apic is ready to inject irqs into
the guest, and the irq line remained high, resulting in any future interrupts
going unnoticed by the guest as well.
That's because 1) the migration coroutine is not blocked when it get EAGAIN
while reading QEMUFile. 2) The vm_clock is enabled default currently, it doesn't
rely on the calling of vm_start(), that means vm_clock timers can run before
VCPUs are running.
So, let's set the vm_clock disabled default, keep the initial intention of
design for vm_clock timers.
Meanwhile, change the test-aio usecase, using QEMU_CLOCK_REALTIME instead of
QEMU_CLOCK_VIRTUAL as the block code does.
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-Id: <1470728955-90600-1-git-send-email-arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3fdd0ee393)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Commit 5a11d0f7 mistakenly converted a log message into an error
condition when no pin interrupt is found for the pci device being
passed through. Revert that part of the commit.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 0968c91ce0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If we don't provide the page size in target-ppc:cpu_get_dump_info(),
the default one (TARGET_PAGE_SIZE, 4KB) is used to create
the compressed dump. It works fine with Macintosh, but not with
pseries as the kernel default page size is 64KB.
Without this patch, if we generate a compressed dump in the QEMU monitor:
(qemu) dump-guest-memory -z qemu.dump
This dump cannot be read by crash:
# crash vmlinux qemu.dump
...
WARNING: cannot translate vmemmap kernel virtual addresses:
commands requiring page structure contents will fail
...
Page_size is used to determine the dumpfile's block size. The
block size needs to be at least the page size, but a multiple of page
size works fine too. For PPC64, linux supports either 4KB or 64KB software
page size. So we define the page_size to 64KB.
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 760d88d1d0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The 53C9X Fast SCSI Controller(FSC) comes with internal 16-byte
FIFO buffers. One is used to handle commands and other is for
information transfer. Three control variables 'ti_rptr',
'ti_wptr' and 'ti_size' are used to control r/w access to the
information transfer buffer ti_buf[TI_BUFSZ=16]. In that,
'ti_rptr' is used as read index, where read occurs.
'ti_wptr' is a write index, where write would occur.
'ti_size' indicates total bytes to be read from the buffer.
While reading/writing to this buffer, index could exceed its
size. Add check to avoid OOB r/w access.
Reported-by: Huawei PSIRT <psirt@huawei.com>
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1465230883-22303-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ff589551c8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
While reading information via 'megasas_ctrl_get_info' routine,
a local bios version buffer isn't null terminated. Add the
terminating null byte to avoid any OOB access.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 844864fbae)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
While doing DMA read into ESP command buffer 's->cmdbuf', it could
write past the 's->cmdbuf' area, if it was transferring more than 16
bytes. Increase the command buffer size to 32, which is maximum when
's->do_cmd' is set, and add a check on 'len' to avoid OOB access.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 926cde5f3e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Avoid duplicated code between esp_do_dma and handle_ti. esp_do_dma
has the same code that handle_ti contains after the call to esp_do_dma;
but the code in handle_ti is never reached because it is in an "else if".
Remove the else and also the pointless return.
esp_do_dma also has a partially dead assignment of the to_device
variable. Sink it to the point where it's actually used.
Finally, assert that the other caller of esp_do_dma (esp_transfer_data)
only transfers data and not a command. This is true because get_cmd
cancels the old request synchronously before its caller handle_satn_stop
sets do_cmd to 1.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7f0b6e114a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The FIFO contains two bytes; hence the write ptr should be two bytes ahead
of the read pointer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d020aa504c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The 53C9X Fast SCSI Controller(FSC) comes with an internal 16-byte
FIFO buffer. It is used to handle command and data transfer.
Routine get_cmd() in non-DMA mode, uses 'ti_size' to read scsi
command into a buffer. Add check to validate command length against
buffer size to avoid any overrun.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464717207-7549-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d3cdc49138)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When receiving packets over MIPSnet network device, it uses
receive buffer of size 1514 bytes. In case the controller
accepts large(MTU) packets, it could lead to memory corruption.
Add check to avoid it.
Reported by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit 3af9187fc6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
sdl 2.0.4 currently has a bug which causes our UI shortcuts to fire
rapidly in succession:
https://bugzilla.libsdl.org/show_bug.cgi?id=3287
It's a toss up whether ctrl+alt+f or ctrl+alt+2 will fire an
odd or even number of times, thus determining whether the action
succeeds or fails.
Opening monitor/serial windows is doubly broken, since it will often
lock the UI trying to grab the pointer:
0x00007fffef3720a5 in SDL_Delay_REAL () at /lib64/libSDL2-2.0.so.0
0x00007fffef3688ba in X11_SetWindowGrab () at /lib64/libSDL2-2.0.so.0
0x00007fffef2f2da7 in SDL_SendWindowEvent () at /lib64/libSDL2-2.0.so.0
0x00007fffef2f080b in SDL_SetKeyboardFocus () at /lib64/libSDL2-2.0.so.0
0x00007fffef35d784 in X11_DispatchFocusIn.isra.8 () at /lib64/libSDL2-2.0.so.0
0x00007fffef35dbce in X11_DispatchEvent () at /lib64/libSDL2-2.0.so.0
0x00007fffef35ee4a in X11_PumpEvents () at /lib64/libSDL2-2.0.so.0
0x00007fffef2eea6a in SDL_PumpEvents_REAL () at /lib64/libSDL2-2.0.so.0
0x00007fffef2eeab5 in SDL_WaitEventTimeout_REAL () at /lib64/libSDL2-2.0.so.0
0x000055555597eed0 in sdl2_poll_events (scon=0x55555876f928) at ui/sdl2.c:593
We can work around that hang by ungrabbing the pointer before launching
a new window. This roughly matches what our sdl1 code does
Signed-off-by: Cole Robinson <crobinso@redhat.com>
Message-id: 31c9ab6540b031f7a614c59edcecea9877685612.1462557436.git.crobinso@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 56f289f383)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
ahci-test /x86_64/ahci/io/dma/lba28/retry triggers the following leak:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x7fc4b2a25e20 in malloc (/lib64/libasan.so.3+0xc6e20)
#1 0x7fc4993bce58 in g_malloc (/lib64/libglib-2.0.so.0+0x4ee58)
#2 0x556a187d4b34 in ahci_populate_sglist hw/ide/ahci.c:896
#3 0x556a187d8237 in ahci_dma_prepare_buf hw/ide/ahci.c:1367
#4 0x556a187b5a1a in ide_dma_cb hw/ide/core.c:844
#5 0x556a187d7eec in ahci_start_dma hw/ide/ahci.c:1333
#6 0x556a187b650b in ide_start_dma hw/ide/core.c:921
#7 0x556a187b61e6 in ide_sector_start_dma hw/ide/core.c:911
#8 0x556a187b9e26 in cmd_write_dma hw/ide/core.c:1486
#9 0x556a187bd519 in ide_exec_cmd hw/ide/core.c:2027
#10 0x556a187d71c5 in handle_reg_h2d_fis hw/ide/ahci.c:1204
#11 0x556a187d7681 in handle_cmd hw/ide/ahci.c:1254
#12 0x556a187d168a in check_cmd hw/ide/ahci.c:510
#13 0x556a187d0afc in ahci_port_write hw/ide/ahci.c:314
#14 0x556a187d105d in ahci_mem_write hw/ide/ahci.c:435
#15 0x556a1831d959 in memory_region_write_accessor /home/elmarco/src/qemu/memory.c:525
#16 0x556a1831dc35 in access_with_adjusted_size /home/elmarco/src/qemu/memory.c:591
#17 0x556a18323ce3 in memory_region_dispatch_write /home/elmarco/src/qemu/memory.c:1262
#18 0x556a1828cf67 in address_space_write_continue /home/elmarco/src/qemu/exec.c:2578
#19 0x556a1828d20b in address_space_write /home/elmarco/src/qemu/exec.c:2635
#20 0x556a1828d92b in address_space_rw /home/elmarco/src/qemu/exec.c:2737
#21 0x556a1828daf7 in cpu_physical_memory_rw /home/elmarco/src/qemu/exec.c:2746
#22 0x556a183068d3 in cpu_physical_memory_write /home/elmarco/src/qemu/include/exec/cpu-common.h:72
#23 0x556a18308194 in qtest_process_command /home/elmarco/src/qemu/qtest.c:382
#24 0x556a18309999 in qtest_process_inbuf /home/elmarco/src/qemu/qtest.c:573
#25 0x556a18309a4a in qtest_read /home/elmarco/src/qemu/qtest.c:585
#26 0x556a18598b85 in qemu_chr_be_write_impl /home/elmarco/src/qemu/qemu-char.c:387
#27 0x556a18598c52 in qemu_chr_be_write /home/elmarco/src/qemu/qemu-char.c:399
#28 0x556a185a2afa in tcp_chr_read /home/elmarco/src/qemu/qemu-char.c:2902
#29 0x556a18cbaf52 in qio_channel_fd_source_dispatch io/channel-watch.c:84
Follow John Snow recommendation:
Everywhere else ncq_err is used, it is accompanied by a list cleanup
except for ncq_cb, which is the case you are fixing here.
Move the sglist destruction inside of ncq_err and then delete it from
the other two locations to keep it tidy.
Call dma_buf_commit in ide_dma_cb after the early return. Though, this
is also a little wonky because this routine does more than clear the
list, but it is at the moment the centralized "we're done with the
sglist" function and none of the other side effects that occur in
dma_buf_commit will interfere with the reset that occurs from
ide_restart_bh, I think
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 5839df7b71)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
res_count should be set to the number of outstanding bytes after a DBDMA
request. Unfortunately this wasn't being set to zero by the non-block
transfer codepath meaning drivers that checked the descriptor result for
such requests (e.g reading the CDROM TOC) would assume from a non-zero result
that the transfer had failed.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 16275edb34)
Conflicts:
hw/ide/macio.c
* removed context dependancy on ddd495e5
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If one attempts to perform a system_reset after a failed IO request
that causes the VM to enter a paused state, QEMU will segfault trying
to free up the pending IO requests.
These requests have already been completed and freed, though, so all
we need to do is NULL them before we enter the paused state.
Existing AHCI tests verify that halted requests are still resumed
successfully after a STOP event.
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1469635201-11918-2-git-send-email-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit 87ac25fd1f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
A broken or malicious guest can submit more requests than the virtqueue
size permits, causing unbounded memory allocation in QEMU.
The guest can submit requests without bothering to wait for completion
and is therefore not bound by virtqueue size. This requires reusing
vring descriptors in more than one request, which is not allowed by the
VIRTIO 1.0 specification.
In "3.2.1 Supplying Buffers to The Device", the VIRTIO 1.0 specification
says:
1. The driver places the buffer into free descriptor(s) in the
descriptor table, chaining as necessary
and
Note that the above code does not take precautions against the
available ring buffer wrapping around: this is not possible since the
ring buffer is the same size as the descriptor table, so step (1) will
prevent such a condition.
This implies that placing more buffers into the virtqueue than the
descriptor table size is not allowed.
QEMU is missing the check to prevent this case. Processing a request
allocates a VirtQueueElement leading to unbounded memory allocation
controlled by the guest.
Exit with an error if the guest provides more requests than the
virtqueue size permits. This bounds memory allocation and makes the
buggy guest visible to the user.
This patch fixes CVE-2016-5403 and was reported by Zhenhao Hong from 360
Marvel Team, China.
Reported-by: Zhenhao Hong <hongzhenhao@360.cn>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit afd9096eb1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
QEMU 2.6 added support for the XSAVE family of instructions, which
includes the XSETBV instruction which allows setting the XCR0
register.
But, when booting Linux kernels with XSAVE support enabled, I was
getting very early crashes where the instruction pointer was set
to 0x3. I tracked it down to a jump instruction generated by this:
gen_jmp_im(s->pc - pc_start);
where s->pc is pointing to the instruction after XSETBV and pc_start
is pointing _at_ XSETBV. Subtract the two and you get 0x3. Whoops.
The fix is to replace this typo with the pattern found everywhere
else in the file when folks want to end the translation buffer.
Richard Henderson confirmed that this is a bug and that this is the
correct fix.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: qemu-stable@nongnu.org
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit ba03584f4f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We changed link status register in pci express endpoint capability
over time. Specifically,
commit b2101eae63 ("pcie: Set the "link
active" in the link status register") set data link layer link active
bit in this register without adding compatibility to old machine types.
When migrating from qemu 2.3 and older this affects xhci devices which
under machine type 2.0 and older have a pci express endpoint capability
even if they are on a pci bus.
Add compatibility flags to make this bit value match what it was under
2.3.
Additionally, to avoid breaking migration from qemu 2.3 and up,
suppress checking link status during migration: this seems sane
since hardware can change link status at any time.
https://bugzilla.redhat.com/show_bug.cgi?id=1352860
Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes: b2101eae63
("pcie: Set the "link active" in the link status register")
Cc: qemu-stable@nongnu.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 6b4495401b)
Conflicts:
hw/pci/pcie.c
* removed functional dependency on 6383292
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :) nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.
Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags). Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7423f41782)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The *_to_cpup() functions are not very useful, as they simply do
a pointer dereference and then a *_to_cpu(). Instead use either:
* ld*_*_p(), if the data is at an address that might not be
correctly aligned for the load
* a local dereference and *_to_cpu(), if the pointer is
the correct type and known to be correctly aligned
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465570836-22211-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 773dce3c72)
* context prereq for 7423f417
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Clean up some debug message oddities missed earlier; this includes
some typos, and recognizing that %d is not necessarily compatible
with uint32_t. Also add a couple messages that I found useful
while debugging things.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-3-git-send-email-eblake@redhat.com>
[Do not use PRIx16, clang complains. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2cb347493c)
* context prereq for 7423f41
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
when setting clusters as alloacted the boundaries have
to be expanded. As Paolo pointed out the calculation of
the number of clusters is wrong:
Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6:
In the "mark allocated" case, you want to set 0..8, i.e.
cluster_num=0, nb_clusters=4.
0--.--2--.--4--.--6--.--8
<--|_________________|--> (<--> = expanded)
Instead you are setting nb_clusters=3, so that 6..8 is not marked.
0--.--2--.--4--.--6--.--8
<--|______________|!!! (! = wrong)
Cc: qemu-stable@nongnu.org
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit eb36b953e0)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Throttling groups are named using the 'group' parameter of the
block_set_io_throttle command and the throttling.group command-line
option. If that parameter is unspecified the groups get the name of
the block device.
This patch adds a new test to check the naming of throttling groups.
Signed-off-by: Alberto Garcia <berto@igalia.com>
* backport of 435d5ee
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When I/O limits are set for a block device, the name of the throttling
group is taken from the BlockBackend if the user doesn't specify one.
Commit efaa7c4eeb moved the naming of the BlockBackend in
blockdev_init() to the end of the function, after I/O limits are set.
The consequence is that the throttling group gets an empty name.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: qemu-stable@nongnu.org
* backport of ff356ee
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When migrating from a different QEMU version, the start_address and
bios_start_address may differ. During migration these values are migrated
and overwrite the values that were detected by QEMU itself.
On a reboot, QEMU will reload its own BIOS, but use the migrated start
addresses, which does not work if the values differ.
Fix this by not relying on the migrated values anymore, but still
provide them during migration, so existing QEMUs continue to work.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit bb0995468a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
virtio migrates the low 32 feature bits twice, the first copy is there
for compatibility but ever since
019a3edbb2: ("virtio: make features 64bit
wide") it's ignored on load. This is wrong since virtio_net_load tests
self announcement and guest offloads before the second copy including
high feature bits is loaded. This means that self announcement, control
vq and guest offloads are all broken after migration.
Fix it up by loading low feature bits: somewhat ugly since high and low
bits become out of sync temporarily, but seems unavoidable for
compatibility. The right thing to do for new features is probably to
test the host features, anyway.
Fixes: 019a3edbb2
("virtio: make features 64bit wide")
Cc: qemu-stable@nongnu.org
Reported-by: Robin Geuze <robing@transip.nl>
Tested-by: Robin Geuze <robing@transip.nl>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 62cee1a28a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The rationale is similar to the above mode sense response interception:
this is practically the only channel to communicate restraints from
elsewhere such as host and block driver.
The scsi bus we attach onto can have a larger max xfer len than what is
accepted by the host file system (guarding between the host scsi LUN and
QEMU), in which case the SG_IO we generate would get -EINVAL.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1464243305-10661-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 063143d5b1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work. It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 476b923c32)
Conflicts:
include/block/nbd.h
* removed context dependency on 943cec86
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
In function pci_assign_dev_load_option_rom, For those pci devices don't
have 'rom' file under sysfs or if loading ROM from external file, The
function returns NULL, and won't set the passed 'size' variable.
In these 2 cases, qemu still reports "Invalid ROM" error message, Users
may be confused by it.
Signed-off-by: Lin Ma <lma@suse.com>
Message-Id: <1466010327-22368-1-git-send-email-lma@suse.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit be968c721e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
If a QAPI struct has a mandatory alternate member which is not
present on input, the input visitor reports an error for the
missing alternate without setting the discriminator, but the
cleanup code for the struct still tries to use the dealloc
visitor to clean up the alternate.
Commit dbf11922 changed visit_start_alternate to set *obj to NULL
when an error occurs, where it was previously left untouched.
Thus, before the patch, the dealloc visitor is blindly trying to
cleanup whatever branch corresponds to (*obj)->type == 0 (that is,
QTYPE_NONE, because *obj still pointed to zeroed memory), which
selects the default branch of the switch and sets an error, but
this second error is ignored by the way the dealloc visitor is
used; but after the patch, the attempt to switch dereferences NULL.
When cleaning up after a partial object parse, we specifically
check for !*obj after visit_start_struct() (see gen_visit_object());
doing the same for alternates fixes the crash. Enhance the testsuite
to give coverage for both missing struct and missing alternate
members.
Also add an abort - we expect visit_start_alternate() to either set an
error or to set (*obj)->type to a valid QType that corresponds to
actual user input, and QTYPE_NONE should never be reachable from valid
input. Had the abort() been in place earlier, we might have noticed
the dealloc visitor dereferencing bogus zeroed memory prior to when
commit dbf11922 forced our hand by setting *obj to NULL and causing a
fault.
Test case:
{'execute':'blockdev-add', 'arguments':{'options':{'driver':'raw'}}}
The choice of 'driver':'raw' selects a BlockdevOptionsGenericFormat
struct, which has a mandatory 'file':'BlockdevRef' in QAPI. Since
'file' is missing as a sibling of 'driver', this should report a
graceful error rather than fault. After this patch, we are back to:
{"error": {"class": "GenericError", "desc": "Parameter 'file' is missing"}}
Generated code in qapi-visit.c changes as:
|@@ -2444,6 +2444,9 @@ void visit_type_BlockdevRef(Visitor *v,
| if (err) {
| goto out;
| }
|+ if (!*obj) {
|+ goto out_obj;
|+ }
| switch ((*obj)->type) {
| case QTYPE_QDICT:
| visit_start_struct(v, name, NULL, 0, &err);
|@@ -2459,10 +2462,13 @@ void visit_type_BlockdevRef(Visitor *v,
| case QTYPE_QSTRING:
| visit_type_str(v, name, &(*obj)->u.reference, &err);
| break;
|+ case QTYPE_NONE:
|+ abort();
| default:
| error_setg(&err, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
| "BlockdevRef");
| }
|+out_obj:
| visit_end_alternate(v);
Reported by Kashyap Chamarthy <kchamart@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1466012271-5204-1-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Tested-by: Kashyap Chamarthy <kchamart@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit 9b4e38fe6a)
Conflicts:
tests/test-qmp-input-visitor.c
* removed contexual/functional dependencies on 68ab47e
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
commit fefe2a78 accidently dropped the code path for injecting
raw packets. This feature is needed for sending gratuitous ARPs
after an incoming migration has completed. The result is increased
network downtime for vservers where the network card is not virtio-net
with the VIRTIO_NET_F_GUEST_ANNOUNCE feature.
Fixes: fefe2a78ab
Cc: qemu-stable@nongnu.org
Cc: hongyang.yang@easystack.cn
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit ca1ee3d6b5)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The QTask struct is just a standalone struct, not a QOM Object,
so calling object_ref() on it is not appropriate. This results
in mangling the 'destroy' field in the QTask struct, causing
the later call to qtask_free() to try to call the function
at address 0x1, with predictably segfault happy results.
There is in fact no need for ref counting with QTask, as the
call to qtask_abort() or qtask_complete() will automatically
free associated memory.
This fixes the crash shown in
https://bugs.launchpad.net/qemu/+bug/1589923
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit bc35d51077)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The fifo is normal ram. So kvm vcpu threads and qemu iothread can
access the fifo in parallel without syncronization. Which in turn
implies we can't use the fifo pointers in-place because the guest
can try changing them underneath us. So add shadows for them, to
make sure the guest can't modify them after we've applied sanity
checks.
Fixes: CVE-2016-4454
Cc: qemu-stable@nongnu.org
Cc: P J P <ppandit@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1464592161-18348-4-git-send-email-kraxel@redhat.com
(cherry picked from commit 7e486f7577)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Similar to the "!drv || !drv->bdrv_aio_ioctl" case above, here it is
okay to set co.ret and return. As pointed out by Paolo, a BH will be
created as necessary by the caller (bdrv_co_maybe_schedule_bh).
Besides, as pointed out by Kevin, "data" was leaked before.
Reported-by: Kevin Wolf <kwolf@redhat.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20160601015223.19277-1-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit c8a9fd8071)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
QEMU has currently two ways to prevent migration to occur:
- migration blocker when it depends on runtime state
- VMStateDescription.unmigratable when migration is not supported at all
This patch gathers all the logic into a single function to be called from
both the savevm and the migrate paths.
This fixes a bug with 9p, at least, where savevm would succeed and the
following would happen in the guest after loadvm:
$ ls /host
ls: cannot access /host: Protocol error
With this patch:
(qemu) savevm foo
Migration is disabled when VirtFS export path '/' is mounted in the guest
using mount_tag 'host'
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <146239057139.11271.9011797645454781543.stgit@bahia.huguette.org>
[Update subject according to Paolo's suggestion - Amit]
Signed-off-by: Amit Shah <amit.shah@redhat.com>
(cherry picked from commit 24f3902b08)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Similar to commit df7b97ff, we are mishandling clients that
give an unaligned NBD_CMD_TRIM request, and potentially
trimming bytes that occur before their request; which in turn
can cause potential unintended data loss (unlikely in
practice, since most clients are sane and issue aligned trim
requests). However, while we fixed read and write by switching
to the byte interfaces of blk_, we don't yet have a byte
interface for discard. On the other hand, trim is advisory, so
rounding the user's request to simply ignore the first and last
unaligned sectors (or the entire request, if it is sub-sector
in length) is just fine.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464173965-9694-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 353ab96973)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
vfio_eeh_container_op() is the backend that communicates with
host kernel to support EEH functionality in QEMU. However, the
functon should return the value from host kernel instead of 0
unconditionally.
dwg: Specifically the problem occurs for the handful of EEH
sub-operations which can return a non-zero, non-error result.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[dwg: clarification to commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit d917e88d85)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Commit "fd3c136 vga: make sure vga register setup for vbe stays intact
(CVE-2016-3712)." causes a regression. The win7 installer is unhappy
because it can't freely modify vga registers any more while in vbe mode.
This patch introduces a new sr_vbe register set. The vbe_update_vgaregs
will fill sr_vbe[] instead of sr[]. Normal vga register reads and
writes go to sr[]. Any sr register read access happens through a new
sr() helper function which will read from sr_vbe[] with vbe active and
from sr[] otherwise.
This way we can allow guests update sr[] registers as they want, without
allowing them disrupt vbe video modes that way.
Cc: qemu-stable@nongnu.org
Reported-by: Thomas Lamprecht <thomas@lamprecht.org>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 1463475294-14119-1-git-send-email-kraxel@redhat.com
(cherry picked from commit 94ef4f337f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Now that json-streamer tries not to leak tokens on incomplete parse,
the tokens can be freed twice if QEMU destroys the json-streamer
object during the parser->emit call. To fix this, create the new
empty GQueue earlier, so that it is already in place when the old
one is passed to parser->emit.
Reported-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1467636059-12557-1-git-send-email-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a942d8fa01)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Valgrind complained about a number of leaks in
tests/check-qobject-json:
==12657== definitely lost: 17,247 bytes in 1,234 blocks
All of which had the same root cause: on an incomplete parse,
we were abandoning the token queue without cleaning up the
allocated data within each queue element. Introduced in
commit 95385fe, when we switched from QList (which recursively
frees contents) to g_queue (which does not).
We don't yet require glib 2.32 with its g_queue_free_full(),
so open-code it instead.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463608012-12760-1-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit ba4dba5434)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
We currently have an error path during migration that can cause
the source QEMU to abort:
migration_thread()
migration_completion()
runstate_is_running() ----------------> true if guest is running
bdrv_inactivate_all() ----------------> inactivate images
qemu_savevm_state_complete_precopy()
... qemu_fflush()
socket_writev_buffer() --------> error because destination fails
qemu_fflush() -------------------> set error on migration stream
migration_completion() -----------------> set migrate state to FAILED
migration_thread() -----------------------> break migration loop
vm_start() -----------------------------> restart guest with inactive
images
and you get:
qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
Aborted (core dumped)
If we try postcopy with a similar scenario, we also get the writev error
message but QEMU leaves the guest paused because entered_postcopy is true.
We could possibly do the same with precopy and leave the guest paused.
But since the historical default for migration errors is to restart the
source, this patch adds a call to bdrv_invalidate_cache_all() instead.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Message-Id: <146357896785.6003.11983081732454362715.stgit@bahia.huguette.org>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
(cherry picked from commit fe904ea824)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The clang compiler supports a useful compiler option -Weverything,
and GCC also has other warnings not enabled by -Wall.
If glib header files trigger a warning, however, testing glib with
-Werror will always fail. A size mismatch is also detected without
-Werror, so simply remove it.
Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <1461879221-13338-1-git-send-email-sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5919e0328b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This patch is a rough fix to a memory corruption we are observing when
running VMs with xhci USB controller and OVMF firmware.
Specifically, on the following call chain
xhci_reset
xhci_disable_slot
xhci_disable_ep
xhci_set_ep_state
QEMU overwrites guest memory using stale guest addresses.
This doesn't happen when the guest (firmware) driver sets up xhci for
the first time as there are no slots configured yet. However when the
firmware hands over the control to the OS some slots and endpoints are
already set up with their context in the guest RAM. Now the OS' driver
resets the controller again and xhci_set_ep_state then reads and writes
that memory which is now owned by the OS.
As a quick fix, skip calling xhci_set_ep_state in xhci_disable_ep if the
device context base address array pointer is zero (indicating we're in
the HC reset and no DMA is possible).
Cc: qemu-stable@nongnu.org
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Message-id: 1462384435-1034-1-git-send-email-rkagan@virtuozzo.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 491d68d938)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
kvm_stat script is failing to execute on powerpc :
# ./kvm_stat
Traceback (most recent call last):
File "./kvm_stat", line 825, in <module>
main()
File "./kvm_stat", line 813, in main
providers = get_providers(options)
File "./kvm_stat", line 778, in get_providers
providers.append(TracepointProvider())
File "./kvm_stat", line 416, in __init__
self.filters = get_filters()
File "./kvm_stat", line 315, in get_filters
if ARCH.exit_reasons:
AttributeError: 'ArchPPC' object has no attribute 'exit_reasons'
This is because, its trying to access a non-defined attribute.
Also, the IOCTL number of RESET is incorrect for powerpc. The correct
number has been added.
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cherry-picked from linux commit c7d4fb5a
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
* xen-kmp used this since xen-3.0.4, instead the official protocol from xen-3.3+
* It did an unconditional "outl(1, (ioaddr + 4));"
* This approach was used until openSUSE 12.3, up to SLE11SP3 and in SLE10.
* Starting with openSUSE 13.1, SLE11SP4 and SLE12 the official protocol is used.
* pre VMDP 1.7 made use of 4 and 8 depending on how vmdp was configured.
* If VMDP was to control both disk and LAN it would use 4.
* If it controlled just disk or just LAN, it would use 8 below.
*/
PCIDevice*pci_dev=PCI_DEVICE(s);
DPRINTF("unplug disks\n");
pci_unplug_disks(pci_dev->bus);
DPRINTF("unplug nics\n");
pci_unplug_nics(pci_dev->bus);
DPRINTF("done\n");
}
break;
case8:
log_writeb(s,(uint32_t)val);
break;
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.