Git-commit: 0000000000000000000000000000000000000000
References: [SUSE-JIRA] (SLE-20965)
While using SCSI passthrough, the following scenario leaves qemu unaware
of the capacity change of the remote scsi target:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'rescan-scsi-bus.sh -s ...' in vm.
In the above scenario I used to experience errors while accessing the
additional disk space in the vm. I think the reasonable sequence of
operations should be:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'block_resize' via qmp to notify qemu.
4. issue 'rescan-scsi-bus.sh -s ...' in vm.
The errors disappear once I notify qemu by block_resize via qmp.
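As a rough illustration of step 3, the QMP request could be built as in
the Python sketch below. The device id is a hypothetical placeholder; the
real value depends on how the disk was attached.

```python
import json


def block_resize_command(device: str, size_bytes: int) -> str:
    """Build a QMP 'block_resize' request as a JSON string."""
    return json.dumps({
        "execute": "block_resize",
        "arguments": {"device": device, "size": size_bytes},
    })


# Hypothetical device id; 8 GiB is the new size of the resized target.
print(block_resize_command("drive-scsi0-0-0-0", 8 * 1024**3))
```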
So this patch replaces the number of logical blocks in the READ CAPACITY
response from the scsi target with qemu's bs->total_sectors. If the user
in the vm wants to access the additional disk space, the administrator of
the host must notify qemu after resizing the scsi target.
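The idea can be sketched in Python for a READ CAPACITY(10) response, whose
first 8 bytes hold the last logical block address and the block length,
both big-endian. This is an illustration only, not the patch's C code; the
function name is made up, and total_sectors is assumed to count 512-byte
sectors and to be non-zero.

```python
import struct


def patch_read_capacity10(response: bytes, total_sectors: int) -> bytes:
    """Overwrite the last-LBA field with a value derived from total_sectors."""
    _target_last_lba, block_len = struct.unpack(">II", response[:8])
    # Convert the 512-byte sector count into blocks of the target's size.
    blocks = total_sectors * 512 // block_len
    # The field stores the address of the last block, capped at 32 bits.
    last_lba = min(blocks - 1, 0xFFFFFFFF)
    return struct.pack(">II", last_lba, block_len) + response[8:]
```

With this, a target that already reports 8 GiB is still presented to the
guest as 4 GiB until qemu's own size is updated.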
A bonus is that the capacity reported by libvirt's domblkinfo stays
consistent with what the vm sees when block_resize has not been issued to
qemu.
E.g.:
...
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/sdc' index='1'/>
<backingStore/>
<target dev='sda' bus='scsi'/>
<alias name='scsi0-0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
Before:
1. online resize the scsi target.
2. host:~ # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4. host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity: 4.000 GiB
Allocation: 0.000 B
Physical: 8.000 GiB
5. guest:~ # lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 8G 0 disk
└─sda1 8:1 0 2G 0 part
After:
1. online resize the scsi target.
2. host:~ # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4. host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity: 4.000 GiB
Allocation: 0.000 B
Physical: 8.000 GiB
5. guest:~ # lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
└─sda1 8:1 0 2G 0 part
Signed-off-by: Lin Ma <lma@suse.com>
Add code to read the suse specific suse-diskcache-disable-flush flag out
of xenstore, and set the equivalent flag within QEMU.
Patch taken from Xen's patch queue, Olaf Hering being the original author.
[bsc#879425]
[BR: minor edits to pass qemu's checkpatch script]
[BR: With qdevification of xen-block, code has changed significantly]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
For SLES we want users to be able to use large memory configurations
with KVM without fiddling with ulimit -Sv.
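One way to get this effect, shown here as a Python sketch, is to raise the
soft address-space limit (the value `ulimit -Sv` reports) to the hard
limit at startup. That this is what the patch does is an assumption; the
message only states the goal.

```python
import resource


def raise_address_space_limit():
    """Raise the soft RLIMIT_AS to the hard limit and return the new pair."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Raising the soft limit up to the hard limit needs no privileges.
    resource.setrlimit(resource.RLIMIT_AS, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_AS)
```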
Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: add include for sys/resource.h]
Signed-off-by: Bruce Rogers <brogers@suse.com>
References: boo#988279
Change from using glib alloc and free routines to those
from libc. Also perform safety measure of dropping privs
to user if configured no-caps.
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Virtio-Console can only process one character at a time. Using it on S390
gave me strange "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.
While the stdio driver calls a poll function that just processes on its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.
To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.
This patch fixes input when using -nographic on s390 for me.
[AF: Rebased for v2.7.0-rc2]
[BR: minor edits to pass qemu's checkpatch script]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers that on 970 can not be implemented properly.
So disable the check for mmu notifiers on PowerPC guests, making
KVM guests work there, even if possibly racy in some odd circumstances.
Signed-off-by: Bruce Rogers <brogers@suse.com>
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.
When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.
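The distinction can be sketched in Python as follows. The helper name is
hypothetical and the code only models the widening rule, not the actual
linux-user implementation.

```python
import os


def widen_lseek_offset(offset32: int, whence: int) -> int:
    """Widen a 32-bit guest lseek offset to a 64-bit host value."""
    offset32 &= 0xFFFFFFFF
    if whence == os.SEEK_SET:
        # Absolute position: keep it unsigned (zero-extend).
        return offset32
    # SEEK_CUR / SEEK_END: relative offset, may be negative (sign-extend).
    if offset32 & 0x80000000:
        return offset32 - (1 << 32)
    return offset32
```

For example, the bit pattern 0xFFFFFFF0 stays a large positive position
for SEEK_SET but becomes -16 for SEEK_END.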
Signed-off-by: Alexander Graf <agraf@suse.de>
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise, a size parameter held in a 64-bit host variable gets
sign-extended to a negative number, breaking lseek for example.
Pass syscall arguments as ulong always.
Signed-off-by: Alexander Graf <agraf@suse.de>
[JRZ: changes from linux-user/qemu.h were moved to linux-user/user-internals.h]
Signed-off-by: Jose R Ziviani <jziviani@suse.de>
[DF: Forward port, i.e., use ulong for do_prctl too]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Fedora 17 for ARM reads /proc/cpuinfo and fails if it doesn't contain
ARM related contents. This patch implements a quick hack to expose real
/proc/cpuinfo data taken from a real world machine.
The real fix would be to generate at least the flags automatically based
on the selected CPU. Please do not submit this patch upstream until this
has happened.
Signed-off-by: Alexander Graf <agraf@suse.de>
[AF: Rebased for v1.6 and v1.7]
Signed-off-by: Andreas Färber <afaerber@suse.de>
[DF: Restructured it a bit, to make ARM look like other arch-es]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
This reverts commit ec87b5daca.
No need. In our build system submodules are checked out.
Signed-off-by: Bruce Rogers <brogers@suse.com>
[DF: Rebased on top of 6.2.0]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
We add a --cross-file reference so that we can do cross compilation
of qboot from an aarch64 build.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
References: bsc#1011213
Certain rom subpackages built from qemu git-submodules call the date
program to include date information in the packaged binaries. This
causes repeated builds of the package to be different, where the only
real difference is due to the fact that the build timestamp has
changed. To promote reproducible builds and avoid customers being
prompted to update packages needlessly, we'll use the timestamp of the
VERSION file as the packaging timestamp for all packages that build in a
timestamp for whatever reason.
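A minimal Python sketch of deriving a stable timestamp from a file's
modification time instead of the current time is shown below. The function
name and the date format are illustrative assumptions, not what the spec
files use.

```python
import os
import time


def packaging_timestamp(version_file: str) -> str:
    """Return the file's mtime as a UTC date string."""
    mtime = os.path.getmtime(version_file)
    # gmtime() avoids any dependency on the build host's time zone.
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(mtime))
```

Two builds from the same sources then embed the same date, as long as the
file's mtime is preserved.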
Signed-off-by: Bruce Rogers <brogers@suse.com>
This includes:
- base QEMU version is 7.2.0 (not 7.1)
- update the spec files, so that they work with QEMU 7.2
- disable xen
- build only for x86_64
- refresh seabios version
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Make sure we use the branches of the submodule repositories that have
our downstream patches applied.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
When we measure FIO read performance (cache=writethrough, bs=4k,
iodepth=64) in VMs, ~80K/s notifications (e.g., EPT_MISCONFIG) are observed
from guest to qemu.
It turns out those frequent notifications are caused by interference from
worker threads. Worker threads queue bottom halves after completing IO
requests. Pending bottom halves may cause either aio_compute_timeout() to
zero the timeout and pass it to try_poll_mode(), or run_poll_handlers() to
return no progress after noticing pending aio_notify() events. Both cause
run_poll_handlers() to call poll_set_started(false) to disable poll mode.
However, for both cases, as timeout is already zeroed, the event loop
(i.e., aio_poll()) just processes bottom halves and then starts the next
event loop iteration. So, disabling poll mode has no value but leads to
unnecessary notifications from guest.
To minimize unnecessary notifications from guest, defer disabling poll
mode to when the event loop is about to be blocked.
With this patch applied, FIO seq-read performance (bs=4k, iodepth=64,
cache=writethrough) in VMs increases from 330K/s to 413K/s IOPS.
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Message-id: 20220710120849.63086-1-chao.gao@intel.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The VMM marks vTPM as enabled when the vTPM is initialized
successfully. Otherwise, TDVMCALL_SERVICE.Query reports vTPM as
unsupported and OVMF will not try to initialize vTPM anymore.
Signed-off-by: Xiaocheng Dong <xiaocheng.dong@intel.com>
A wrong exit condition may lead to an infinite loop when inflating a
valid zlib buffer containing some extra bytes in the `inflate_buffer`
function. The bug only occurs post-authentication. Return the buffer
immediately if the end of the compressed data has been reached
(Z_STREAM_END).
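The same exit condition can be illustrated with Python's zlib module,
where the `eof` attribute of a decompression object plays the role of
Z_STREAM_END. This is an analogue for illustration, not the VNC code; the
function name and chunk size are arbitrary.

```python
import zlib


def inflate_until_end(data: bytes, chunk: int = 16) -> bytes:
    """Inflate a zlib stream, stopping at the end-of-stream marker."""
    d = zlib.decompressobj()
    out = bytearray()
    for i in range(0, len(data), chunk):
        out += d.decompress(data[i:i + chunk])
        if d.eof:
            # End of compressed data reached: return now and ignore
            # any extra bytes that follow the stream.
            break
    return bytes(out)
```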
Fixes: CVE-2023-3255
Fixes: 0bf41cab ("ui/vnc: clipboard support")
Reported-by: Kevin Denis <kevin.denis@synacktiv.com>
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-ID: <20230704084210.101822-1-mcascell@redhat.com>
The TLS handshake may take some time to complete, during which time an
I/O watch might be registered with the main loop. If the owner of the
I/O channel invokes qio_channel_close() while the handshake is waiting
to continue, the I/O watch must be removed. Failing to remove it will
later trigger the completion callback which the owner is not expecting
to receive. In the case of the VNC server, this results in a SEGV as
vnc_disconnect_start() tries to shutdown a client connection that is
already gone / NULL.
CVE-2023-3354
Reported-by: jiangyegen <jiangyegen@huawei.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
For symmetric algorithms, the length of the ciphertext must be the same
as the plaintext.
The missing verification of src_len and dst_len in
virtio_crypto_sym_op_helper() may lead to buffer overflow or data disclosure.
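The kind of check involved can be sketched in Python as below. The
function name and error text are hypothetical and do not reflect the
actual QEMU error message.

```python
def validate_sym_request(src_len: int, dst_len: int) -> None:
    """Reject a symmetric cipher request whose buffer lengths differ."""
    if src_len != dst_len:
        raise ValueError(
            f"src_len ({src_len}) and dst_len ({dst_len}) must be equal"
        )
```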
This patch was originally written by Yiming Tao for QEMU-SECURITY;
resend it (with a few changes to the error message) to qemu-devel.
Fixes: CVE-2023-3180
Fixes: 04b9b37edda("virtio-crypto: add data queue processing handler")
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: Mauro Matteo Cascella <mcascell@redhat.com>
Cc: Yiming Tao <taoym@zju.edu.cn>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20230803024314.29962-2-pizhenwei@bytedance.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 9d38a84347)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The 9p protocol does not specifically define how server shall behave when
client tries to open a special file, however from security POV it does
make sense for 9p server to prohibit opening any special file on host side
in general. A sane Linux 9p client for instance would never attempt to
open a special file on host side, it would always handle those exclusively
on its guest side. A malicious client however could potentially escape
from the exported 9p tree by creating and opening a device file on host
side.
With QEMU this could only be exploited in the following unsafe setups:
- Running QEMU binary as root AND 9p 'local' fs driver AND 'passthrough'
security model.
or
- Using 9p 'proxy' fs driver (which is running its helper daemon as
root).
These setups were already discouraged for safety reasons before,
however for obvious reasons we are now tightening behaviour on this.
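A Python sketch of such a restriction follows. It opens the path and then
checks the resulting descriptor, so the check applies to the object that
was actually opened. The helper name is made up and this is not the 9p
server's implementation.

```python
import os
import stat


def open_regular(path: str, flags: int = os.O_RDONLY) -> int:
    """Open path, refusing anything but a regular file or directory."""
    # O_NONBLOCK keeps the open from hanging on a FIFO before the check.
    fd = os.open(path, flags | os.O_NONBLOCK)
    mode = os.fstat(fd).st_mode
    if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
        os.close(fd)
        raise PermissionError(f"refusing to open special file: {path}")
    return fd
```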
Fixes: CVE-2023-2861
Reported-by: Yanwu Shen <ywsPlz@gmail.com>
Reported-by: Jietao Xiao <shawtao1125@gmail.com>
Reported-by: Jinku Li <jkli@xidian.edu.cn>
Reported-by: Wenbo Shen <shenwenbo@zju.edu.cn>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <E1q6w7r-0000Q0-NM@lizzy.crudebyte.com>
(cherry picked from commit f6b0de53fb)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: drop adding qemu_fstat wrapper for 7.2 where wrappers aren't used)
Converting all memory to private at memory slot creation has a boot
performance impact for a TD with big memory. Only convert low memory
(0~2GB) to private, which avoids an exit to userspace for each 4K page,
because TDVF will accept about 2GB of low memory during boot.
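The range selection can be sketched in Python as a simple clip of the
memory slot against the 2GB boundary. Names are hypothetical and the code
ignores everything except the address arithmetic.

```python
LOW_MEM_LIMIT = 2 << 30  # 2 GiB, the low-memory range named above


def private_range(start: int, size: int):
    """Return (start, size) of the part of a slot below 2 GiB, or None."""
    end = min(start + size, LOW_MEM_LIMIT)
    if start >= end:
        # The slot lies entirely above the low-memory range.
        return None
    return start, end - start
```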
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Cleaning up migration streams before they are set up causes a segfault.
To avoid this, use tdx_mig.nr_streams rather than nr_channels.
Opportunistically remove the unnecessary parameter "nr_channels" in
the following 2 callback functions in cgs_mig:
cgs_mig.savevm_state_cleanup
cgs_mig.loadvm_state_cleanup
This fixes LFE-9211 and LFE-9481.
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Since the SPR CPU model patch has been merged upstream, remove the
internal version and use the upstreamed one instead.
- Internal CPU model:
"target/i386: add SPR-TDX CPU model"
66f1b579fa73aa3e00737e91776ca1f2567ca1a4
- Upstreamed CPU model:
"i386: Add new CPU model SapphireRapids"
7eb061b06e
Signed-off-by: Qian Wen <qian.wen@intel.com>
Add several fixed0/fixed1 type CPUIDs to TDX CPUID lookup table according
to the CPUID Virtualization in TDX Module v1.5 ABI Specification.
Regarding the added CPUIDs, the difference between the public spec of
TDX 1.0 and 1.5 is shown as follows:
                                  TDX 1.0   TDX 1.5
**CPUID.0x01:ECX.DTES64[2]        native    fixed1**
**CPUID.0x01:ECX.DSCPL[4]         native    fixed1**
CPUID.0x07:EDX.SGX-KEYS[1]        fixed0    fixed0
**CPUID.0x07:EDX.L1D_FLUSH[28]    native    fixed1**
CPUID.0x12.0x0:EAX[31:0]          fixed0    fixed0
CPUID.0x12.0x0:EBX[31:0]          fixed0    fixed0
CPUID.0x12.0x1:EAX[31:0]          fixed0    fixed0
Although the virtualization of three CPUIDs differs between TDX 1.0
and 1.5, this does not break compatibility, since
update_tdx_cpuid_lookup_by_tdx_caps() uses TDSYSINFO_STRUCT.cpuid to
drop unsupported bits.
Signed-off-by: Qian Wen <qian.wen@intel.com>
Check alignment against 4KB instead of the ramblock's page size, so
that 2MB pages can be supported.
Also, cgs_bmap should always be calculated using the 4KB page size
instead of the ramblock's page size.
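In Python terms, the two calculations look roughly like this. The helper
names are hypothetical; only the fixed 4KB granularity comes from the
text above.

```python
PAGE_SHIFT_4K = 12
PAGE_SIZE_4K = 1 << PAGE_SHIFT_4K


def is_4k_aligned(offset: int, length: int) -> bool:
    """True if both offset and length are multiples of 4KB."""
    return (offset | length) & (PAGE_SIZE_4K - 1) == 0


def bitmap_index(offset: int) -> int:
    """Bit index for an offset, always counted in 4KB pages."""
    return offset >> PAGE_SHIFT_4K
```

A 2MB-aligned offset such as 0x200000 passes the check and maps to bit
512, regardless of the page size backing the ramblock.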
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Make the mount point path optional for MemoryBackendMemfdPrivate so
that users need not specify the path to create a TD without 2MB
page support.
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
KVM TDX guest doesn't allow KVM clock. Although guest TD doesn't use a KVM
clock, qemu creates it by default with i386 KVM enabled. When guest TD
crashes and KVM_RUN returns -EIO, the following message is shown.
KVM_GET_CLOCK failed: Input/output error
The message confuses the user and misleads the debug process. Don't create
KVM_CLOCK when confidential computing is enabled, and it has a property to
disable the pv clock.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
For large page support, it's convenient to be able to use any backend as
shared memory for memory-backend-memfd-private. Add shmemdev property to
specify memory backend.
Example:
-object memory-backend-memfd,id=ramhuge,size=6144M,hugetlb=on,hugetlbsize=2M
-object memory-backend-memfd-private,id=ram1,size=6144M,hugetlb=on,hugetlbsize=2M,shmemdev=ramhuge
-machine memory-backend=ram1
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Make HostMemoryBackend::mr a pointer and add the pointee to HostMemoryBackend.
Initialize mr in host_memory_backend_memory_complete().
This allows the restricted memfd memory backend to use any host memory
backend.
No functional change, just one added pointer as indirection.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
By default (due to the recent UPM change), restricted memory attribute is
shared. Convert the memory region from shared to private at the memory
slot creation time.
Without this patch
- Secure-EPT violation on private area
- KVM_MEMORY_FAULT EXIT (kvm->qemu)
- qemu converts the 4K page from shared to private
- Resume VCPU execution
- Secure-EPT violation again
- KVM resolves EPT Violation
This also prevents huge pages because page conversion is done at 4K
granularity. Although it's possible to merge 4K private mappings into
a 2M large page, it slows guest boot.
With this patch
- After memory slot creation, convert the region from shared to private
- Secure-EPT violation on private area
- KVM resolves EPT Violation
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
The open_tree() Linux system call is needed to support memfd_restricted().
Add a stub for it.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
The current guest kernel registers the IRQ on one vCPU and calls
VMCALL.<SETUPEVENTNOTIFYINTERRUPT> to save the interrupt vector and APIC
ID; VMCALL.<Service> and VMCALL.<GETQUOTE> reuse the same IRQ. But QEMU
notifies the vCPU that calls VMCALL.<Service>.
This patch is a workaround that notifies the saved vCPU APIC ID instead.
There remains a potential risk in the current vTPM TD interrupt
implementation with multiple vCPUs.
Signed-off-by: Xiaocheng Dong <xiaocheng.dong@intel.com>
The size is used after it was freed, which leads to an
incorrect buffer size and causes a later memory write overflow
and a QEMU runtime crash.
Fixes: 9a5b26f590 ("kvm/tdx: Introduce SocketRecvBuffer")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
The user id is treated as a raw byte stream rather than a
C-style string, so the terminating null byte is not needed.
Use memcpy() instead of strcpy(), which appends a terminating
null byte and caused a write overflow.
Fixes: 9cf69dadac ("kvm/tdx: QMP protocol for create/destroy vTPM instance")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
The copy size was fixed to sizeof(vms->vtpm_userid), which caused
a buffer overread when user_id is a plain text string.
Fix this by copying only as many bytes as the plain text
length.
Fixes: 13c835fa89 ("kvm/tdx: Introduce vTPM client/server vmcall service")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Disconnection can happen with or without the lock
held; adjust the locking behavior based on the lock
state.
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Abstract tdx_vtpm_client_check_pending_request() as a common
interface to check for a pending request; it can be used to
return an error state in the disconnected case.
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Correctly handle trans_protocal types which have no payload
part, only a head part.
Fixes: 619ad4e6f4 ("kvm/tdx: Introduce SocketRecvBuffer")
Signed-off-by: Yuan Yao <yuan.yao@intel.com>
These states are used to manage the STREAM socket communication,
make sure the QIOChannelSocket is freed properly and return
DEVICE_ERROR to user for disconnected case.
Signed-off-by: Yao Yuan <yuan.yao@intel.com>
The ZOMBIE state is used for a session which has been removed
from the client_session set but hasn't been destroyed
by the vTPM server.
The ZOMBIE session will be freed by the peer disconnection
event. A common case for a ZOMBIE session is: >= 2 TDs
connecting to the same vTPM server with the same user id. In
this case the later one to connect kicks out the existing
earlier one, and the session of the earlier one is marked ZOMBIE.
Signed-off-by: Yao Yuan <yuan.yao@intel.com>