Commit 6f6ad2b245 "hw/i386/pc: Confine system flash handling to pc_sysfw"
causes a regression when specifying the property `-M pflash0` in the PCI PC
machines:
qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found
In order to revert the commit, the commit below must be reverted first.
This reverts commit cb05cc1602.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20240226215909.30884-2-shentey@gmail.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0fbe8d7c4c)
Referecens: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Don't get/put state of TDX VMs since accessing/mutating guest state of
production TDs is not supported.
Note, it will be allowed for a debug TD. Corresponding support will be
introduced when debug TD support is implemented in the future.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
by VMM, while the features enumerated/controlled by other MSRs except
MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
Only configure MSR_IA32_UCODE_REV for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TSC of TDs is not accessible and KVM doesn't allow access of
MSR_IA32_TSC for TDs. To avoid the assert() in kvm_get_tsc, make
kvm_synchronize_all_tsc() noop for TDs,
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Add a new bool member, eoi_intercept_unsupported, to X86MachineState
with default value false. Set true for TDX VM.
Inability to intercept eoi causes impossibility to emulate level
triggered interrupt to be re-injected when level is still kept active.
which affects interrupt controller emulation.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDX CPU state is protected and thus vcpu state cann't be reset by VMM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Legacy PIC (8259) cannot be supported for TDX VMs since TDX module
doesn't allow directly interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
Hence disable PIC for TDX VMs and error out if user wants PIC.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDX doesn't support SMM and VMM cannot emulate SMM for TDX VMs because
VMM cannot manipulate TDX VM's memory.
Disable SMM for TDX VMs and error out if user requests to enable SMM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Add a q35 property to check whether or not SMM ranges, e.g. SMRAM, TSEG,
etc... exist for the target platform. TDX doesn't support SMM and doesn't
play nice with QEMU modifying related guest memory ranges.
Signed-off-by: Isaku Yamahata <isaku.yamahata@linux.intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
In mch_realize(), process PAM initialization before SMRAM initialization so
that later patch can skill all the SMRAM related with a single check.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TD guest can use TDG.VP.VMCALL<REPORT_FATAL_ERROR> to request termination
with error message encoded in GPRs.
Parse and print the error message, and terminate the TD guest in the
handler.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
MapGPA is a hypercall to convert GPA from/to private GPA to/from shared GPA.
As the conversion function is already implemented as kvm_convert_memory,
wire it to TDX hypercall exit.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Add property "quote-generation-socket" to tdx-guest, which is a property
of type SocketAddress to specify Quote Generation Service(QGS).
On request of GetQuote, it connects to the QGS socket, read request
data from shared guest memory, send the request data to the QGS,
and store the response into shared guest memory, at last notify
TD guest by interrupt.
command line example:
qemu-system-x86_64 \
-object '{"qom-type":"tdx-guest","id":"tdx0","quote-generation-socket":{"type": "vsock", "cid":"1","port":"1234"}}' \
-machine confidential-guest-support=tdx0
Note, above example uses vsock type socket because the QGS we used
implements the vsock socket. It can be other types, like UNIX socket,
which depends on the implementation of QGS.
To avoid no response from QGS server, setup a timer for the transaction.
If timeout, make it an error and interrupt guest. Define the threshold of
time to 30s at present, maybe change to other value if not appropriate.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Codeveloped-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Codeveloped-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
For SetupEventNotifyInterrupt, record interrupt vector and the apic id
of the vcpu that received this TDVMCALL.
Later it can inject interrupt with given vector to the specific vcpu
that received SetupEventNotifyInterrupt.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Invoke KVM_TDX_FINALIZE_VM to finalize the TD's measurement and make
the TD vCPUs runnable once machine initialization is complete.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDX vcpu needs to be initialized by SEAMCALL(TDH.VP.INIT) and KVM
provides vcpu level IOCTL KVM_TDX_INIT_VCPU for it.
KVM_TDX_INIT_VCPU needs the address of the HOB as input. Invoke it for
each vcpu after HOB list is created.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDVF firmware (CODE and VARS) needs to be copied to TD's private
memory, as well as TD HOB and TEMP memory.
If the TDVF section has TDVF_SECTION_ATTRIBUTES_MR_EXTEND set in the
flag, calling KVM_TDX_EXTEND_MEMORY to extend the measurement.
After populating the TDVF memory, the original image located in shared
ramblock can be discarded.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
The TD HOB list is used to pass the information from VMM to TDVF. The TD
HOB must include PHIT HOB and Resource Descriptor HOB. More details can
be found in TDVF specification and PI specification.
Build the TD HOB in TDX's machine_init_done callback.
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
The RAM of TDX VM can be classified into two types:
- TDX_RAM_UNACCEPTED: default type of TDX memory, which needs to be
accepted by TDX guest before it can be used and will be all-zeros
after being accepted.
- TDX_RAM_ADDED: the RAM that is ADD'ed to TD guest before running, and
can be used directly. E.g., TD HOB and TEMP MEM that needed by TDVF.
Maintain TdxRamEntries[] which grabs the initial RAM info from e820 table
and mark each RAM range as default type TDX_RAM_UNACCEPTED.
Then turn the range of TD HOB and TEMP MEM to TDX_RAM_ADDED since these
ranges will be ADD'ed before TD runs and no need to be accepted runtime.
The TdxRamEntries[] are later used to setup the memory TD resource HOB
that passes memory info from QEMU to TDVF.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
For each TDVF sections, QEMU needs to copy the content to guest
private memory via KVM API (KVM_TDX_INIT_MEM_REGION).
Introduce a field @mem_ptr for TdxFirmwareEntry to track the memory
pointer of each TDVF sections. So that QEMU can add/copy them to guest
private memory later.
TDVF sections can be classified into two groups:
- Firmware itself, e.g., TDVF BFV and CFV, that located separately from
guest RAM. Its memory pointer is the bios pointer.
- Sections located at guest RAM, e.g., TEMP_MEM and TD_HOB.
mmap a new memory range for them.
Register a machine_init_done callback to do the stuff.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
For TDX, the address below 1MB are entirely general RAM. No need to
initialize pc.rom memory region for TDs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX doesn't support map different GPAs to same private memory. Thus,
aliasing top 128KB of BIOS as isa-bios is not supported.
On the other hand, TDX guest cannot go to real mode, it can work fine
without isa-bios.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
After TDVF is loaded to bios MemoryRegion, it needs parse TDVF metadata.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDX VM needs to boot with its specialized firmware, Trusted Domain
Virtual Firmware (TDVF). QEMU needs to parse TDVF and map it in TD
guest memory prior to running the TDX VM.
A TDVF Metadata in TDVF image describes the structure of firmware.
QEMU refers to it to setup memory for TDVF. Introduce function
tdvf_parse_metadata() to parse the metadata from TDVF image and store
the info of each TDVF section.
TDX metadata is located by a TDX metadata offset block, which is a
GUID-ed structure. The data portion of the GUID structure contains
only an 4-byte field that is the offset of TDX metadata to the end
of firmware file.
Select X86_FW_OVMF when TDX is enable to leverage existing functions
to parse and search OVMF's GUID-ed structures.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
TDVF(OVMF) needs to run at private memory for TD guest. TDX cannot
support pflash device since it doesn't support read-only private memory.
Thus load TDVF(OVMF) with -bios option for TDs.
Use memory_region_init_ram_guest_memfd() to allocate the MemoryRegion
for TDVF because it needs to be located at private memory.
Also store the MemoryRegion pointer of TDVF since the shared ramblock of
it can be discared after it gets copied to private ramblock.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Introduce memory_region_init_ram_guest_memfd() to allocate private
guset memfd on the MemoryRegion initialization. It's for the use case of
TDVF, which must be private on TDX case.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX requires vMMIO region to be shared. For KVM, MMIO region is the region
which kvm memslot isn't assigned to (except in-kernel emulation).
qemu has the memory region for vMMIO at each device level.
While OVMF issues MapGPA(to-shared) conservatively on 32bit PCI MMIO
region, qemu doesn't find corresponding vMMIO region because it's before
PCI device allocation and memory_region_find() finds the device region, not
PCI bus region. It's safe to ignore MapGPA(to-shared) because when guest
accesses those region they use GPA with shared bit set for vMMIO. Ignore
memory conversion request of non-assigned region to shared and return
success. Otherwise OVMF is confused and panics there.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Because vMMIO region needs to be shared region, guest TD may explicitly
convert such region from private to shared. Don't complain such
conversion.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX only supports readonly for shared memory but not for private memory.
In the view of QEMU, it has no idea whether a memslot is used as shared
memory of private. Thus just mark kvm_readonly_mem_enabled to false to
TDX VM for simplicity.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reuse "-cpu,tsc-frequency=" to get user wanted tsc frequency and call VM
scope VM_SET_TSC_KHZ to set the tsc frequency of TD before KVM_TDX_INIT_VM.
Besides, sanity check the tsc frequency to be in the legal range and
legal granularity (required by TDX module).
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Three sha384 hash values, mrconfigid, mrowner and mrownerconfig, of a TD
can be provided for TDX attestation. Detailed meaning of them can be
found: https://lore.kernel.org/qemu-devel/31d6dbc1-f453-4cef-ab08-4813f4e0ff92@intel.com/
Allow user to specify those values via property mrconfigid, mrowner and
mrownerconfig. They are all in base64 format.
example
-object tdx-guest, \
mrconfigid=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrowner=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v,\
mrownerconfig=ASNFZ4mrze8BI0VniavN7wEjRWeJq83vASNFZ4mrze8BI0VniavN7wEjRWeJq83v
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Validate TD attributes with tdx_caps that fixed-0 bits must be zero and
fixed-1 bits must be set.
Besides, sanity check the attribute bits that have not been supported by
QEMU yet. e.g., debug bit, it will be allowed in the future when debug
TD support lands in QEMU.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Current KVM doesn't support PMU for TD guest. It returns error if TD is
created with PMU bit being set in attributes.
Disable PMU for TD guest on QEMU side.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
For QEMU VMs, PKS is configured via CPUID_7_0_ECX_PKS and PMU is
configured by x86cpu->enable_pmu. Reuse the existing configuration
interface for TDX VMs.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
For TDX KVM use case, Linux guest is the most major one. It requires
sept_ve_disable set. Make it default for the main use case. For other use
case, it can be enabled/disabled via qemu command line.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Bit 28 of TD attribute, named SEPT_VE_DISABLE. When set to 1, it disables
EPT violation conversion to #VE on guest TD access of PENDING pages.
Some guest OS (e.g., Linux TD guest) may require this bit as 1.
Otherwise refuse to boot.
Add sept-ve-disable property for tdx-guest object, for user to configure
this bit.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Invoke KVM_TDX_INIT in kvm_arch_pre_create_vcpu() that KVM_TDX_INIT
configures global TD configurations, e.g. the canonical CPUID config,
and must be executed prior to creating vCPUs.
Use kvm_x86_arch_cpuid() to setup the CPUID settings for TDX VM.
Note, this doesn't address the fact that QEMU may change the CPUID
configuration when creating vCPUs, i.e. punts on refactoring QEMU to
provide a stable CPUID config prior to kvm_arch_init().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Move the architectural (for lack of a better term) CPUID leaf generation
to a separate helper so that the generation code can be reused by TDX,
which needs to generate a canonical VM-scoped configuration.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Some bits in TD attributes have corresponding CPUID feature bits. Reflect
the fixed0/1 restriction on TD attributes to their corresponding CPUID
bits in tdx_cpuid_lookup[] as well.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
KVM requires userspace to pass XFAM configuration via CPUID 0xD leaves.
Convert tdx_caps->xfam_fixed0/1 into corresponding
tdx_cpuid_lookup[].tdx_fixed0/1 field of CPUID 0xD leaves. Thus the
requirement can be applied naturally.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
tdx_cpuid_lookup[].tdx_fixed0/1 is QEMU maintained data which reflects
TDX restrictions regrading what bits are fixed by TDX module.
It's retrieved from TDX spec and static. However, TDX may evolve and
change some fixed fields to configurable in the future. Update
tdx_cpuid.lookup[].tdx_fixed0/1 fields by removing the bits that
reported from TDX module as configurable. This can adapt with the
updated TDX (module) automatically.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Due to the fact that Intel-PT virtualization support has been broken in
QEMU since Sapphire Rapids generation[1], below warning is triggered when
luanching TD guest:
warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]
Before Intel-pt is fixed in QEMU, just make Intel-PT unsupported for TD
guest, to avoid the confusing warning.
[1] https://lore.kernel.org/qemu-devel/20230531084311.3807277-1-xiaoyao.li@intel.com/
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
According to Chapter "CPUID Virtualization" in TDX module spec, CPUID
bits of TD can be classified into 6 types:
------------------------------------------------------------------------
1 | As configured | configurable by VMM, independent of native value;
------------------------------------------------------------------------
2 | As configured | configurable by VMM if the bit is supported natively
(if native) | Otherwise it equals as native(0).
------------------------------------------------------------------------
3 | Fixed | fixed to 0/1
------------------------------------------------------------------------
4 | Native | reflect the native value
------------------------------------------------------------------------
5 | Calculated | calculated by TDX module.
------------------------------------------------------------------------
6 | Inducing #VE | get #VE exception
------------------------------------------------------------------------
Note:
1. All the configurable XFAM related features and TD attributes related
features fall into type #2. And fixed0/1 bits of XFAM and TD
attributes fall into type #3.
2. For CPUID leaves not listed in "CPUID virtualization Overview" table
in TDX module spec, TDX module injects #VE to TDs when those are
queried. For this case, TDs can request CPUID emulation from VMM via
TDVMCALL and the values are fully controlled by VMM.
Due to TDX module has its own virtualization policy on CPUID bits, it leads
to what reported via KVM_GET_SUPPORTED_CPUID diverges from the supported
CPUID bits for TDs. In order to keep a consistent CPUID configuration
between VMM and TDs. Adjust supported CPUID for TDs based on TDX
restrictions.
Currently only focus on the CPUID leaves recognized by QEMU's
feature_word_info[] that are indexed by a FeatureWord.
Introduce a TDX CPUID lookup table, which maintains 1 entry for each
FeatureWord. Each entry has below fields:
- tdx_fixed0/1: The bits that are fixed as 0/1;
- depends_on_vmm_cap: The bits that are configurable from the view of
TDX module. But they requires emulation of VMM
when configured as enabled. For those, they are
not supported if VMM doesn't report them as
supported. So they need be fixed up by checking
if VMM supports them.
- inducing_ve: TD gets #VE when querying this CPUID leaf. The result is
totally configurable by VMM.
- supported_on_ve: It's valid only when @inducing_ve is true. It represents
the maximum feature set supported that be emulated
for TDs.
By applying TDX CPUID lookup table and TDX capabilities reported from
TDX module, the supported CPUID for TDs can be obtained from following
steps:
- get the base of VMM supported feature set;
- if the leaf is not a FeatureWord just return VMM's value without
modification;
- if the leaf is an inducing_ve type, applying supported_on_ve mask and
return;
- include all native bits, it covers type #2, #4, and parts of type #1.
(it also includes some unsupported bits. The following step will
correct it.)
- apply fixed0/1 to it (it covers #3, and rectifies the previous step);
- add configurable bits (it covers the other part of type #1);
- fix the ones in vmm_fixup;
(Calculated type is ignored since it's determined at runtime).
Co-developed-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
It will need special handling for TDX VMs all around the QEMU.
Introduce is_tdx_vm() helper to query if it's a TDX VM.
Cache tdx_guest object thus no need to cast from ms->cgs every time.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
KVM provides TDX capabilities via sub command KVM_TDX_CAPABILITIES of
IOCTL(KVM_MEMORY_ENCRYPT_OP). Get the capabilities when initializing
TDX context. It will be used to validate user's setting later.
Since there is no interface reporting how many cpuid configs contains in
KVM_TDX_CAPABILITIES, QEMU chooses to try starting with a known number
and abort when it exceeds KVM_MAX_CPUID_ENTRIES.
Besides, introduce the interfaces to invoke TDX "ioctls" at different
scope (KVM, VM and VCPU) in preparation.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Implement TDX specific ConfidentialGuestSupportClass::kvm_init()
callback, tdx_kvm_init().
Set ms->require_guest_memfd to true to require private guest memfd
allocation for any memory backend.
More TDX specific initialization will be added later.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
TDX VM requires VM type KVM_X86_TDX_VM to be passed to
kvm_ioctl(KVM_CREATE_VM). Hence implement mc->kvm_type() for i386
architecture.
If tdx-guest object is specified to confidential-guest-support, like,
qemu -machine ...,confidential-guest-support=tdx0 \
-object tdx-guest,id=tdx0,...
it parses VM type as KVM_X86_TDX_VM. Otherwise, it's KVM_X86_DEFAULT_VM.
Also store the vm_type in MachineState for other code to query what the
VM type is.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Introduce tdx-guest object which inherits CONFIDENTIAL_GUEST_SUPPORT,
and will be used to create TDX VMs (TDs) by
qemu -machine ...,confidential-guest-support=tdx0 \
-object tdx-guest,id=tdx0
So far, it has no QAPI member/properety decleared and only one internal
member 'attributes' with fixed value 0 that not configurable.
QAPI properties will be added later.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Pull in recent TDX updates, which are not backwards compatible.
It's just to make this series runnable. It will be updated by script
scripts/update-linux-headers.sh
once TDX support is upstreamed in linux kernel
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
KVM side leaves the memory to shared by default, while may incur the
overhead of paging conversion on the first visit of each page. Because
the expectation is that page is likely to private for the VMs that
require private memory (has guest memfd).
Explicitly set the memory to private when memory region has valid
guest memfd backend.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
When geeting KVM_EXIT_MEMORY_FAULT exit, it indicates userspace needs to
do the memory conversion on the RAMBlock to turn the memory into desired
attribute, i.e., private/shared.
Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens.
Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
guest_memfd memory backend.
Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
added.
When page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range(). Note, shared
memory can be discarded only when it's not back'ed by hugetlb because
hugetlb is supposed to be pre-allocated and no need for discarding.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
When memory page is converted from private to shared, the original
private memory is back'ed by guest_memfd. Introduce
ram_block_discard_guest_memfd_range() for discarding memory in
guest_memfd.
Originally-from: Isaku Yamahata <isaku.yamahata@intel.com>
Codeveloped-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Introduce the helper functions to set the attributes of a range of
memory to private or shared.
This is necessary to notify KVM the private/shared attribute of each gpa
range. KVM needs the information to decide the GPA needs to be mapped at
hva-based shared memory or guest_memfd based private memory.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.
With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
backend'ed both by hva-based shared memory and guest memfd based private
memory.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
The upper 16 bits of kvm_userspace_memory_region::slot are
address space id. Parse it separately in trace_kvm_set_user_memory().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Add a new member "guest_memfd" to memory backends. When it's set
to true, it enables RAM_GUEST_MEMFD in ram_flags, thus private kvm
guest_memfd will be allocated during RAMBlock allocation.
Memory backend's @guest_memfd is wired with @require_guest_memfd
field of MachineState. It avoid looking up the machine in phymem.c.
MachineState::require_guest_memfd is supposed to be set by any VMs
that requires KVM guest memfd as private memory, e.g., TDX VM.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Add KVM guest_memfd support to RAMBlock so both normal hva based memory
and kvm guest memfd based private memory can be associated in one RAMBlock.
Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
create private guest_memfd during RAMBlock setup.
Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
is more flexible and extensible than simply relying on the VM type because
in the future we may have the case that not all the memory of a VM need
guest memfd. As a benefit, it also avoid getting MachineState in memory
subsystem.
Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VM. How and when to set it for memory
backends will be implemented in the following patches.
Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Guest memfd support in QEMU requires corresponding KVM guest memfd APIs,
which lands in Linux from v6.8-rc1.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Use unified confidential_guest_kvm_init(), to avoid exposing specific
functions.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from rfc v1:
- check machine->cgs not NULL before calling confidential_guest_kvm_init();
Use the unified interface to call confidential guest related kvm_init()
and kvm_reset(), to avoid exposing pef specific functions.
remove perf.h since it is now blank..
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from rfc v1:
- check machine->cgs not NULL before callling
confidential_guest_kvm_init/reset();
Use confidential_guest_kvm_init() instead of calling SEV specific
sev_kvm_init(). As a bouns, it fits to future TDX when TDX implements
its own confidential_guest_support and .kvm_init().
Move the "TypeInfo sev_guest_info" definition and related functions to
the end of the file, to avoid declaring the sev_kvm_init() ahead.
Delete the sve-stub.c since it's not needed anymore.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes from rfc v1:
- check ms->cgs not NULL before calling confidential_guest_kvm_init();
- delete the sev-stub.c;
Different confidential VMs in different architectures all have the same
needs to do their specific initialization (and maybe resetting) stuffs
with KVM. Currently each of them exposes individual *_kvm_init()
functions and let machine code or kvm code to call it.
To make it more object oriented, add two virtual functions, kvm_init()
and kvm_reset() in ConfidentialGuestSupportClass, and expose two helpers
functions for invodking them.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
Changes since rfc v1:
- Drop the NULL check and rely the check from the caller;
The interrupt handlers need to be populated before the device is realized since
internal devices such as the RTC are wired during realize(). If the interrupt
handlers aren't populated, devices such as the RTC will be wired with a NULL
interrupt handler, i.e. MC146818RtcState::irq is NULL.
Fixes: fc11ca08bc "hw/i386/q35: Realize LPC PCI function before accessing it"
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-ID: <20240217104644.19755-1-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 143f3fd3d8)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
pc_system_flash_create() checked for pcmc->pci_enabled which is redundant since
its caller already checked it. The method can be turned into just two lines, so
inline and remove it.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-8-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit cb05cc1602)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Rather than distributing PC system flash handling across three files, let's
confine it to one. Now, pc_system_firmware_init() creates, configures and cleans
up the system flash which makes the code easier to understand. It also avoids
the extra call to pc_system_flash_cleanup_unused() in the Xen case.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-7-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 6f6ad2b245)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Handling most of smbios data generation in the machine_done notifier is similar
to how the ARM virt machine handles it which also calls smbios_set_defaults()
there. The result is that all pc machines are freed from explicitly worrying
about smbios setup.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-6-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit a0204a5ed0)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
The attribute isn't user-changeable and only true for pc-based machines. Turn it
into a class attribute which allows for inlining pc_guest_info_init() into
pc_machine_initfn().
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240208220349.4948-4-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 6e6d59a94d)
Referencex: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Return early if bc->alloc is NULL. De-indent the if() ladder.
Note, this avoids a pointless call to error_propagate() with
errp=NULL at the 'out:' label.
Change trivial when reviewed with 'git-diff --ignore-all-space'.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20231120213301.24349-16-philmd@linaro.org>
(cherry picked from commit e199f7ad4d)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:
Program terminated with signal SIGBUS, Bus error.
To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.
Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
(cherry picked from commit 06152b89db)
References: XXX
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
configurations to know the number of VFs being disabled due to SR-IOV
configuration writes, but the logic was flawed and resulted in
out-of-bound memory access.
It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
VFs, but it actually doesn't in the following cases:
- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.
It is a responsibility of pcie_sriov to interpret SR-IOV configurations
and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
provides, to get the number of enabled VFs before and after SR-IOV
configuration writes.
Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26328
Fixes: 11871f53ef ("hw/nvme: Add support for the Virtualization Management command")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20240228-reuse-v8-1-282660281e60@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 91bb64a8d2)
References: bsc#1220065
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Update to latest stable release (8.2.2).
Full changelog here:
https://lore.kernel.org/qemu-devel/1709577077.783602.1474596.nullmailer@tls.msk.ru/
Upstream backports:
chardev/char-socket: Fix TLS io channels sending too much data to the backend
tests/unit/test-util-sockets: Remove temporary file after test
hw/usb/bus.c: PCAP adding 0xA in Windows version
hw/intc/Kconfig: Fix GIC settings when using "--without-default-devices"
gitlab: force allow use of pip in Cirrus jobs
tests/vm: avoid re-building the VM images all the time
tests/vm: update openbsd image to 7.4
target/i386: leave the A20 bit set in the final NPT walk
target/i386: remove unnecessary/wrong application of the A20 mask
target/i386: Fix physical address truncation
target/i386: check validity of VMCB addresses
target/i386: mask high bits of CR3 in 32-bit mode
pl031: Update last RTCLR value on write in case it's read back
hw/nvme: fix invalid endian conversion
update edk2 binaries to edk2-stable202402
update edk2 submodule to edk2-stable202402
target/ppc: Fix crash on machine check caused by ifetch
target/ppc: Fix lxv/stxv MSR facility check
.gitlab-ci.d/windows.yml: Drop msys2-32bit job
system/vl: Update description for input grab key
docs/system: Update description for input grab key
hw/hppa/Kconfig: Fix building with "configure --without-default-devices"
tests/qtest: Depend on dbus_display1_dep
meson: Explicitly specify dbus-display1.h dependency
audio: Depend on dbus_display1_dep
ui/console: Fix console resize with placeholder surface
ui/clipboard: add asserts for update and request
ui/clipboard: mark type as not available when there is no data
ui: reject extended clipboard message if not activated
target/i386: Generate an illegal opcode exception on cmp instructions with lock prefix
i386/cpuid: Move leaf 7 to correct group
i386/cpuid: Decrease cpuid_i when skipping CPUID leaf 1F
i386/cpu: Mask with XCR0/XSS mask for FEAT_XSAVE_XCR0_HI and FEAT_XSAVE_XSS_HI leafs
i386/cpu: Clear FEAT_XSAVE_XSS_LO/HI leafs when CPUID_EXT_XSAVE is not available
.gitlab-ci/windows.yml: Don't install libusb or spice packages on 32-bit
iotests: Make 144 deterministic again
target/arm: Don't get MDCR_EL2 in pmu_counter_enabled() before checking ARM_FEATURE_PMU
target/arm: Fix SVE/SME gross MTE suppression checks
target/arm: Handle mte in do_ldrq, do_ldro
target/arm: Split out make_svemte_desc
target/arm: Adjust and validate mtedesc sizem1
target/arm: Fix nregs computation in do_{ld,st}_zpa
linux-user/aarch64: Choose SYNC as the preferred MTE mode
tests/acpi: Update DSDT.cxl to reflect change _STA return value.
hw/i386: Fix _STA return value for ACPI0017
tests/acpi: Allow update of DSDT.cxl
smmu: Clear SMMUPciBus pointer cache when system reset
virtio_iommu: Clear IOMMUPciBus pointer cache when system reset
virtio-gpu: Correct virgl_renderer_resource_get_info() error check
hw/cxl: Pass CXLComponentState to cache_mem_ops
hw/cxl/device: read from register values in mdev_reg_read()
cxl/cdat: Fix header sum value in CDAT checksum
cxl/cdat: Handle cdat table build errors
vhost-user.rst: Fix vring address description
tcg/arm: Fix goto_tb for large translation blocks
tcg: Increase width of temp_subindex
hw/net/tulip: add chip status register values
hw/smbios: Fix port connector option validation
hw/smbios: Fix OEM strings table option validation
configure: run plugin TCG tests again
tests/docker: Add sqlite3 module to openSUSE Leap container
hw/riscv/virt-acpi-build.c: fix leak in build_rhct()
migration: Fix logic of channels and transport compatibility check
virtio-blk: avoid using ioeventfd state in irqfd conditional
virtio: Re-enable notifications after drain
virtio-scsi: Attach event vq notifier with no_poll
iotests: give tempdir an identifying name
iotests: fix leak of tmpdir in dry-run mode
hw/scsi/lsi53c895a: add missing decrement of reentrancy counter
linux-user/aarch64: Add padding before __kernel_rt_sigreturn
tcg/loongarch64: Set vector registers call clobbered
pci-host: designware: Limit value range of iATU viewport register
target/arm: Reinstate "vfp" property on AArch32 CPUs
qemu-options.hx: Improve -serial option documentation
system/vl.c: Fix handling of '-serial none -serial something'
target/arm: fix exception syndrome for AArch32 bkpt insn
block/blkio: Make s->mem_region_alignment be 64 bits
qemu-docs: Update options for graphical frontends
Make 'uri' optional for migrate QAPI
vfio/pci: Clear MSI-X IRQ index always
migration: Fix use-after-free of migration state object
migration: Plug memory leak on HMP migrate error path
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
spapr_irq_init currently uses existing macro SPAPR_XIRQ_BASE to refer to
the range of CPU IPIs during initialization of nr-irqs property.
It is more appropriate to have its own define which can be further
reused as appropriate for correct interpretation.
Suggested-by: Cedric Le Goater <clg@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Kowshik Jois <kowsjois@linux.ibm.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 2df5c1f5b0)
References: bsc#1220310
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
We wanted QEMU to support larger VMs (in therm of RAM size) by default
and we therefore introduced patch "[openSUSE] increase x86_64 physical
bits to 42". This, however, means that we create VMs with 42 bits of
physical address space even on hosts that only has, say, 40. And that
can't work.
In fact, it has been a problem since a long time (e.g., bsc#1205978) and
it's also the actual root cause of bsc#1219977.
Get rid of that old patch, in favor of a new one that still raise the
default number of address bits to 42, but only on hosts that supports
that.
This means that we can also use the proper SeaBIOS version, without
reverting commits that were only a problem due to our broken downstream
patch.
We probably aslo don't need to ship some of the custom ACPI tables (for
passing tests), but we'll actually remove them later, after double
checking properly that all the tests do work.
References: bsc#1205978
References: bsc#1219977
References: bsc#1220799
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Update the copyright year to 2024, sort dependencies etc.
This way, 'osc' does not have to do these changes all the times (they're
automatic, so no big deal, but it's annoying to see them in the diffs of
all the requests).
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Backported commits:
* Update version for 8.2.1 release
* target/arm: Fix incorrect aa64_tidcp1 feature check
* target/arm: Fix A64 scalar SQSHRN and SQRSHRN
* target/xtensa: fix OOB TLB entry access
* qtest: bump aspeed_smc-test timeout to 6 minutes
* monitor: only run coroutine commands in qemu_aio_context
* iotests: port 141 to Python for reliable QMP testing
* iotests: add filter_qmp_generated_node_ids()
* block/blklogwrites: Fix a bug when logging "write zeroes" operations.
* virtio-net: correctly copy vnet header when flushing TX (bsc#1218484, CVE-2023-6693)
* tcg/arm: Fix SIGILL in tcg_out_qemu_st_direct
* linux-user/riscv: Adjust vdso signal frame cfa offsets
* linux-user: Fixed cpu restore with pc 0 on SIGBUS
* block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
* coroutine-ucontext: Save fake stack for pooled coroutine
* tcg/s390x: Fix encoding of VRIc, VRSa, VRSc insns
* accel/tcg: Revert mapping of PCREL translation block to multiple virtual addresses
* acpi/tests/avocado/bits: wait for 200 seconds for SHUTDOWN event from bits VM
* s390x/pci: drive ISM reset from subsystem reset
* s390x/pci: refresh fh before disabling aif
* s390x/pci: avoid double enable/disable of aif
* hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued
* hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion interrupt
* hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources
* hw/scsi/esp-pci: use correct address register for PCI DMA transfers
* migration/rdma: define htonll/ntohll only if not predefined
* hw/pflash: implement update buffer for block writes
* hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
* hw/pflash: refactor pflash_data_write()
* backends/cryptodev: Do not ignore throttle/backends Errors
* target/i386: pcrel: store low bits of physical address in data[0]
* target/i386: fix incorrect EIP in PC-relative translation blocks
* target/i386: Do not re-compute new pc with CF_PCREL
* load_elf: fix iterator's type for elf file processing
* target/hppa: Update SeaBIOS-hppa to version 15
* target/hppa: Fix IOR and ISR on error in probe
* target/hppa: Fix IOR and ISR on unaligned access trap
* target/hppa: Export function hppa_set_ior_and_isr()
* target/hppa: Avoid accessing %gr0 when raising exception
* hw/hppa: Move software power button address back into PDC
* target/hppa: Fix PDC address translation on PA2.0 with PSW.W=0
* hw/pci-host/astro: Add missing astro & elroy registers for NetBSD
* hw/hppa/machine: Disable default devices with --nodefaults option
* hw/hppa/machine: Allow up to 3840 MB total memory
* readthodocs: fully specify a build environment
* .gitlab-ci.d/buildtest.yml: Work around htags bug when environment is large
* target/s390x: Fix LAE setting a wrong access register
* tests/qtest/virtio-ccw: Fix device presence checking
* tests/acpi: disallow tests/data/acpi/virt/SSDT.memhp changes
* tests/acpi: update expected data files
* edk2: update binaries to git snapshot
* edk2: update build config, set PcdUninstallMemAttrProtocol = TRUE.
* edk2: update to git snapshot
* tests/acpi: allow tests/data/acpi/virt/SSDT.memhp changes
* util: fix build with musl libc on ppc64le
* tcg/ppc: Use new registers for LQ destination
* hw/intc/arm_gicv3_cpuif: handle LPIs in in the list registers
* hw/vfio: fix iteration over global VFIODevice list
* vfio/container: Replace basename with g_path_get_basename
* edu: fix DMA range upper bound check
* hw/net: cadence_gem: Fix MDIO_OP_xxx values
* audio/audio.c: remove trailing newline in error_setg
* chardev/char.c: fix "abstract device type" error message
* target/riscv: Fix mcycle/minstret increment behavior
* hw/net/can/sja1000: fix bug for single acceptance filter and standard frame
* target/i386: the sgx_epc_get_section stub is reachable
* configure: use a native non-cross compiler for linux-user
* include/ui/rect.h: fix qemu_rect_init() mis-assignment
* target/riscv/kvm: do not use non-portable strerrorname_np()
* iotests: Basic tests for internal snapshots
* vl: Improve error message for conflicting -incoming and -loadvm
* block: Fix crash when loading snapshot on inactive node
References: bsc#1218484 (CVE-2023-6693)
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Depending on the VM configuration (both at the VM definition level and
on the guest itself) a VGA console might be necessary, or weird lockup
will occur. Since the VGA module package is smalle enough, add a
dependency for it, from other display modules, to act as a workaround.
While there, make more explicit and precise the dependencies between all
the various modules, by specifying that they should all have the same
version and release.
References: bsc#1219164
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Historically, KVM was available only for x86 and s390, and was invoked
via a binary called 'kvm' or 'qemu-kvm'. For a while, we've shipped a
package that was making it possible to invoke QEMU like that, but only
for these two arches. This, however, created a lot of confusion and
dependencies issues.
Fix them by creating a symlink from 'qemu-kvm' to the proper binary on
all arches and by making the main QEMU package Providing and Obsoleting
(also on all arches) the old qemu-kvm one.
Note that, for RISCV, the qemu-system-riscv64 binary, to which the symlink
should point, is in the qemu-extra package. However, if we are on RISCV,
qemu-extra is an hard dependency of qemu. Therefore, it's fine to ship
the link and also set the Provides: and Obsoletes: tag in the qemu
package itself. It'd be more correct to do that in the qemu-extra
package, of course, but this would complicate the spec file and it's not
worth it, considering this is all legacy and should very well go away
soon.
References: bsc#1218684
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Add to the ipxe submodule the commit (and all its dependencies) for
fixing building with binutils 2.42
References: bsc#1219733
References: bsc#1219722
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Point the submodules to the repositories that host our downstream
patches:
* roms/seabios
- [openSUSE] switch to python3 as needed
- [openSUSE] build: enable cross compilation on ARM
- [openSUSE] build: be explicit about -mx86-used-note=no
* roms/SLOF
- Allow to override build date with SOURCE_DATE_EPOCH
* roms/ipxe
- [ath5k] Add missing AR5K_EEPROM_READ in ath5k_eeprom_read_turbo_modes
- [openSUSE] [build] Makefile: fix issues of build reproducibility
- [openSUSE] [test] help compiler out by initializing array[openSUSE]
- [openSUSE] [build] Silence GCC 12 spurious warnings
- [librm] Use explicit operand size when pushing a label address
* roms/skiboot
- [openSUSE] Makefile: define endianess for cross-building on aarch64
- [openSUSE] Make Sphinx build reproducible (boo#1102408)
* roms/qboot
- [openSUSE] add cross.ini file to handle aarch64 based build
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Update to latest upstream release.
The full list of changes are available at:
https://wiki.qemu.org/ChangeLog/8.2
Highlights include:
* New virtio-sound device emulation
* New virtio-gpu rutabaga device emulation used by Android emulator
* New hv-balloon for dynamic memory protocol device for Hyper-V guests
* New Universal Flash Storage device emulation
* Network Block Device (NBD) 64-bit offsets for improved performance
* dump-guest-memory now supports the standard kdump format
* ARM: Xilinx Versal board now models the CFU/CFI, and the TRNG device
* ARM: CPU emulation support for cortex-a710 and neoverse-n2
* ARM: architectural feature support for PACQARMA3, EPAC, Pauth2, FPAC,
FPACCOMBINE, TIDCP1, MOPS, HBC, and HPMN0
* HPPA: CPU emulation support for 64-bit PA-RISC 2.0
* HPPA: machine emulation support for C3700, including Astro memory
controller and four Elroy PCI bridges
* LoongArch: ISA support for LASX extension and PRELDX instruction
* LoongArch: CPU emulation support for la132
* RISC-V: ISA/extension support for AIA virtualization support via KVM,
and vector cryptographic instructions
* RISC-V: Numerous extension/instruction cleanups, fixes, and reworks
* s390x: support for vfio-ap passthrough of crypto adapter for
protected
virtualization guests
* Tricore: support for TC37x CPU which implements ISA v1.6.2
* Tricore: support for CRCN, FTOU, FTOHP, and HPTOF instructions
* x86: Zen support for PV console and network devices
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
In 8.2, the upstream added graph lock annotations that conflict with
our coroutine patches. The GRAPH_RDLOCK_GUARD_MAINLOOP is now asserted
at some places where we turned the code into a coroutine.
My understanding is that the annotation merely asserts that the code
is running in the main loop and serves effectively as a marker of
where the graph lock should be taken if the code were to be moved out
of the main loop. That's exactly what we're doing so it should be safe
to replace the main loop check with an actual rdlock/rdunlock pair.
Commits e7e801a2a7, 94d03a7425 and c89144c058 replace
GRAPH_RDLOCK_GUARD_MAINLOOP() with GRAPH_RDLOCK_GUARD() at the
bdrv_snapshot_list, bdrv_named_nodes_list and qmp_query_block
functions. The changes are split into those patches instead of here to
preserve bisectability.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The fstat call can take a long time to finish when running over
NFS. Add a version of it that runs in the thread pool.
Adapt one of its users, raw_co_get_allocated_file size to use the new
version. That function is called via QMP under the qemu_global_mutex
so it has a large chance of blocking VCPU threads in case it takes too
long to finish.
Signed-off-by: João Silva <jsilva@suse.de>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Claudio Fontana <cfontana@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
This is another caller of bdrv_get_allocated_file_size() that needs to
be converted to a coroutine because that function will be made
asynchronous when called (indirectly) from the QMP dispatcher.
This QMP command is a candidate because it calls bdrv_do_query_node_info(),
which in turn calls bdrv_get_allocated_file_size().
We've determined bdrv_do_query_node_info() to be coroutine-safe (see
previous commits), so we can just put this QMP command in a coroutine.
Since qmp_query_block() now expects to run in a coroutine, its callers
need to be converted as well. Convert hmp_info_block(), which calls
only coroutine-safe code, including qmp_query_named_block_nodes()
which has been converted to coroutine in the previous patches.
Now that all callers of bdrv_[co_]block_device_info() are using the
coroutine version, a few things happen:
- we can return to using bdrv_block_device_info() without a wrapper;
- bdrv_get_allocated_file_size() can stop being mixed;
- bdrv_co_get_allocated_file_size() needs to be put under the graph
lock because it is being called wthout the wrapper;
- bdrv_do_query_node_info() doesn't need to acquire the AioContext
because it doesn't call aio_poll anymore;
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
[relax main loop requirement at qmp_query_block]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're currently doing a full query-block just to enumerate the devices
for qmp_nbd_server_add and then discarding the BlockInfoList
afterwards. Alter hmp_nbd_server_start to instead iterate explicitly
over the block_backends list.
This allows the removal of the dependency on qmp_query_block from
hmp_nbd_server_start. This is desirable because we're about to move
qmp_query_block into a coroutine and don't need to change the NBD code
at the same time.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're converting callers of bdrv_get_allocated_file_size() to run in
coroutines because that function will be made asynchronous when called
(indirectly) from the QMP dispatcher.
This QMP command is a candidate because it indirectly calls
bdrv_get_allocated_file_size() through bdrv_block_device_info() ->
bdrv_query_image_info() -> bdrv_query_image_info().
The previous patches have determined that bdrv_query_image_info() and
bdrv_do_query_node_info() are coroutine-safe so we can just make the
QMP command run in a coroutine.
Signed-off-by: Lin Ma <lma@suse.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
[relax the main loop requirement at bdrv_named_nodes_list]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're converting callers of bdrv_get_allocated_file_size() to run in
coroutines because that function will be made asynchronous when called
(indirectly) from the QMP dispatcher.
This function is a candidate because it calls bdrv_query_image_info()
-> bdrv_do_query_node_info() -> bdrv_get_allocated_file_size().
It is safe to turn this is a coroutine because the code it calls is
made up of either simple accessors and string manipulation functions
[1] or it has already been determined to be safe [2].
1) bdrv_refresh_filename(), bdrv_is_read_only(),
blk_enable_write_cache(), bdrv_cow_bs(), blk_get_public(),
throttle_group_get_name(), bdrv_write_threshold_get(),
bdrv_query_dirty_bitmaps(), throttle_group_get_config(),
bdrv_filter_or_cow_bs(), bdrv_skip_implicit_filters()
2) bdrv_do_query_node_info() (see previous commit);
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
We're converting callers of bdrv_get_allocated_file_size() to run in
coroutines because that function will be made asynchronous when called
(indirectly) from the QMP dispatcher.
This function is a candidate because it calls bdrv_do_query_node_info(),
which in turn calls bdrv_get_allocated_file_size().
All the functions called from bdrv_do_query_node_info() onwards are
coroutine-safe, either have a coroutine version themselves[1] or are
mostly simple code/string manipulation[2].
1) bdrv_getlength(), bdrv_get_allocated_file_size(), bdrv_get_info(),
bdrv_get_specific_info();
2) bdrv_refresh_filename(), bdrv_get_format_name(),
bdrv_get_full_backing_filename(), bdrv_query_snapshot_info_list();
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
[relax the main loop requirement at bdrv_snapshot_list]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Some callers of this function are about to be converted to run in
coroutines, so allow it to be executed both inside and outside a
coroutine while we convert all the callers.
This will be reverted once all callers of bdrv_do_query_node_info run
in a coroutine.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
The following patches will add co_wrapper annotations to functions
declared in qapi.h. Add that header to the set of files used by
block-coroutine-wrapper.py.
Signed-off-by: Fabiano Rosas <farosas@suse.de>
References: bsc#1211000
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Add some block drivers and virtiofsd as hard dependencies of the
qemu-headless package, to make sure it's really useful for headless
server environments (even when recommended packages are not installed).
Singed-off-by: Dario Faggioli <dfaggioli@suse.com>
Use a fixed USER value (in case someone builds outside of OBS/osc).
References: boo#1084909
Signed-off-by: Bernhard M. Wiedemann <githubbmwprimary@lsmod.de>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Define a new sub-(meta-)package that can be installed for having
all the other modules and packages necessary for SPICE to work.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Align to upstream stable release. It includes many of the patches we had
backported ourself, to fix bugs and issues, plus more.
See here for details:
- https://lore.kernel.org/qemu-devel/1700589639.257680.3420728.nullmailer@tls.msk.ru/
- https://gitlab.com/qemu-project/qemu/-/commits/stable-8.1?ref_type=heads
An (incomplete!) list of such backports is:
* Update version for 8.1.3 release
* hw/mips: LOONGSON3V depends on UNIMP device
* target/arm: HVC at EL3 should go to EL3, not EL2
* s390x/pci: only limit DMA aperture if vfio DMA limit reported
* target/riscv/kvm: support KVM_GET_REG_LIST
* target/riscv/kvm: improve 'init_multiext_cfg' error msg
* tracetool: avoid invalid escape in Python string
* tests/tcg/s390x: Test LAALG with negative cc_src
* target/s390x: Fix LAALG not updating cc_src
* tests/tcg/s390x: Test CLC with inaccessible second operand
* target/s390x: Fix CLC corrupting cc_src
* tests/qtest: ahci-test: add test exposing reset issue with pending callback
* hw/ide: reset: cancel async DMA operation before resetting state
* target/mips: Fix TX79 LQ/SQ opcodes
* target/mips: Fix MSA BZ/BNZ opcodes displacement
* ui/gtk-egl: apply scale factor when calculating window's dimension
* ui/gtk: force realization of drawing area
* ati-vga: Implement fallback for pixman routines
* ...
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Avoid parallel processing in sphinx because that causes variations in
generated files
This is addressed here, with a downstream patch, until a proper solution
is found upstream.
Signed-off-by: Bernhard Wiedemann <bwiedemann@suse.com>
References: boo#1102408
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
The supportconfig 'scplugin.rc' file is deprecated in favor of
supportconfig.rc'. Adapt the qemu plugin to the new scheme.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Our workflow does not include patches in the spec files. Still, it could
be useful to add some there, during development and/or debugging issues.
Make sure that they are applied properly, by adding -p1 to the
%autosetup directive (it's a nop if there are no patches, so both cases
are ok).
Suggested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
This fixes the following upstream issues:
* https://gitlab.com/qemu-project/qemu/-/issues/1826
* https://gitlab.com/qemu-project/qemu/-/issues/1834
* https://gitlab.com/qemu-project/qemu/-/issues/1846
It also contains a fix for:
* CVE-2023-42467 (bsc#1215192)
As well as several upstream backports:
* target/riscv: Fix vfwmaccbf16.vf
* disas/riscv: Fix the typo of inverted order of pmpaddr13 and pmpaddr14
* roms: use PYTHON to invoke python
* hw/audio/es1370: reset current sample counter
* migration/qmp: Fix crash on setting tls-authz with null
* util/log: re-allow switching away from stderr log file
* vfio/display: Fix missing update to set backing fields
* amd_iommu: Fix APIC address check
* vdpa net: follow VirtIO initialization properly at cvq isolation probing
* vdpa net: stop probing if cannot set features
* vdpa net: fix error message setting virtio status
* vdpa net: zero vhost_vdpa iova_tree pointer at cleanup
* linux-user/hppa: Fix struct target_sigcontext layout
* chardev/char-pty: Avoid losing bytes when the other side just (re-)connected
* hw/display/ramfb: plug slight guest-triggerable leak on mode setting
* win32: avoid discarding the exception handler
* target/i386: fix memory operand size for CVTPS2PD
* target/i386: generalize operand size "ph" for use in CVTPS2PD
* subprojects/berkeley-testfloat-3: Update to fix a problem with compiler warnings
* scsi-disk: ensure that FORMAT UNIT commands are terminated
* esp: restrict non-DMA transfer length to that of available data
* esp: use correct type for esp_dma_enable() in sysbus_esp_gpio_demux()
* optionrom: Remove build-id section
* target/tricore: Fix RCPW/RRPW_INSERT insns for width = 0
* accel/tcg: Always require can_do_io
* accel/tcg: Always set CF_LAST_IO with CF_NOIRQ
* accel/tcg: Improve setting of can_do_io at start of TB
* accel/tcg: Track current value of can_do_io in the TB
* accel/tcg: Hoist CF_MEMI_ONLY check outside translation loop
* accel/tcg: Avoid load of icount_decr if unused
* softmmu: Use async_run_on_cpu in tcg_commit
* migration: Move return path cleanup to main migration thread
* migration: Replace the return path retry logic
* migration: Consolidate return path closing code
* migration: Remove redundant cleanup of postcopy_qemufile_src
* migration: Fix possible race when shutting down to_dst_file
* migration: Fix possible races when shutting down the return path
* migration: Fix possible race when setting rp_state.error
* migration: Fix race that dest preempt thread close too early
* ui/vnc: fix handling of VNC_FEATURE_XVP
* ui/vnc: fix debug output for invalid audio message
* hw/scsi/scsi-disk: Disallow block sizes smaller than 512 [CVE-2023-42467]
* accel/tcg: mttcg remove false-negative halted assertion
* meson.build: Make keyutils independent from keyring
* target/arm: Don't skip MTE checks for LDRT/STRT at EL0
* hw/arm/boot: Set SCR_EL3.FGTEn when booting kernel
* include/exec: Widen tlb_hit/tlb_hit_page()
* tests/file-io-error: New test
* file-posix: Simplify raw_co_prw's 'out' zone code
* file-posix: Fix zone update in I/O error path
* file-posix: Check bs->bl.zoned for zone info
* file-posix: Clear bs->bl.zoned on error
* hw/cxl: Fix out of bound array access
* hw/cxl: Fix CFMW config memory leak
* linux-user/hppa: lock both words of function descriptor
* linux-user/hppa: clear the PSW 'N' bit when delivering signals
* hw/ppc: Read time only once to perform decrementer write
* hw/ppc: Reset timebase facilities on machine reset
* hw/ppc: Always store the decrementer value
* target/ppc: Sign-extend large decrementer to 64-bits
* hw/ppc: Avoid decrementer rounding errors
* hw/ppc: Round up the decrementer interval when converting to ns
* host-utils: Add muldiv64_round_up
Signed-of-by: Dario Faggioli <dfaggioli@suse.com>
perl-Text-Markdown is not always available (e.g., in SLE/Leap).
Use discount instead, as the provider of the 'markdown' binary.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
OBS SCM bridge can handle git submodule, while it can't handle (yet?)
meson subprojects. The (ugly, I know!) solution, for now, is to turn
the latter into the former, with commands like the followings:
git submodule add -f https://gitlab.com/qemu-project/berkeley-testfloat-3 subprojects/berkeley-testfloat-3
git -C subprojects/berkeley-testfloat-3 reset --hard 40619cbb3bf32872df8c53cc457039229428a263
(the hash used comes from the subprojects/berkeley-testfloat-3.wrap file)
It's also necessary to manually apply the layering of the packagefiles,
and that is done in the specfile.
Longer term and better solutions could be:
- Make SCM support meson subprojects
- Create standalone packages for the subprojects (and instruct
QEMU to pick stuff from there)
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Full list of changes are available at:
https://wiki.qemu.org/ChangeLog/8.1
Highlights:
* VFIO: improved live migration support, no longer an experimental feature
* GTK GUI now supports multi-touch events
* ARM, PowerPC, and RISC-V can now use AES acceleration on host processor
* PCIe: new QMP commands to inject CXL General Media events, DRAM
events and Memory Module events
* ARM: KVM VMs on a host which supports MTE (the Memory Tagging Extension)
can now use MTE in the guest
* ARM: emulation support for bpim2u (Banana Pi BPI-M2 Ultra) board and
neoverse-v1 (Cortex Neoverse-V1) CPU
* ARM: new architectural feature support for: FEAT_PAN3 (SCTLR_ELx.EPAN),
FEAT_LSE2 (Large System Extensions v2), and experimental support for
FEAT_RME (Realm Management Extensions)
* Hexagon: new instruction support for v68/v73 scalar, and v68/v69 HVX
* Hexagon: gdbstub support for HVX
* MIPS: emulation support for Ingenic XBurstR1/XBurstR2 CPUs, and MXU
instructions
* PowerPC: TCG SMT support, allowing pseries and powernv to run with up
to 8 threads per core
* PowerPC: emulation support for Power9 DD2.2 CPU model, and perf
sampling support for POWER CPUs
* RISC-V: ISA extension support for BF16/Zfa, and disassembly support
for Zcm*/Z*inx/XVentanaCondOps/Xthead
* RISC-V: CPU emulation support for Veyron V1
* RISC-V: numerous KVM/emulation fixes and enhancements
* s390: instruction emulation fixes for LDER, LCBB, LOCFHR, MXDB, MXDBR,
EPSW, MDEB, MDEBR, MVCRL, LRA, CKSM, CLM, ICM, MC, STIDP, EXECUTE, and
CLGEBR(A)
* SPARC: updated target/sparc to use tcg_gen_lookup_and_goto_ptr() for
improved performance
* Tricore: emulation support for TC37x CPU that supports ISA v1.6.2
instructions
* Tricore: instruction emulation of POPCNT.W, LHA, CRC32L.W, CRC32.B,
SHUFFLE, SYSCALL, and DISABLE
* x86: CPU model support for GraniteRapids
* and lots more...
This also (automatically) fixes:
- bsc#1212850 (CVE-2023-3354)
- bsc#1213001 (CVE-2023-3255)
- bsc#1213925 (CVE-2023-3180)
- bsc#1213414 (CVE-2023-3301)
- bsc#1207205 (CVE-2023-0330)
- bsc#1212968 (CVE-2023-2861)
- bsc#1179993, bsc#1181740, bsc#1211697
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
By default try to preserve argv[0].
Original report is boo#1197298, which also became relevant recently again in bsc#1212768.
Signed-off-by: Fabian Vogt <fabian@ritter-vogt.de>
References: boo#1197298
References: bsc#1212768
Signed-off-by: Fabian Vogt <fabian@ritter-vogt.de>
Create separate packages for qemu-img and qemu-pr-helper.
Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
Co-authored-by: Vasiliy Ulyanov <vulyanov@suse.de>
Since version 8.0.0, virtiofsd is not part of QEMU sources any longer.
We therefore have also moved it to a separate package. To retain
compatibility and consistency of behavior, require such a package as an
hard dependency.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
For example, let's try to avoid recommending GUI UI stuff, unless GTK is
already installed. This way we avoid things like bringing in an entire
graphic stack on servers.
References: bsc#1205680
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
- The qemu-headless subpackage was defined but never build, because it
had no files. Fix that by putting there just a simple README.
- Move the docs in a dedicated subpackage
Resolves: bsc#1209629
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
As part of the effort to close the gap with Leap I think we are fine
removing the $pkgversion component to creating a unique CONFIG_STAMP.
This stamp is only used in creating a unique symbol used in ensuring the
dynamically loaded modules correspond correctly to the loading qemu.
The default inputs to producing this unique symbol are somewhat reasonable
as a generic mechanism, but specific packaging and maintenance practices
might require the default to be modified for best use. This is an example
of that.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
We are disabling the following tests:
qemu-system-ppc64 / display-vga-test
They are failing due to some memory corruption errors. We believe that
this might be due to the combination of the compiler version and of LTO,
and will take up the investigation within the upstream community.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Executing tests in obs is very fickle, since you aren't guaranteed
reliable cpu time. Triple the timeout for each test to help ensure
we don't fail a test because the stars align against us.
Signed-off-by: Bruce Rogers <brogers@suse.com>
[DF: Small tweaks necessary for rebasing on top of 6.2.0]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Since we have a quite restricted execution environment, as far as
networking is concerned, we need to change the error message we expect
in test 162. There is actually no routing set up so the error we get is
"Network is unreachable". Change the expected output accordingly.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Revert commit "tests/qtest: enable more vhost-user tests by default"
(8dcb404bff), as it causes prooblem when building with GCC 12 and LTO
enabled.
This should be considered temporary, until the actual reason why the
code of the tests that are added in that commit breaks.
It has been reported upstream, and will be (hopefully) solved there:
https://lore.kernel.org/qemu-devel/1d3bbff9e92e7c8a24db9e140dcf3f428c2df103.camel@suse.com/
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
SG_IO may return additional status in the 'status', 'driver_status',
and 'host_status' fields. When either of these fields are set the
command has not been executed normally, so we should not continue
processing this command but rather return an error.
scsi_read_complete() already checks for these errors,
scsi_write_complete() does not.
References: bsc#1178049
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Lin Ma <lma@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
While using SCSI passthrough, Following scenario makes qemu doesn't
realized the capacity change of remote scsi target:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'rescan-scsi-bus.sh -s ...' in vm.
In above scenario I used to experienced errors while accessing the
additional disk space in vm. I think the reasonable operations should
be:
1. online resize the scsi target.
2. issue 'rescan-scsi-bus.sh -s ...' in host.
3. issue 'block_resize' via qmp to notify qemu.
4. issue 'rescan-scsi-bus.sh -s ...' in vm.
The errors disappear once I notify qemu by block_resize via qmp.
So this patch replaces the number of logical blocks of READ CAPACITY
response from scsi target by qemu's bs->total_sectors. If the user in
vm wants to access the additional disk space, The administrator of
host must notify qemu once resizeing the scsi target.
Bonus is that domblkinfo of libvirt can reflect the consistent capacity
information between host and vm in case of missing block_resize in qemu.
E.g:
...
<disk type='block' device='lun'>
<driver name='qemu' type='raw'/>
<source dev='/dev/sdc' index='1'/>
<backingStore/>
<target dev='sda' bus='scsi'/>
<alias name='scsi0-0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
Before:
1. online resize the scsi target.
2. host:~ # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4 host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity: 4.000 GiB
Allocation: 0.000 B
Physical: 8.000 GiB
5. guest:~ # lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 8G 0 disk
└─sda1 8:1 0 2G 0 part
After:
1. online resize the scsi target.
2. host:~ # rescan-scsi-bus.sh -s /dev/sdc
3. guest:~ # rescan-scsi-bus.sh -s /dev/sda
4 host:~ # virsh domblkinfo --domain $DOMAIN --human --device sda
Capacity: 4.000 GiB
Allocation: 0.000 B
Physical: 8.000 GiB
5. guest:~ # lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 4G 0 disk
└─sda1 8:1 0 2G 0 part
References: [SUSE-JIRA] (SLE-20965)
Signed-off-by: Lin Ma <lma@suse.com>
The final step of xl migrate|save for an HVM domU is saving the state of
qemu. This also involves releasing all block devices. While releasing
backends ought to be a separate step, such functionality is not
implemented.
Unfortunately, releasing the block devices depends on the optional
'live' option. This breaks offline migration with 'virsh migrate domU
dom0' because the sending side does not release the disks, as a result
the receiving side can not properly claim write access to the disks.
As a minimal fix, remove the dependency on the 'live' option. Upstream
may fix this in a different way, like removing the newly added 'live'
parameter entirely.
Fixes: 5d6c599fe1 ("migration, xen: Fix block image lock issue on live migration")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
References: bsc#1079730, bsc#1101982, bsc#1063993
Signed-off-by: Bruce Rogers <brogers@suse.com>
Provide monitor naming of xen disks, and plumb guest driver
notification through xenstore of resizing instigated via the
monitor.
[BR: minor edits to pass qemu's checkpatch script]
[BR: significant rework needed due to upstream xen disk qdevification]
[BR: At this point, monitor_add_blk call is all we need to add!]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Add code to read the suse specific suse-diskcache-disable-flush flag out
of xenstore, and set the equivalent flag within QEMU.
Patch taken from Xen's patch queue, Olaf Hering being the original author.
[bsc#879425]
[BR: minor edits to pass qemu's checkpatch script]
[BR: With qdevification of xen-block, code has changed significantly]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Olaf Hering <olaf@aepfle.de>
For SLES we want users to be able to use large memory configurations
with KVM without fiddling with ulimit -Sv.
Signed-off-by: Andreas Färber <afaerber@suse.de>
[BR: add include for sys/resource.h]
Signed-off-by: Bruce Rogers <brogers@suse.com>
Change from using glib alloc and free routines to those
from libc. Also perform safety measure of dropping privs
to user if configured no-caps.
References: boo#988279
Signed-off-by: Bruce Rogers <brogers@suse.com>
[AF: Rebased for v2.7.0-rc2]
Signed-off-by: Andreas Färber <afaerber@suse.de>
Virtio-Console can only process one character at a time. Using it on S390
gave me strange "lags" where I got the character I pressed before when
pressing one. So I typed in "abc" and only received "a", then pressed "d"
but the guest received "b" and so on.
While the stdio driver calls a poll function that just processes on its
queue in case virtio-console can't take multiple characters at once, the
muxer does not have such callbacks, so it can't empty its queue.
To work around that limitation, I introduced a new timer that only gets
active when the guest can not receive any more characters. In that case
it polls again after a while to check if the guest is now receiving input.
This patch fixes input when using -nographic on s390 for me.
[AF: Rebased for v2.7.0-rc2]
[BR: minor edits to pass qemu's checkpatch script]
Signed-off-by: Bruce Rogers <brogers@suse.com>
When using hugetlbfs (which is required for HV mode KVM on 970), we
check for MMU notifiers that on 970 can not be implemented properly.
So disable the check for mmu notifiers on PowerPC guests, making
KVM guests work there, even if possibly racy in some odd circumstances.
Signed-off-by: Bruce Rogers <brogers@suse.com>
When doing lseek, SEEK_SET indicates that the offset is an unsigned variable.
Other seek types have parameters that can be negative.
When converting from 32bit to 64bit parameters, we need to take this into
account and enable SEEK_END and SEEK_CUR to be negative, while SEEK_SET stays
absolute positioned which we need to maintain as unsigned.
Signed-off-by: Alexander Graf <agraf@suse.de>
Linux syscalls pass pointers or data length or other information of that sort
to the kernel. This is all stuff you don't want to have sign extended.
Otherwise a host 64bit variable parameter with a size parameter will extend
it to a negative number, breaking lseek for example.
Pass syscall arguments as ulong always.
Signed-off-by: Alexander Graf <agraf@suse.de>
[JRZ: changes from linux-user/qemu.h wass moved to linux-user/user-internals.h]
Signed-off-by: Jose R Ziviani <jziviani@suse.de>
[DF: Forward port, i.e., use ulong for do_prctl too]
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
It's easy enough to handle either per-spec or legacy smbios structures
in the smbios file input without regard to the machine type used, by
simply applying the basic smbios formatting rules. then depending on
what is detected. terminal numm bytes are added or removed for machine
type specific processing.
References: bsc#994082, bsc#1084316, boo#1131894
Signed-off-by: Bruce Rogers <brogers@suse.com>
We add a --cross-file reference so that we can do cross compilation
of qboot from an aarch64 build.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Certain rom subpackages build from qemu git-submodules call the date
program to include date information in the packaged binaries. This
causes repeated builds of the package to be different, wkere the only
real difference is due to the fact that time build timestamp has
changed. To promote reproducible builds and avoid customers being
prompted to update packages needlessly, we'll use the timestamp of the
VERSION file as the packaging timestamp for all packages that build in a
timestamp for whatever reason.
References: bsc#1011213
Signed-off-by: Bruce Rogers <brogers@suse.com>
The sgabios submodule is no longer there, so let's get rid of any
reference to it from our spec files.
Remove no longer supported './configure' options.
We're also not set yet for using the set_version service, so we need to
update the following manually:
- the Version: tags in the spec files
- the rpm/seabios_version and rpm/skiboot_version files (see qemu.spec
for instructions on how to do that)
- the %{sbver} variable in rpm/common.inc
A better solution for handling this aspect is being worked on.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
In an upstream tarball there are some special files, generated by a
script that is run when the archive is prepared. Let's make our
repository look a little more like that, so we can build it properly.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Stash the "packaging files" in the QEMU repository, in the rpm/
directory. During package build, they will be pulled out from there
and used as appropriate.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
2024-03-14 18:26:08 +01:00
209 changed files with 12075 additions and 1172 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.